Urban Airship engineers have been hard at work on a system for delivering push notifications to devices based on their location. Designing a storage layer for such a system, one capable of supporting hundreds of millions of devices, proved an interesting and difficult problem.
Several architectural challenges and moving pieces emerged immediately. The new system had to integrate with the existing tag-based device segmentation feature: users needed to filter devices by a combination of tags and location, because being able to choose only one or the other would make for an awkward, disappointing user experience. We also decided to improve the tag storage system as part of the project. The existing solution was starting to falter: conjunctive (AND) queries were not supported, and we quickly determined that they were infeasible without a redesign. We arrived at good storage solutions for tag queries and for spatial queries separately, but not for the two combined; none of the available storage options lent itself to complex queries over both types of data.
After evaluating several approaches of varying complexity, we noticed some convenient properties shared by our diverse storage backends. This discovery enabled us to develop a standalone service that implements a JOIN algorithm between arbitrary datastores. We will share our experience designing, developing, deploying, optimizing, and operating this system while dipping our toes into database research; pleasant and unpleasant surprises abound.
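The abstract does not say which shared properties made a cross-datastore JOIN possible. One plausible property, assumed here rather than taken from the source, is that every backend can stream the IDs of matching devices in ascending order. Under that assumption, combining a tag query with a location query reduces to a k-way sort-merge intersection, which only ever needs one "current" ID per stream. The sketch shows that technique in Python; the function name `merge_join` and the sample device IDs are hypothetical.

```python
from typing import Iterable, Iterator


def merge_join(*streams: Iterable[str]) -> Iterator[str]:
    """Yield the IDs present in every stream (k-way sort-merge intersection).

    Assumption (not from the source): each stream yields unique IDs in
    ascending order, e.g. device IDs returned by a tag index or a geo index.
    """
    iters = [iter(s) for s in streams]
    if not iters:
        return
    try:
        heads = [next(it) for it in iters]  # current ID of each stream
    except StopIteration:
        return  # some stream is empty, so the intersection is empty
    while True:
        top = max(heads)
        if all(h == top for h in heads):
            # Every stream agrees on this ID: it satisfies all predicates.
            yield top
            try:
                heads = [next(it) for it in iters]
            except StopIteration:
                return  # one stream is exhausted; no further matches possible
            continue
        # Advance each stream that is behind the current maximum ID.
        try:
            for i, it in enumerate(iters):
                while heads[i] < top:
                    heads[i] = next(it)
        except StopIteration:
            return


# Hypothetical example: devices carrying a tag AND located inside a geofence.
tagged = ["dev-02", "dev-05", "dev-07", "dev-09"]
nearby = ["dev-01", "dev-05", "dev-09", "dev-12"]
print(list(merge_join(tagged, nearby)))  # ['dev-05', 'dev-09']
```

Because each stream is consumed once and in order, the join never materializes either backend's full result set, which is one reason sorted output is an attractive property for a standalone JOIN service. Whether the authors relied on this property or on a different one is not stated in the abstract.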