System Design Interview Experience: Proximity Service (Like Yelp)
Interview Experience
Functional Requirements

* Search: Users can search for nearby places (restaurants, gyms, events) or people based on their current location.
* Filtering: Support for filters including distance, category, ratings, open status, and personalized recommendations.
* Place Details: Users can view business information, photos, reviews, and ratings.
* Management: Business owners can add, edit, or claim listings; users can add reviews.
* Bookmarks: Users can save favorite locations.

Non-functional Requirements

* Latency: Millisecond-level search responses.
* Scalability: Support for millions of users and tens of millions of locations.
* Availability: High availability (99.99% uptime).
* Consistency: Eventual consistency is acceptable for new locations, updates, and reviews.

Capacity Estimation

* Traffic: 50M MAU, 10M DAU, averaging 5 searches per user per day (~50M searches/day, roughly 600 QPS on average); peak QPS ~100,000.
* Volume: 5M reviews/day, 100k new places/day.
* Storage:
  * 100M listings at ~2 KB each (~200 GB total).
  * 1B historical reviews at ~1 KB each (~1 TB total).
  * Photos (5 per business, ~200 KB each) (~100 TB total).
  * 100M favorites at ~50 bytes each (~5 GB total).

API Design

* GET /search?lat&lon&radius&category&open_now: Returns a sorted list of nearby locations.
* GET /place/:id: Retrieves details for a specific place.
* POST /review/:placeId/add: Submits a new review.
* POST /place/add: Business owners list a new location (name, coordinates, category, photos).
* PUT /place/:id/edit: Updates listing details.
* POST /place/:id/claim: Verifies ownership of an existing listing.

High-Level Architecture

* Client: Mobile/web interfaces for searching and management.
* API Gateway: Handles auth, routing, throttling, and monitoring.
* Cache: In-memory storage for hot searches and trending zones.
* Load Balancer: Distributes traffic to app servers.
* App Servers: Business logic for listings, reviews, and search coordination.
* Message Queue (Kafka/Redis): Asynchronous processing for search-index updates.
* Database: Persistent storage for user profiles and business data.
* Media Storage: Blob storage/CDN for images.
* Segments Producer: Ingests third-party map data (e.g., Google Maps) and divides the world into segments to narrow the search scope.
* QuadTree Servers: Store spatial indexes of places and find locations within a given radius.
* Aggregators: Collect results from QuadTree servers, rank them, and return the final list to the user.

Workflows

Flow-1: Business Listing

1. The owner submits details via /place/add.
2. The Listing Service validates the data and stores the geo-indexed location.
3. The Media Service links uploaded photos to the place.
4. The Admin Service reviews and approves the listing.
5. The listing becomes searchable upon approval.

Flow-2: Proximity Search

1. The user requests /search.
2. The Gateway routes the request to the Geospatial Search Service.
3. The Search Service queries QuadTree servers for places within the radius.
4. QuadTree servers return candidates to the Aggregators.
5. Aggregators filter, rank (proximity + popularity), and return results with metadata.

Flow-3: Review Submission

1. The user posts to /review/:placeId/add.
2. The Review Service validates and stores the review, then updates the average rating.
3. The review is displayed after moderation (if flagged).

Flow-4: Media Handling

1. Uploads are stored in object storage via the Media Service.
2. Read requests fetch media URLs served via CDN.

Component Deep Dive: Searching Strategy

SQL Solution

* Stores latitude/longitude in indexed columns.
* Uses range queries to find points within a coordinate box.
* Limitation: Inefficient at scale (e.g., 500M places). Independent indexes on lat and lon each yield large candidate sets that require a costly intersection.

Static Grids

* The map is divided into fixed-size grids; each place is assigned a GridID.
* Queries search the target grid and its 8 neighbors.
* Limitation: Fixed grid sizes handle uneven data distribution poorly. Dense areas (e.g., NYC) overload a single grid, while sparse areas (e.g., the Pacific Ocean) waste resources.
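The static-grid scheme can be sketched in a few lines: a place's GridID is just its quantized coordinates, and a query checks the target cell plus its 8 neighbors. The function names and the 0.01° cell size below are illustrative assumptions, not part of the original design.

```python
import math

CELL_DEG = 0.01  # hypothetical fixed cell size, ~1.1 km of latitude


def grid_id(lat: float, lon: float) -> tuple[int, int]:
    """Quantize a coordinate into its fixed-size grid cell."""
    return (math.floor(lat / CELL_DEG), math.floor(lon / CELL_DEG))


def cells_to_search(lat: float, lon: float) -> list[tuple[int, int]]:
    """Return the target cell plus its 8 neighbors (9 cells total)."""
    row, col = grid_id(lat, lon)
    return [(row + dr, col + dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)]
```

In a relational store, each place row would carry its GridID so a search becomes a single indexed `WHERE grid_id IN (...)` over the 9 cells, avoiding the lat/lon index-intersection problem above.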
Dynamic Grids (QuadTree)

* A tree structure in which each node represents a spatial region.
* A node subdivides into four quadrants when the number of places it holds exceeds a threshold (e.g., 500).
* Mechanism:
  * Root: Covers the whole world.
  * Leaf Nodes: Hold the actual lists of places.
  * Traversal: A search walks down the tree to the leaf node covering the user's location.
  * Neighbors: Leaf nodes are connected via a doubly linked list, so the search radius can expand into neighboring grids without re-traversing from the root.
* Memory Efficiency:
  * 500M places × 24 bytes each (8-byte ID plus two 8-byte coordinates) ≈ 12 GB.
  * ~1M leaf nodes (at 500 places per leaf) plus internal nodes add negligible overhead.
  * The entire index (~12 GB) fits in the RAM of a modern server.

Ranking Search Results

* Popularity Score: Aggregated from ratings (stored in the DB and cached in QuadTree nodes).
* Execution: QuadTree leaf nodes return their top 100 places within the spatial radius, sorted by popularity.
* Aggregation: A central aggregator merges the lists from the relevant leaf nodes into the final result.
* Updates: Popularity scores in the QuadTree are refreshed by off-peak batch processing to avoid locking and performance degradation during high-traffic reads.

Replication and Fault Tolerance

* Master-Slave Architecture: Writes go to the Master; reads are distributed across Slaves. Eventual consistency applies.
* Failover: If the Master fails, a Slave is promoted.
* Rebuilding (Reverse Index):
  * If a QuadTree server fails entirely, rebuilding its index by scanning the whole DB is too slow.
  * Solution: A separate QuadTree Index Server maintains a reverse index (ServerID -> set of PlaceIDs).
  * When a failed server is replaced, it queries the Index Server to learn exactly which places to load from the DB.

Data Partitioning

* Region-Based Sharding:
  * Places are grouped by zip code/geography.
  * Pros: Preserves spatial locality.
  * Cons: Hotspots (e.g., downtown areas) cause uneven server load.
* LocationID Sharding (Hash-Based):
  * Places are distributed uniformly based on a hash of the LocationID.
  * Pros: Even load distribution.
  * Cons: Every search query must be broadcast to all shards to find nearby places, increasing network traffic and latency.
* Conclusion: Region-based sharding is generally preferred, with consistent hashing or re-partitioning used to handle hotspots.
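To make the concluding point concrete, consistent hashing can be sketched as a ring of virtual nodes: each server owns many points on the ring, a region key maps to the next point clockwise, and adding or removing a server remaps only the keys that were on it. The class name, the vnode count, and the choice of MD5 below are illustrative assumptions, not details from the design above.

```python
import bisect
import hashlib


class ConsistentHashRing:
    """Minimal consistent-hash ring mapping region keys (e.g., zip codes) to servers."""

    def __init__(self, servers: list[str], vnodes: int = 100):
        # Each server is placed on the ring at `vnodes` pseudo-random positions.
        self._ring: list[tuple[int, str]] = []
        for server in servers:
            for v in range(vnodes):
                self._ring.append((self._hash(f"{server}#{v}"), server))
        self._ring.sort()

    @staticmethod
    def _hash(key: str) -> int:
        # Any stable, well-distributed hash works; MD5 is used here for illustration.
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def server_for(self, region_key: str) -> str:
        """Walk clockwise from the key's hash to the first virtual node."""
        h = self._hash(region_key)
        idx = bisect.bisect(self._ring, (h,)) % len(self._ring)
        return self._ring[idx][1]
```

Because only the departed server's keys move, a hot region can be peeled off an overloaded shard (or split across extra virtual nodes) without re-partitioning the whole dataset.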