A ride-sharing service feels like a map problem, but the map is the easy part. The hard part is throughput and contention: millions of drivers each broadcasting their position every few seconds, and a rider who needs the single nearest available one of them found, ranked, and locked - before another rider locks the same driver. That is a geospatial indexing problem wrapped around an atomic-claim problem, and it is why "design Uber" is a staple senior interview question.
This walkthrough assumes the 6-step system design framework and applies it at senior depth. It is Part 7 of a system design series.
Table of Contents
- The Problem
- Step 1 - Clarify Requirements
- Step 2 - Estimate Scale
- Step 3 - API and Data Model
- Step 4 - High-Level Design
- Step 5 - Deep Dive: Geospatial Indexing and Driver Matching
- Step 6 - Bottlenecks and Trade-offs
- Reference Architecture
- Common Mistakes in the Interview
- Quick Reference
- Related Articles
The Problem
We are designing a ride-sharing service: drivers continuously report their location, a rider requests a ride from a pickup point, and the system matches them to a nearby available driver and tracks the trip to completion. The canonical examples are Uber and Lyft.
The senior framing has two halves. The bulk of the load is a write firehose of location updates feeding a geospatial index that must answer "who is near here" in milliseconds. The rare but critical event is the match, which must atomically reserve one driver for one rider. Almost every design decision is either about absorbing the firehose or about making the claim correct.
Step 1 - Clarify Requirements
Functional requirements:
- Drivers report their location continuously while online.
- A rider requests a ride from a pickup location.
- The system finds nearby available drivers and matches one.
- Both parties track each other in real time during the trip.
- A trip moves through a lifecycle: requested, matched, en route, in progress, completed.
Out of scope (name, then defer): surge pricing, payments (that is Part 11), map routing and ETA computation (assume a routing service exists), and driver onboarding.
Non-functional requirements:
- Enormous location write throughput - every online driver updates every few seconds.
- Fast matching - a rider should get a driver within a few seconds.
- Fast nearby queries - sub-100 ms.
- Locations may be eventually consistent - a position a few seconds stale is fine. The match may not - a driver must never be assigned to two riders.
The clarifying question that frames the deep dive: locations and the claim sit on opposite ends of the consistency spectrum. Stale locations are acceptable; a double-booked driver is not. And the matching objective - nearest driver, best ETA, or a batched global optimum - is worth confirming; we design nearest-available and note batch matching.
Step 2 - Estimate Scale
The arithmetic shows this is a location-ingestion system with a comparatively rare matching event bolted on.
Location writes. Assume 5 million drivers online at peak, each reporting every ~4 seconds: 5M / 4 ≈ ~1.25 million location updates/sec. This is the dominant load by far.
Ride requests. At 15 million rides/day, that is ~175 requests/sec average, peaking near ~1,000/sec - three orders of magnitude below the location writes.
Storage. Locations are ephemeral - only the current position of each driver matters. 5M drivers x ~100 bytes ≈ 500 MB, comfortably in memory. A missed update self-corrects within 4 seconds, so this data needs no durability. Trip records, by contrast, are persisted: 15M/day x ~1 KB ≈ ~15 GB/day.
The shape of the problem: a small but ferociously write-hot in-memory index, and a durable but low-volume trip store.
Step 3 - API and Data Model
Driver location updates are too frequent for request-response REST, so drivers stream them over a persistent connection - the connection-layer pattern from Part 6. Riders use a normal request to start a ride; live trip tracking rides the same persistent connection.
(driver, over persistent connection) LOCATION { driverId, lat, lng, ts }
POST /api/rides { riderId, pickup, destination }
200 OK { tripId, driver: { ... }, etaSeconds }| Entity | Key fields |
|---|---|
| Driver location | driverId, lat, lng, geohash, status, updatedAt - in-memory |
| Geospatial index | spatial cell -> set of driver IDs in that cell |
| Trip | tripId, riderId, driverId, status, pickup, destination, timestamps - durable |
The driver status (available / reserved / on-trip) is small but important: it is the field the atomic claim flips.
Step 4 - High-Level Design
flowchart TD
D([Drivers]) -->|location stream| Ingest[Location Ingestion]
Ingest --> Geo[(Geospatial Index<br/>in-memory, sharded by region)]
R([Rider]) -->|request ride| Match[Matching Service]
Match -->|nearby query| Geo
Match -->|atomic claim| Geo
Match --> Trip[Trip Service]
Trip --> TStore[(Trip Store - durable)]
Trip <-->|live tracking| Conn[Connection Layer]
D <--> Conn
R <--> ConnFigure 1. The architecture splits a high-volume location-ingestion firehose (left) from a much lower-volume matching path (right). They share a single in-memory geospatial index sharded by geography. The connection layer - the same primitive used in Part 6 - carries live tracking between driver and rider during the trip.
The location ingestion service absorbs the firehose and keeps the geospatial index current - in memory, sharded by geographic region. The matching service queries that index, claims a driver, and hands off to the trip service, which owns the durable trip lifecycle. The connection layer carries live position updates during a trip.
Step 5 - Deep Dive: Geospatial Indexing and Driver Matching
This is the core. It splits into the spatial index, the write firehose that feeds it, and the matching claim.
Part A - Why a spatial index
The query is "find available drivers within 2 km of this point". Done naively - scan all 5 million drivers and compute each distance - it is O(N) per request and hopeless. A one-dimensional B-tree on latitude does not rescue it either: you would range-scan one axis and still distance-check a huge band. The problem is inherently two-dimensional and needs a spatial index that groups nearby points together.
Part B - Geohash
A geohash encodes a (lat, lng) pair into a short string by recursively bisecting the latitude/longitude space - each added character refines the cell. Roughly, a 5-character geohash is a ~5 km cell, a 6-character one ~1 km.
Its defining property: points whose geohashes share a prefix are spatially close. So the index is just a map from geohash cell to the set of driver IDs in it. A driver location update removes the driver from its old cell and adds it to the new one; a nearby query computes the rider's geohash and reads that cell.
flowchart TD
subgraph Grid["Geohash cells around the rider"]
NW[NW] --- N[N] --- NE[NE]
W[W] --- C["Rider cell"] --- E[E]
SW[SW] --- S[S] --- SE[SE]
end
Q[Nearby query] --> C
Q -->|must also scan| N
Q -->|all 8 neighbours| NEFigure 2. The geohash edge problem made visible. A rider standing near a cell boundary may have their true nearest driver in an adjacent cell, so a nearby query must read the rider's cell plus all eight neighbours and filter the union by exact distance. This is the structural reason geohash queries are 9-cell scans, not 1 - and the senior detail most candidates miss.
The catch is the edge problem: a rider standing near a cell boundary may have their nearest driver in an adjacent cell. So a nearby query must always read the rider's cell plus its eight neighbours, then filter the union by true distance. Cell precision is a trade-off - coarse cells mean fewer cells but more drivers to distance-check; fine cells mean the reverse - and it is best chosen adaptively from local driver density.
Part C - Geohash versus quadtree
A quadtree recursively splits space into four quadrants, subdividing a node only when it exceeds a capacity threshold. Downtown becomes deeply subdivided, farmland stays shallow - the structure adapts to density.
| Aspect | Geohash | Quadtree |
|---|---|---|
| Cell size | Fixed precision | Adapts to density |
| Storage | Plain key-value, prefix queries | Tree structure |
| Updates | Cheap - change a key | Node rebalancing as points move |
| Best for | Constantly moving objects | Relatively static density |
For drivers - millions of objects moving every few seconds - geohash usually wins, because an update is a trivial key change with no rebalancing. Quadtrees shine when the points are stable. Production systems often use refined variants of the same idea: Google's S2 cells, or Uber's H3 hexagonal grid, which avoids the unequal-neighbour quirks of square cells.
Part D - The location write firehose
The index absorbs ~1.25 million writes/sec. Three decisions keep that affordable:
- In memory, not durable. Current location is ephemeral; a lost update is corrected within 4 seconds. Persisting the firehose would be expensive and pointless.
- Sharded by geography. Each shard owns a geographic area (a geohash prefix) and handles its writes and queries; a driver crossing a boundary migrates between shards. Geography is a natural, even-ish partition key.
- Eventually consistent. A driver's indexed position lagging reality by a few seconds does not harm matching.
Part E - Matching and the atomic claim
On a ride request, the matching service queries the index for available drivers near the pickup, ranks the candidates (by distance, or by ETA via the routing service), and selects one.
The subtlety is the selection. Two riders requesting at the same moment can both see the same nearest driver as their top candidate. The match must therefore atomically claim the driver - a compare-and-set that flips status from available to reserved. Exactly one request wins; the loser falls through to its next candidate. This is the one strongly-consistent operation in an otherwise eventually-consistent system - the same atomic-claim idea behind Part 2's counters.
sequenceDiagram
participant R as Rider
participant M as Matching Service
participant G as Geospatial Index
R->>M: request ride at pickup
M->>G: nearby query (cell + 8 neighbours)
G-->>M: candidate drivers, ranked
M->>G: atomic claim driver #1 (available -> reserved)
G-->>M: claim failed (already reserved)
M->>G: atomic claim driver #2
G-->>M: claim succeeded
M-->>R: matched (driver #2, ETA)
Note over M: driver declines or times out -> release, try nextFigure 3. The matching sequence with an atomic claim. Two riders requesting at the same instant might both pick the same nearest driver; the compare-and-set that flips status from available to reserved guarantees only one wins. The loser falls through to its next candidate - this single strongly-consistent operation in an otherwise eventually-consistent system is what prevents real-world double-booking.
If the claimed driver declines or does not respond within a timeout, the claim is released and the next candidate is tried; the claim itself carries a TTL so a crashed driver app cannot strand a reservation. A more advanced design collects requests over a short window and solves them as a single assignment problem - batch matching - trading a second of latency for a globally better set of pairings.
Consistency model
The split is explicit. Driver locations are eventually consistent, seconds-stale, and that is acceptable. The driver claim is strongly consistent via atomic compare-and-set, because a double-booking is a real-world failure. The trip record is durable and authoritative once a match is made.
Failure modes
- Index node loss. A shard's in-memory data vanishes - but it rebuilds within ~4 seconds as drivers re-report. Like the derived stores in Part 4 and Part 5, ephemeral data makes recovery automatic.
- Hot shard. A stadium emptying after a concert concentrates thousands of drivers and requests onto one shard - the geospatial echo of a hot key. Mitigate with adaptive cell sizing and dynamic splitting of overloaded shards.
- Matching service down. New requests queue or fail and riders retry; trips already in progress are unaffected because the trip service is separate.
- Claimed driver app crashes. The claim's TTL expires and the reservation is released or the trip reassigned.
Multi-region
This is the cleanest multi-region story in the series: geography is the partition key. A rider in Paris is only ever matched against drivers in Paris, so each city's index and matching operate independently, and cross-region matching essentially never happens. The geographic shard key and the deployment regions are the same thing. Trip records are persisted in the trip's own region.
Evolution path
| Stage | Approach |
|---|---|
| Launch | One in-memory geohash index, greedy nearest-driver matching |
| Growth | Shard the index by region, stream locations over persistent connections |
| Scale | Adaptive cell sizing, dynamic hot-shard splitting, batch matching, H3/S2 cells |
Build the geospatial index abstraction and the atomic driver claim from day one - both are central and hard to retrofit. Defer batch matching and adaptive sharding until density demands them.
Observability
Track the location update rate, index query p99, end-to-end matching latency (request to match), match success rate, candidate-set size, per-shard load (for hot-shard detection), claim contention rate, and the unmatched-request rate. A reasonable SLO: 99% of ride requests matched within 5 seconds.
Step 6 - Bottlenecks and Trade-offs
- The location firehose is the dominant load, absorbed by an in-memory, geographically sharded, non-durable index.
- Hot shards - stadiums, downtown at rush hour - need adaptive cell sizing and dynamic splitting.
- The geohash edge problem forces every nearby query to scan the cell plus its eight neighbours.
- Double-booking is prevented only by an atomic claim - the lone strongly-consistent operation.
- Match quality versus latency is the standing trade-off: greedy nearest is instant, batch matching is better but adds a brief delay.
Reference Architecture
The pattern this problem teaches, reusable well beyond ride-sharing:
A high-write-rate, in-memory geospatial index sharded by geography answering radius queries - ephemeral, so it self-heals - paired with an atomic claim on the scarce resource as the one strongly-consistent operation.
flowchart LR
subgraph Firehose["Write firehose - eventually consistent"]
F1[Location updates] --> F2[(Geospatial index<br/>in-memory, geo-sharded)]
end
subgraph Claim["Match - strongly consistent"]
C1[Nearby query] --> C2[Atomic claim on driver]
end
F2 --> C1Figure 4. The reference architecture distils the design into two halves with opposite consistency stances. The write firehose is eventually consistent because a lost update self-corrects within seconds; the match is strongly consistent because a double-booked driver is a real-world failure. Naming both stances explicitly is what makes this design honest.
The same shape recurs in any "find nearby X" system: food delivery dispatch, store and ATM locators, nearby-friends features, fleet and asset tracking, geofencing. A spatial index over a fast-moving point set, sharded by geography, with an atomic claim wherever a real-world resource must be assigned exactly once.
Common Mistakes in the Interview
- No spatial index - scanning all drivers and distance-checking each, an O(N) dead end.
- Ignoring the geohash edge problem, so an edge-of-cell rider misses their true nearest driver.
- Persisting every location update durably, paying for durability that ephemeral data does not need.
- No atomic claim, allowing one driver to be matched to two riders.
- Ignoring hot shards - the stadium-emptying scenario that overloads one region.
- Treating the location index as a source of truth needing durability and strong consistency.
- A single global index instead of sharding by geography.
Quick Reference
| Topic | Key Point |
|---|---|
| Core pattern | Geospatial index (geohash) sharded by geography + an atomic driver claim |
| Nearby query | Read the rider's cell plus all 8 neighbours, then filter by true distance |
| Geohash vs quadtree | Geohash for moving objects; quadtree for static, uneven density |
| Location firehose | In-memory, non-durable, geo-sharded; a missed update self-corrects |
| Matching | Rank nearby candidates; atomically claim one (available -> reserved) |
| Claim | Compare-and-set with a TTL; the one strongly-consistent operation |
| Consistency | Locations eventually consistent; the claim and trip strongly consistent |
| Failure recovery | Index node loss self-heals in seconds as drivers re-report |
| Hot shards | Adaptive cell sizing and dynamic shard splitting for dense areas |
| Multi-region | Geography is the shard key; cities run independently |
Related Articles
- System Design Interview Problems: A Senior's Roadmap - the full series index and pattern library.
- System Design Interview Guide: The 6-Step Framework - the method this walkthrough applies.
- Design a Distributed Cache - Part 4; sharding and the rebuildable, ephemeral-store idea.
- Design a Chat System - Part 6; the persistent connection layer reused for location and trip tracking.
- Design a Web Crawler - Part 8; another sharded, partition-by-key system with self-healing state.
- Redis Interview Questions - native geospatial commands and in-memory indexing.
This is Part 7 of a 12-part system design series where each post solves one problem around one core pattern. Next: Design a Web Crawler.
