Design a URL Shortener: System Design Interview 2026

·17 min read
system-designcache-asideurl-shortenerarchitecturebackendinterview-preparation

Services like Bitly and TinyURL have collectively shortened tens of billions of URLs, and a single short link embedded in a viral post can absorb millions of redirects in an hour. The problem looks trivial - map a long string to a short one - which is exactly why it is a favourite opener in senior system design interviews. The interesting decisions are not "how do I store a key-value pair" but how you generate identifiers without a global bottleneck, how you keep p99 redirect latency low under a 100:1 read skew, and how the system degrades when the cache or the ID allocator fails.

This walkthrough assumes you already know the 6-step system design framework. We apply it here to one problem, with the depth an interviewer expects from a senior or staff candidate.

Table of Contents

  1. The Problem
  2. Step 1 - Clarify Requirements
  3. Step 2 - Estimate Scale
  4. Step 3 - API and Data Model
  5. Step 4 - High-Level Design
  6. Step 5 - Deep Dive: ID Generation and Cache-Aside
  7. Step 6 - Bottlenecks and Trade-offs
  8. Reference Architecture
  9. Common Mistakes in the Interview
  10. Quick Reference
  11. Related Articles

The Problem

We are designing a service that accepts a long URL and returns a short one, then redirects anyone who visits the short URL to the original. The canonical examples are Bitly, TinyURL, and the link shorteners built into social platforms.

What makes this a senior problem is not the data model - it is a single key-value mapping - but the asymmetry of the workload. Writes are rare and can tolerate tens of milliseconds of latency. Reads are constant, latency-sensitive, and spiky, because any one link can go viral independently of the rest. A good answer treats the read path and the write path as two almost separate systems that happen to share a store.


Step 1 - Clarify Requirements

The first signal an interviewer looks for is whether you scope the problem before designing it. State your assumptions out loud and let the interviewer correct them.

Functional requirements:

  • Create a short URL from a long URL.
  • Redirect a short URL to its original long URL.
  • Support optional custom aliases (example.com/my-brand).
  • Support optional link expiration.

Out of scope for the core design (call these out, then defer them): user accounts, click analytics dashboards, and editing a link's destination. We will note where analytics changes the design, because it does.

Non-functional requirements:

  • Read-heavy: assume a read/write ratio around 100:1.
  • Low-latency redirects: p99 under ~50 ms, because a redirect sits in the critical path of a page load.
  • High availability for reads: a failed redirect is a broken link on someone else's site. Redirect availability matters more than creation availability - a useful asymmetry to state explicitly.
  • Codes must not be trivially enumerable: sequential codes let anyone scrape every link and infer your traffic. Treat this as a requirement, not a nice-to-have.

A subtle clarifying question worth asking: should shortening the same long URL twice return the same code? Deduplicating saves storage but breaks per-link analytics and per-link expiration. The defensible default is one new code per creation request; mention the trade-off rather than silently picking one.


Step 2 - Estimate Scale

Back-of-envelope numbers justify every later decision. Make the arithmetic visible.

Write throughput. Assume 100 million new links per day.

  • Average: 100M / 86,400 s ≈ ~1,200 writes/sec.
  • Peak (assume 5x): ~6,000 writes/sec.

Read throughput. At a 100:1 ratio, 10 billion redirects per day.

  • Average: ~115,000 reads/sec.
  • Peak: ~500,000 reads/sec.

Storage. Each record - long URL, short code, owner, timestamps, expiry - is roughly 500 bytes.

  • 100M/day x 500 bytes ≈ 50 GB/day~18 TB/year.
  • Over 10 years, on the order of 180 TB. This needs partitioning, but it is not exotic.

Code length. With a base62 alphabet (a-z, A-Z, 0-9):

LengthKeyspaceLifetime at 100M/day
6 chars62^6 ≈ 56.8 billion~1.5 years - too short
7 chars62^7 ≈ 3.5 trillion~96 years
8 chars62^8 ≈ 218 trillionmillennia, but longer links

Pick 7 characters. This is a one-way decision: you cannot lengthen existing codes later without breaking every short link in the wild, so size for decades on day one.

Cache sizing. Reads follow a Pareto distribution - a small fraction of links drives most traffic. Caching the hottest ~100M mappings at ~500 bytes each is ~50 GB, which fits comfortably in a modest Redis cluster. The cache does not need to hold everything, only the working set.


Step 3 - API and Data Model

Two endpoints carry the system.

POST /api/urls
  body: { "longUrl": "...", "customAlias": "...", "expiresAt": "..." }
  201:  { "shortUrl": "https://sho.rt/aZ3kP9x" }
 
GET /{shortCode}
  301 or 302  Location: <longUrl>
  404         unknown or expired code

The data model is a single logical table keyed by the short code:

ColumnNotes
short_codePrimary key, 7-char base62 string
long_urlThe destination
created_atCreation timestamp
expires_atNullable; drives TTL-based expiry
owner_idNullable; for future account features

Because lookups are always by exact short_code, this does not need a relational engine. A horizontally partitioned key-value store (DynamoDB, Cassandra, or a sharded relational table partitioned by short_code) is the natural fit, and partitioning by the primary key spreads both reads and writes evenly.

301 vs 302 is a design decision, not a detail. A 301 Moved Permanently is cached by browsers and proxies, so repeat visits never reach your servers - this slashes read load but means your analytics see only the first click per client. A 302 Found forces every click back to you, which is mandatory if click tracking is a product requirement. State which you choose and why. For a shortener whose business model is analytics, 302 with a short Cache-Control is the usual answer.


Step 4 - High-Level Design

The architecture separates the heavy read path from the light write path. They meet only at the store.

flowchart TD
    Client([Client])
    CDN[CDN / Edge Cache]
    LB[Load Balancer]
    ReadSvc[Redirect Service<br/>stateless]
    WriteSvc[Shorten Service<br/>stateless]
    Cache[(Distributed Cache<br/>Redis)]
    DB[(Partitioned KV Store)]
    Alloc[ID Allocator]
 
    Client -->|GET /code| CDN
    CDN -->|miss| LB
    LB --> ReadSvc
    ReadSvc -->|1. lookup| Cache
    ReadSvc -->|2. on miss| DB
    Client -->|POST /api/urls| LB
    LB --> WriteSvc
    WriteSvc -->|get ID block| Alloc
    WriteSvc -->|persist| DB
    WriteSvc -->|warm cache| Cache

Figure 1. The high-level architecture, with the read path (client to CDN to redirect service to cache to store) and the write path (client to shorten service to ID allocator to store) deliberately drawn as disjoint flows. They share only the cache and the store, which is what lets each side scale capacity independently - and what makes the ID Allocator's absence from the read path the single most consequential placement on the diagram.

Every service tier is stateless and horizontally scalable behind the load balancer; capacity is just a question of adding instances. The two stateful components - the cache and the partitioned store - are where the real design lives, and the ID Allocator deliberately sits off the read path entirely.


Step 5 - Deep Dive: ID Generation and Cache-Aside

This is the core of the problem. Two patterns carry it: how identifiers are generated without a global bottleneck, and how the read path stays fast and cheap because the data is immutable.

Part A - Generating short codes

There are three viable strategies, and a senior answer compares them quantitatively rather than naming one.

Option 1 - Hash the URL. Take base62(SHA256(longUrl))[:7]. It is stateless and fast, but two different URLs can collide in 7 characters, so every write must read the store to check whether the code is taken and retry on collision. By the birthday bound, collisions become frequent as the keyspace fills, turning every write into a read-modify-write loop. Hashing also makes identical URLs collapse to one code, which silently breaks per-link analytics and expiry.

Option 2 - A global counter. Keep a single auto-increment integer and base62-encode it. Uniqueness is free and there are no collisions, but a single counter is a write bottleneck, and - more importantly - the output is monotonic. Sequential codes are an enumeration vulnerability: a competitor can walk every link, count your daily growth, and harvest destinations. Treat raw sequential codes as disqualifying.

Option 3 - Counter with block allocation (recommended). Keep the counter, but never touch it per request. A central allocator hands each application server a contiguous block of IDs - say 10,000 at a time. The server encodes IDs from its block locally and only returns to the allocator when the block is exhausted.

sequenceDiagram
    participant W as Shorten Service
    participant A as ID Allocator
    participant D as KV Store
 
    W->>A: request ID block
    A-->>W: block [4,500,000 - 4,509,999]
    Note over W: serve ~10,000 writes locally
    W->>D: persist (code, longUrl)
    Note over W: block exhausted
    W->>A: request next block

Figure 2. Block allocation amortises a single network round trip across thousands of writes. The Shorten Service serves IDs from its current block locally, with no coordination; only when the block is exhausted does it pay the cost of asking the allocator for a new range. This is the mechanism that pulls ID generation off the per-write critical path while keeping uniqueness globally guaranteed.

The quantitative trade-off is the block size. A 10,000-ID block means the allocator is hit once per ~10,000 writes - at 6,000 writes/sec that is roughly one allocator call every 1.7 seconds across the fleet, which any store handles trivially. The cost is that a server restart wastes the unused tail of its block. With a 7-character keyspace of 3.5 trillion, wasting a few thousand IDs per restart is irrelevant; with a smaller keyspace it would matter. Larger blocks waste more on restart, smaller blocks increase allocator traffic - 10k is a reasonable middle.

Defeating enumeration. Block allocation still produces locally monotonic IDs. Before base62-encoding, run the integer through a reversible permutation - a Feistel network or a fixed secret multiplication modulo the keyspace. The mapping stays bijective (still collision-free, still reversible if needed) but the public codes look random. This is the senior nuance most candidates miss: you can have both guaranteed uniqueness and non-guessable codes.

Custom aliases bypass ID generation entirely - they are user-supplied strings written directly as the primary key, which the store rejects on conflict. Keep them in the same keyspace and they need no special handling beyond a uniqueness check.

Part B - The read path: cache-aside on immutable data

The redirect path is a textbook cache-aside lookup: check the cache, and on a miss read the store and populate the cache.

flowchart TD
    Start([GET /shortCode]) --> C{In cache?}
    C -->|hit| Return[Return redirect]
    C -->|miss| DB{In store?}
    DB -->|found| Fill[Write to cache] --> Return
    DB -->|not found| Neg[Cache negative result] --> NF[Return 404]

Figure 3. The cache-aside read path. A hit returns immediately; a miss falls through to the store and back-fills the cache for the next reader. The branch that caches the not-found result is what stops scanning bots from hammering the store with random codes - negative caching turns a flood of invalid lookups into cheap cache hits.

What makes this pattern so effective here is a property of the data: a short-code-to-URL mapping is immutable. Once created, the destination never changes. The hardest problem in caching - invalidation - effectively does not exist. There is no write-back, no stale-data race, no cross-node invalidation broadcast. The only mutations are expiration and deletion, and both are handled by setting a TTL on the cache entry. Whenever you can frame a system around immutable values, say so explicitly in the interview: it removes an entire class of failure modes.

Latency budget. A cache hit is ~1-2 ms; a miss adds a store read of ~10-20 ms. At a 90%+ hit ratio the p99 is dominated by the miss path, so p99 ≈ store latency. This is why a redirect can comfortably meet a sub-50 ms p99 SLO, and why pushing the hit ratio higher is the main lever for tail latency.

Consistency model. The system is eventually consistent on the write-to-read path. A code persisted to the store's primary may not yet be on a replica when the first redirect arrives, producing a spurious 404. Because the data is immutable, the fix is clean: on creation, write the mapping into the cache synchronously (a write-through only at creation time). The code is then resolvable from the cache the instant the API returns, independent of store replication lag. Reads of pre-existing links remain read-your-writes safe because, again, those values never change.

Negative caching. Scanning bots probe random codes and generate a stream of misses that would otherwise hammer the store. Cache the "not found" result with a short TTL so a flood of invalid lookups collapses onto cheap cache hits.

Hot keys. A viral link concentrates traffic on the single cache shard that owns its code - a hotspot the rest of the cluster cannot relieve. Three mitigations stack:

  • Edge caching. A 301, or a 302 with a short Cache-Control, lets the CDN and browsers absorb most repeat clicks before they reach you.
  • In-process LRU. A small per-server local cache serves the few hottest codes from memory with zero network hops.
  • Hot-key replication. Replicate identified hot keys across several shards and have clients pick a shard at random, spreading the load.

Step 6 - Bottlenecks and Trade-offs

A senior answer names how the system fails and how it degrades, not just how it works on the happy path.

Failure modes.

  • Allocator down. Every application server keeps serving writes from its current block; the allocator is only needed when a block runs out. A few minutes of allocator downtime is invisible. Critically, the allocator is not on the read path at all - redirects are unaffected.
  • Cache down. All ~115k reads/sec fall through to the store. Survive this with store read replicas and request coalescing (single-flight): when 10,000 concurrent requests miss on the same key, only one query goes to the store and the rest wait on its result. Without coalescing, a cache restart can trigger a thundering herd that takes down the store.
  • Store primary down. Reads continue from replicas - safe, because the data is immutable. Writes pause or queue; given the asymmetry we already accepted, paused link creation is far less damaging than failed redirects.

Multi-region. Redirects must be fast worldwide, so the mapping store is replicated to every region (read replicas, or a globally replicated store such as DynamoDB global tables). To avoid cross-region coordination on writes, partition the ID keyspace by region: each region's allocator owns a disjoint range, so two regions can never mint the same code without ever talking to each other. Writes are local, replication is asynchronous, and the immutability of the data means asynchronous replication carries no correctness risk.

Evolution path. Do not design the 100M/day system for a launch with 1,000 users.

StageArchitecture
Launch (~1k/day)Single store, encoding in the app, no cache
Growth (~1M/day)Add cache-aside, store read replicas
Scale (100M+/day)Block-allocating ID service, CDN edge caching, partitioned store, multi-region

Build from day one what is expensive to change later: the 7-character code length and the abstraction of an ID allocator - even if it is initially just a database sequence. Defer multi-region and custom partitioning until traffic demands them. Knowing what not to build yet is itself a senior signal.

Analytics write amplification. Switching to 302 for click tracking turns every redirect into an analytics event - 10 billion writes/day. Never write those synchronously on the redirect path. Emit them to a message queue and process them asynchronously, or sample, so analytics never adds latency to or risk to the redirect.


Reference Architecture

The pattern this problem teaches, reusable far beyond URL shorteners:

An immutable key-value lookup served through cache-aside, with identifier generation pulled completely off the request's critical path.

flowchart LR
    subgraph Read["Read path - hot, latency-critical"]
        direction TB
        R1[CDN edge cache] --> R2[Stateless service]
        R2 --> R3[(Distributed cache)]
        R3 --> R4[(Replicated store)]
    end
    subgraph Write["Write path - cold, latency-tolerant"]
        direction TB
        W1[Stateless service] --> W2[Block ID allocator]
        W1 --> W3[(Store primary)]
    end
    Write -.async replication.-> Read

Figure 4. The reference architecture distils the design into two paths separated by their tolerance for latency. The hot read path is cached, replicated, and latency-critical; the cold write path holds the only coordination point (the block ID allocator) and is permitted to be slow. Because the data is immutable, the dashed asynchronous-replication arrow between them carries no correctness risk - which is what makes this entire shape applicable to any immutable-value system.

Whenever you meet a system whose stored values never change after creation - audit logs, content-addressed blobs, immutable event records - the same shape applies: aggressive caching is nearly free, replication can be asynchronous without risking correctness, and the scarce coordination (here, ID allocation) should be batched away from individual requests.


Common Mistakes in the Interview

  • Proposing a raw auto-increment counter without naming the enumeration and privacy leak it creates.
  • Putting ID generation synchronously on the database for every write - a self-inflicted write bottleneck that block allocation removes.
  • Treating cache invalidation as hard here. Missing that the data is immutable, so invalidation is a non-problem, wastes interview time and signals shallow analysis.
  • Ignoring 301 vs 302 and its direct impact on both server load and analytics fidelity.
  • Choosing a code length by feel instead of doing the keyspace-versus-lifetime arithmetic - and not recognising that code length is a one-way decision.
  • Forgetting the hot-key problem. Uniform-load assumptions break the moment one link goes viral.
  • Designing only the happy path - no story for a cache restart, a thundering herd, or allocator downtime.

Quick Reference

TopicKey Point
WorkloadRead-heavy, ~100:1; design read and write paths separately
Code generationBase62 counter with block allocation; permute before encoding
Code length7 chars (62^7 ≈ 3.5T); a one-way decision, size for decades
Read pathCache-aside; invalidation is trivial because mappings are immutable
ConsistencyEventually consistent; write-through to cache on creation closes the gap
Hot keysCDN/edge cache + in-process LRU + hot-key replication
Cache outageRequest coalescing + read replicas prevent a thundering herd
301 vs 302301 = less load, weaker analytics; 302 = full analytics, more load
Multi-regionPartition the ID keyspace per region; replicate the store asynchronously
ObservabilityRedirect p99, cache hit ratio, ID block burn rate, 404 rate, replication lag

This is Part 1 of a 12-part system design series where each post solves one problem around one core pattern. Next: Design a Rate Limiter.

Ready to ace your interview?

Get 550+ interview questions with detailed answers in our comprehensive PDF guides.

View PDF Guides