Design a Gaming Leaderboard Service
Interview Experience
Design a Gaming Leaderboard Service Imagine you are building a backend service for game studios. Games run on players' computers, and after a game session ends the client wants to show two things ri
Full Details
Design a Gaming Leaderboard Service Imagine you are building a backend service for game studios. Games run on players' computers, and after a game session ends the client wants to show two things right away: 1. the player's personal best for that game 2. the game's top 100 leaderboard We only need to persist a submitted score if it sets a new personal best for that player or breaks into the game's top 100. That small requirement changes the whole traffic pattern: a newly launched game can generate a burst of writes, but mature games become much quieter because the leaderboard cutoff rises and most players no longer qualify. Prioritize availability over consistency. If a cache replica is slightly stale or one region is degraded, it is better to show a slightly old leaderboard and let clients retry score submission than to take the entire feature offline. The key interview insight is that this is not a "store every score forever" system. Because only personal-best improvements and top-100 candidates matter, both the client and the server can filter most writes before they hit durable storage. If the interviewer says top 10 instead of top 100, treat K as configurable. The architecture is the same; only the cutoff cache and leaderboard size change. Assume anti-cheat and score verification are handled upstream by trusted game servers or a separate validation service. The focus here is leaderboard storage, latency, and scaling. ## Phase 1: Requirements (~5 minutes) Start by clarifying scope. The interviewer hint here is usually "traffic is small at first" or "this is a startup." That is a signal to begin with the simplest working system, then evolve it. ### Functional Requirements 1.
Players should be able to fetch the current top 100 leaderboard for a game 2.
Players should be able to fetch their personal best for a game immediately after a session ends 3.
Clients should be able to submit a score candidate and learn whether it updated personal best, leaderboard, both, or neither 4.
The system should be able to safely handle retries and offline buffering so players can resubmit when service is back 5.
The system should be able to support many games, including bursts when a new title launches Do not over-scope the first pass. Matchmaking, achievements, friends, anti-cheat, and full score history are all separate systems unless the interviewer explicitly asks for them. ### Non-Functional Requirements | Requirement | Target | Rationale | | --- | --- | --- | |
Leaderboard read latency | P95 under 50 ms from cache | Game UI should feel instant | |
Personal-best read latency | P95 under 50 ms from cache | Usually shown right after the match ends | |
Score submission ack | P95 under 200 ms | Client should know quickly whether submission was accepted or queued | |
Availability | 99.95% | Prefer degraded service over hard failure | |
Consistency | Eventual consistency is acceptable | A few hundred milliseconds of staleness is fine | |
Durability for accepted scores | No lost accepted submissions | Once accepted, a retryable candidate must survive failures | Ask clarifying questions early: * "Do we need to store every raw score attempt, or only personal bests and top-K candidates?" Assume only the latter in the hot path. * "Does the client need strict read-after-write consistency after submission?" Assume near-realtime is enough, but call out how to tighten this if needed. * "Should the client submit every score?" No. The intended optimization is local filtering: only submit if the score beats the local personal best or current leaderboard cutoff. * "How do we break ties on the leaderboard?" Use a deterministic rule such as (score DESC, achieved_at ASC, submission_id ASC). * "What matters more: consistency or availability?" Availability wins. ### Capacity Estimation Use the traffic shape from the prompt, not generic social-media scale:
text Assumptions: - 10M daily active players across all games - 3 completed game sessions per player per day - 30M session ends/day = ~350 session ends/sec average - Peak factor during evenings / launches: 15x => ~5,000 session ends/sec Steady-state mature games: - Only ~2% of scores beat local PB or top-100 cutoff - Candidate write peak ~= 100 submissions/sec Launch-day hot game: - The leaderboard is not full yet, so nearly every finished session may submit - Worst-case write peak for one new game can approach the full 5,000 submissions/sec Reads: - Leaderboard and PB are read after many sessions and from game menus - Design for ~50,000 cache reads/sec peak across games
Hot data size is small enough to justify aggressive caching:
text Leaderboard cache: - 100K active games - 100 entries/game - ~64 bytes/entry raw payload - ~640 MB raw before overhead Personal-best cache: - 200M active (user_id, game_id) pairs - ~32 bytes per record raw - ~6.4 GB raw before overhead
The most important capacity observation is not the absolute QPS. It is that write traffic decays over time for mature games, while launches create bursty "all scores matter" behavior. That is why a queue plus cache-first reads is a strong answer. ## Phase 2: Data Model (~5 minutes) The two core queries are different: *
Leaderboard query: "show top 100 for game_id" *
Personal-best query: "show best score for (game_id, user_id)" Do not force one table to answer both efficiently. Treat them as separate read models. ### Core Entities
text Game ├── game_id: UUID ├── title: string ├── status: enum (prelaunch, live, retired) ├── launch_at: timestamp └── metadata: jsonb PersonalBest ├── game_id: UUID ├── user_id: UUID ├── best_score: bigint ├── best_submission_id: UUID ├── updated_at: timestamp └── version: bigint LeaderboardEntry ├── game_id: UUID ├── rank: int ├── user_id: UUID ├── score: bigint ├── submission_id: UUID ├── achieved_at: timestamp └── tie_break_key: string ScoreSubmission ├── submission_id: UUID ├── game_id: UUID ├── user_id: UUID ├── session_id: string ├── score: bigint ├── idempotency_key: string ├── client_seen_pb: bigint ├── client_seen_cutoff: bigint ├── status: enum (queued, applied, ignored, failed) ├── applied_reason: enum (personal_best, leaderboard, both, none) └── created_at: timestamp ScoreEvent ├── event_id: UUID ├── submission_id: UUID ├── game_id: UUID ├── user_id: UUID ├── event_type: enum (received, applied, ignored, retried) └── created_at: timestamp
Relationships
text Game 1:N LeaderboardEntry Game 1:N PersonalBest Game 1:N ScoreSubmission User 1:N PersonalBest User 1:N ScoreSubmission ScoreSubmission 1:N ScoreEvent
Hot Read Models For the interview, call out the actual cache keys:
text top100:{game_id} -> sorted leaderboard payload cutoff:{game_id} -> current 100th-place score pb:{game_id}:{user_id} -> player's best score leaderboard_version:{game_id} -> monotonic version for cache refresh / sync
The leaderboard is tiny. For each game, the exact top 100 can fit comfortably in memory. That means the hard problem is not storage size; it is coordinating updates during bursts and hot games. Be explicit that rank is derived from a deterministic sort order, not stored as an independently edited field. For example: sort by score DESC, then achieved_at ASC, then submission_id ASC. ## Phase 3: API Design (~5 minutes) ### Protocol Choices *
Public client API: REST over HTTPS *
Internal score processing: durable queue / log partitioned by game_id *
Optional live update path: WebSocket or server push if the game wants realtime leaderboard movement REST is the right default here. The key is idempotency, not fancy transport. ### Client-Facing APIs
http GET /v1/games/{game_id}/leaderboard?limit=100 Response: { "game_id": "game_123", "version": 8812, "cutoff_score": 984500, "entries": [{ "rank": 1, "user_id": "u1", "score": 1432000 }, { "rank": 2, "user_id": "u2", "score": 1419000 }] }
http GET /v1/games/{game_id}/players/me/best Response: { "game_id": "game_123", "user_id": "me", "best_score": 991000, "updated_at": "2026-03-16T18:23:10Z" }
http GET /v1/games/{game_id}/score-context Response: { "game_id": "game_123", "personal_best": 991000, "cutoff_score": 984500, "leaderboard_version": 8812, "entries": [... top 100 ...] }
This combined endpoint is practical because the client typically wants both values together right after a match.
http POST /v1/games/{game_id}/scores { "session_id": "sess_abc", "score": 1002000, "idempotency_key": "game_123:sess_abc:attempt_1", "client_seen_pb": 991000, "client_seen_cutoff": 984500 } Response: { "submission_id": "sub_789", "status": "queued", "likely_updates": { "personal_best": true, "leaderboard": true }, "current_context": { "personal_best": 991000, "cutoff_score": 984500, "leaderboard_version": 8812 } }
Include idempotency_key. The client may keep pending qualifying scores locally and retry after reconnect. Without idempotency, duplicate submissions are guaranteed. If the interviewer insists on exact read-after-write semantics in the POST /scores response, upgrade the async processor into a per-game synchronous owner service that appends to a durable log and updates the cache before ACK. The rest of the design remains the same. ## Phase 4: High-Level Design (~15-25 minutes) ### Start Simple First Because the prompt explicitly hints low initial traffic, start with the simplest design that works: * one API service * one relational database * two indexed tables or materialized views: * personal_best(game_id, user_id) * leaderboard_entries(game_id, rank) That is enough for a startup or a private beta. Then say how you would evolve it once reads and launches grow: ### Scaled Architecture ### Component Responsibilities
Game client * Caches the last seen personal best and leaderboard cutoff locally * Submits only if the new score beats one of those thresholds * Buffers pending qualifying submissions locally if the service is down
Read API * Serves leaderboard and personal-best reads from cache * Falls back to DB snapshots only on cache miss or rebuild
Score Submit API * Validates idempotency key and request shape * Re-checks cached PB and cutoff because the client may be stale or malicious * Rejects obvious non-qualifying scores without touching the database * Appends candidate submissions to the durable log for replayable processing
Durable Log / Queue * Acts as the source of truth for accepted candidate submissions * Partitions by game_id so updates for the same game stay ordered * Absorbs launch spikes and supports replay if cache or DB projections fall behind
Leaderboard Processor * Consumes submissions from the durable log partitioned by game_id * Maintains one logical ordered updater per game partition * Updates top100, cutoff, and personal_best caches * Materializes snapshots/submissions into the primary database
Primary DB * Stores materialized snapshots and submission audit records * Rebuilds cache if Redis is lost and recent log segments are replayed ### Read Path Right after a match ends, the game client usually needs both the personal best and the leaderboard: 1. Client calls GET /score-context 2. Read API fetches pb:{game_id}:{user_id} and top100:{game_id} from cache 3. Response returns in tens of milliseconds without hitting the database This is why "read from cache, not DB" is the right default answer. ### Submission Path 1. Client compares the new score against locally cached personal best and cutoff 2. If the score cannot matter, the client does not submit 3. If it might matter, client calls POST /scores with idempotency key 4. Submit API re-checks against authoritative cache 5. If the score still cannot affect PB or top 100,
return ignored 6. Otherwise append the submission to the durable log 7. The leaderboard processor updates: * player's personal best if the score is higher * game's top 100 if the score clears the cutoff * cutoff score and leaderboard version 8. Downstream materializers persist the applied change to DB snapshots and audit tables; because the log is durable, the DB can lag briefly without losing accepted submissions Separate "can this score matter?" from "store this score forever." Most scores should die in cache-level filtering and never become durable writes. Stale local state is acceptable here. If the client has an old cutoff or personal best, it may submit a few extra candidates, but the server-side cache check keeps correctness intact. ### Why A Queue Helps The queue is not mainly about average traffic. It is about launch-day burst absorption. When a new game launches: * the leaderboard may have fewer than 100 entries * nearly every early score qualifies * writes spike sharply for a short time The queue gives you: * burst absorption * natural partitioning by game_id * retry and backoff during DB or cache hiccups * room to scale processors independently from APIs ### Keeping The UX "Immediate" There are two acceptable answers depending on how strict the interviewer is:
Availability-first version * POST /scores durably queues the candidate * cache updates appear within sub-second lag * client immediately reads updated context from cache
Stronger consistency version * a per-game owner service appends to a durable log and updates cache before ACK * higher complexity, lower staleness Given the prompt's emphasis on availability over consistency, start with the first answer and then mention the second as an upgrade path. ## Phase 5: Scaling & Trade-offs (~15-20 minutes) ### 1. Why Write Traffic Drops Over Time This is one of the best insights in the prompt: * once a game has a meaningful top 100, the cutoff rises * most players stop qualifying * clients stop submitting losing scores * server-side filtering removes stale or incorrect candidates So the system naturally becomes: * read-heavy for mature games * write-bursty for new games That is why cache-first reads and queue-buffered writes are a strong fit. ### 2. Scaling Personal Best And Leaderboard Separately These are different workloads: * PersonalBest is keyed by (game_id, user_id) and can be horizontally partitioned naturally * LeaderboardTop100 is a tiny per-game ranked set and should be cached as a single object Do not try to answer both queries with one giant table and one query plan. Practical scaling strategy: * shard PersonalBest by hash of (game_id, user_id) * keep LeaderboardTop100 in Redis sorted sets or processor memory with DB snapshots behind it * serve both from cache through separate read models This directly addresses the interviewer
follow-up these two queries want different storage layouts. ### 3. Hot Game And Hot Partition Strategy This is usually the hardest follow-up. The bad answer is: * split one hot game's data into game-1, game-2, game-3 * then try to join those shards on every leaderboard read That makes exact realtime ranking awkward because the read path now has to merge multiple partial leaderboards. A better answer is: 1.
Keep one logical owner for the exact top 100 of a game 2.
Scale around that owner, instead of splitting the exact ranking state blindly Concretely: * assign hot games dedicated partitions or machines * let clients pre-filter using local PB and cutoff * let submit APIs pre-filter again using cache * if traffic is still too hot, add hierarchical top-K filtering With hierarchical top-K: * each ingress shard keeps a local candidate heap for the hot game * only scores above a moving threshold are forwarded to the game's global owner * the global owner merges shard top-K streams and maintains the exact top 100 That avoids suffix-based sharding on the read path while still reducing pressure on the exact owner. If the interviewer asks "what if one game becomes insanely hot?", say: "I would not shard the exact global leaderboard and merge on every read. I would keep one exact owner for top 100 and use pre-filtering or hierarchical top-K before it." ### 4. Availability Over Consistency Be explicit about the trade-off: * Reads come from cache, so they may be slightly stale * Score submission is idempotent and retryable * If Redis has a partial outage, clients can keep pending scores locally and retry later * If the snapshot database is slow, the durable log can absorb accepted submissions while projection workers catch up later This is acceptable because: * users care more that the feature works than that rank 98 and rank 99 swap instantly * the leaderboard is not a bank account ### 5. Failure Handling
Client offline or service unavailable * client stores pending candidate scores locally * retry later with the same idempotency key
Queue backlog * processors autoscale by partition count * hot games get dedicated workers
Redis loss * rebuild from DB snapshots and recent log replay * serve degraded responses while warming cache
Duplicate submissions * dedupe by (game_id, user_id, session_id, idempotency_key)
Out-of-order retries * keep submission versions or timestamps * only raise personal best if the new score is strictly larger than the stored best ### 6. Why Not Read From The Database Directly? Because the main queries are tiny but very frequent: * top 100 for a game * one user's personal best If every game-over screen hits the DB directly, read traffic grows with overall session volume. Cache avoids that and keeps p95 latency low. The database should be the durable snapshot and rebuild layer, not the main serving layer for hot reads. In the scaled design above, accepted submissions land in the durable log first and are then materialized into the database. ## Common Pitfalls
Storing every score in the hot path: this ignores the prompt's key optimization. Only personal-best and top-K candidates should enter the durable processing path; most other scores should be filtered out immediately.
Using one schema for both leaderboard and personal-best reads: these are different access patterns and want different read models. Sharding a hot game's exact leaderboard into gameId-1, gameId-2, ... and merging on every read: this makes exact realtime top-K much harder than necessary.
Serving leaderboard reads from the primary database: this creates unnecessary latency and load for a cache-friendly workload.
Ignoring retries: if clients buffer scores locally during outages, idempotency is mandatory. ## Interview Checklist ### Requirements Phase * [] Clarified that the hot features are personal best and top 100 leaderboard * [] Confirmed availability is more important than strict consistency * [] Called out the low-traffic startup starting point before scaling ### Design Phase * [] Separated personal-best and leaderboard read models * [] Put reads on cache instead of DB * [] Used a queue or durable log to absorb launch-day spikes * [] Added client-side and server-side filtering for non-qualifying scores ### Scaling Phase * [] Explained why mature games become write-light * [] Covered hot-game handling without breaking exact ranking reads * [] Mentioned idempotent retries and local client buffering * [] Discussed dedicated partitions or hierarchical top-K for extreme hotspots ## Summary | Aspect | Decision | Why | | --- | --- | --- | |
Public API | REST | Simple and sufficient for score fetch + submit | |
Read path | Cache-first | Personal best and top 100 are tiny, hot reads | |
Write path | Queue-buffered candidate submissions | Smooths launch spikes and supports retries | |
Core optimization | Submit only PB/top-100 candidates | Eliminates most writes | |
Personal best scaling | Partition by (game_id, user_id) | Natural horizontal sharding | |
Leaderboard scaling | One exact owner per game + dedicated hot partitions | Preserves exact top 100 without read-path merges | |
Failure handling | Idempotent retries + local client buffer | Supports availability-first behavior | ## Interview Experience Candidates reported that the interviewer often started with a deliberately small-scope version of the problem: "Assume the user count is low. How would you build it first?" A strong answer is to begin with one machine and one database, then evolve the design only when the interviewer pushes on scale. The follow-up usually centers on two points: 1. personal-best and leaderboard queries should not share the same storage strategy 2. naive hot-game sharding creates a painful merge problem on the read path The strongest way to close is to say that you would keep one exact owner for a game's top 100, aggressively pre-filter candidates before they reach that owner, and reserve hierarchical top-K aggregation for truly extreme hotspots.
About This Question
This is a candidate experience report from a reddit interview for a swe role reported in 2025.
It covers the following topics: Heap, Queue, Sql, Stack, Strings, System Design .
Difficulty rating: Easy
More Reddit Interview Questions
About Reddit Interview Reports
This question was reported by a candidate who interviewed at Reddit. LeakCode aggregates interview reports from 10+ sources, including 1Point3Acres, Glassdoor, LeetCode Discuss, Blind, Reddit, Indeed, and Nowcoder. Each report is translated where necessary, deduplicated against existing entries, and tagged by company, role, round type, and reporting date.
Use this question as one calibration data point, not a memorization target. Companies typically rotate their question pools every 2-4 months; the exact wording of a 2024 question may differ from what you encounter today. The underlying pattern, difficulty level, and follow-up depth at Reddit are the higher-signal extractions to take from this report.
For broader preparation context, the Reddit interview process typically includes a recruiter screen, one or two technical phone screens, and a 4-5 round on-site loop covering coding, system design (at L4+ levels), and behavioral. Reports tagged on LeakCode show the round-by-round distribution and typical difficulty calibration. To browse questions filtered by round type and seniority, use the company hub linked above.
How To Practice This Type of Question
Solve similar problems on LeetCode under timed conditions (25-35 minutes per medium difficulty). The goal is pattern recognition: recognize the underlying technique (sliding window, two-pointer, BFS, memoized recursion, etc.) within 60-90 seconds of reading. Strong candidates verbalize their hypothesis out loud before coding, then iterate based on feedback. Weak candidates dive into implementation immediately, lose time on the wrong approach, and run out of time for follow-ups.
Companies update their question pools every 2-4 months. The exact wording of any given question may have been retired by the time you interview. Focus your prep on the pattern, not the specific problem. The patterns that appear in Reddit reports consistently are the ones worth investing in; one-off niche problems are not.
During Your Reddit Round
Apply the standard interview round template: clarify requirements (2-3 minutes), state your approach out loud and confirm direction with the interviewer (3-5 minutes), code with narration (15-25 minutes), test with concrete examples including edge cases (5 minutes), discuss optimization or trade-offs if time permits (5 minutes). This template is universally accepted across FAANG and adjacent companies; deviating from it produces weaker interviewer feedback signal.
The single most predictive failure mode in Reddit reports tagged "no hire": not asking clarifying questions. Interviewers are explicitly trained to weight this. Strong candidates ask 3-5 clarifying questions even on problems that look obvious; weak candidates dive into code immediately. The clarifying-question check is often the first signal recorded in the interviewer's written notes.