DoorDash Data Scientist Interview Questions
6+ questions from real DoorDash Data Scientist interviews, reported by candidates.
Round Types
Top Topics
Questions
Completed last week: I got 5 SQL and 2 Python questions and successfully answered all of them, but I still didn’t move forward to the next round. No idea why—maybe the interviewer just didn’t like me?
## Round 1 - Data Science Case Study ## Problem DoorDash is piloting a bike courier program ("Biker Dashers") in a dense urban market. You have one week of delivery data. Evaluate whether the program is performing well and recommend whether to expand, hold, or cut the program. **Data Available:** - `deliveries(delivery_id, dasher_id, dasher_type, pickup_time, delivery_time, distance_km, order_value, tip)` - `dashers(dasher_id, dasher_type, city, start_date)` **Your analysis should cover:** 1. **Efficiency metrics:** Average delivery time by dasher type (bike vs. car). Is bike faster for short distances? 2. **Coverage gap:** What fraction of deliveries are beyond typical bike range (e.g. > 3 km)? 3. **Economics:** Compare earnings-per-hour for bike vs. car dashers including tips. Are bikers fairly compensated? 4. **Customer impact:** Is there a significant difference in delivery time variance? Do customers of bikers rate orders differently? ``` Sample insight: Bike Dashers: avg 18 min, car Dashers: avg 24 min for distances < 2km Bike coverage: 68% of all deliveries in pilot zone are < 2km -> Recommendation: expand in high-density zones ``` ## Follow-ups 1. What confounding variables might make bike dashers look artificially faster? 2. How would you run an A/B test to cleanly isolate the impact of bike vs. car assignment? 3. How would weather data change your analysis? 4. What guardrail metrics would you monitor to catch problems early in an expansion?
## Round 1 - Data Science Case Study ## Problem DoorDash receives a spike in customer complaints about food arriving cold. You are asked to diagnose the root cause and propose data-driven solutions. **Data Available:** - `deliveries(delivery_id, dasher_id, restaurant_id, pickup_time, delivery_time, food_prep_ready_at, distance_km, weather)` - `ratings(delivery_id, overall_score, food_temp_score, comments)` - `restaurants(restaurant_id, category, avg_prep_time_min)` **Your analysis should cover:** 1. **Wait time at restaurant:** How long does the dasher wait after food is ready? Longer waits = colder food at pickup. 2. **Transit time:** Is delivery distance or traffic the dominant factor? 3. **Segmentation:** Are certain restaurant categories (e.g. sushi, pizza) more affected? 4. **Correlation:** Does `food_temp_score` correlate with `(delivery_time - food_prep_ready_at)`? ``` Key metric: "food exposure time" = delivery_time - food_prep_ready_at Hypothesis: food_temp_score drops ~0.3 points per 5 extra minutes of exposure ``` ## Follow-ups 1. How would you distinguish "cold at pickup" vs. "cooled during transit"? 2. What operational levers does DoorDash have to reduce cold food? Which would you prioritize? 3. How would you design an experiment to test whether insulated bags reduce cold complaints? 4. What metric would you track as a leading indicator before customers complain?
## Round 1 - SQL ## Problem You have the following DoorDash schema: ```sql CREATE TABLE customers (customer_id INT, name VARCHAR, city VARCHAR, joined_date DATE); CREATE TABLE orders (order_id INT, customer_id INT, restaurant_id INT, order_total DECIMAL, placed_at TIMESTAMP); ``` **Task 1:** Find all customers who placed orders on at least 5 distinct days in the past 30 days. **Task 2:** For each city, find the customer with the highest number of orders this month. Break ties by customer_id ascending. **Task 3:** Find customers who ordered from the same restaurant at least 3 times. Return customer_id, restaurant_id, and order_count. ```sql -- Task 1 skeleton: SELECT customer_id FROM orders WHERE placed_at >= CURRENT_DATE - INTERVAL '30 days' GROUP BY customer_id HAVING COUNT(DISTINCT DATE(placed_at)) >= 5; ``` ```sql -- Task 2 skeleton: WITH ranked AS ( SELECT c.city, o.customer_id, COUNT(*) AS order_count, RANK() OVER (PARTITION BY c.city ORDER BY COUNT(*) DESC, o.customer_id ASC) AS rnk FROM orders o JOIN customers c USING (customer_id) WHERE DATE_TRUNC('month', o.placed_at) = DATE_TRUNC('month', CURRENT_DATE) GROUP BY c.city, o.customer_id ) SELECT city, customer_id, order_count FROM ranked WHERE rnk = 1; ``` ## Follow-ups 1. In Task 1, why use `COUNT(DISTINCT DATE(placed_at))` instead of `COUNT(*)`? 2. In Task 2, what is the difference between `RANK()` and `ROW_NUMBER()` for tie-breaking? 3. For Task 3, how would you find customers who ordered the same restaurant on consecutive days? 4. How would you optimize these queries on a 500M-row orders table?
## Round 1 - Data Science Case Study ## Problem DoorDash's order volume dropped 15% overnight in one metro area. You are the analyst on call. Walk through your full investigation. **Data Available:** - `orders(order_id, customer_id, restaurant_id, dasher_id, placed_at, status, order_total)` - `restaurants(restaurant_id, name, category, is_active, city)` - `app_events(event_id, user_id, event_type, platform, occurred_at)` **Step 1 - Scope the drop:** - Is the drop on all platforms or just iOS/Android/web? - Is it all restaurant categories or specific ones? - Is it all customer cohorts (new, returning) or one? **Step 2 - Check data integrity:** - Is the drop real or a logging failure? Check if `app_events` also dropped. **Step 3 - Hypothesize causes:** - Competitor promotion launched - App deployment introduced checkout bug - Major restaurant chain went offline - Extreme weather event **Step 4 - Validate and quantify:** ```sql SELECT restaurant_id, COUNT(*) AS orders FROM orders WHERE placed_at BETWEEN '2024-06-01' AND '2024-06-02' GROUP BY restaurant_id ORDER BY orders; ``` ## Follow-ups 1. How would you determine if the drop is seasonal vs. an anomaly? 2. What would you look at if order volume is fine but revenue dropped? 3. How do you communicate your findings to stakeholders before you have a root cause? 4. After identifying the cause, what process do you put in place to catch this faster next time?
## Round 1 - Data Science Case Study ## Problem Some DoorDash restaurants are accepting more orders than they can fulfill on time, leading to late deliveries and cancellations. Design a system to detect overwhelmed merchants and intervene. **Data Available:** - `orders(order_id, restaurant_id, placed_at, accepted_at, ready_at, delivered_at, status)` - `restaurants(restaurant_id, avg_prep_time_min, max_concurrent_orders)` **Part A - Detection Metric:** Define a real-time "overwhelm score" per restaurant. Consider: - Current active order count vs. `max_concurrent_orders` - Running average actual prep time vs. `avg_prep_time_min` in last 30 minutes - Cancellation rate in last hour **Part B - Intervention Decision:** At what threshold do you temporarily pause new orders to a restaurant? How do you avoid false positives that hurt restaurant revenue? **Part C - SQL:** Find restaurants where actual prep time exceeded estimated prep time by more than 50% in the last 2 hours, with at least 5 orders. ```sql SELECT restaurant_id, AVG(EXTRACT(EPOCH FROM (ready_at - placed_at))/60) AS avg_actual_prep, COUNT(*) AS order_count FROM orders WHERE placed_at >= NOW() - INTERVAL '2 hours' AND ready_at IS NOT NULL GROUP BY restaurant_id HAVING COUNT(*) >= 5; -- Join with restaurants to compare against avg_prep_time_min ``` ## Follow-ups 1. How do you distinguish a permanently slow restaurant from one having a one-off bad hour? 2. What notification do you send to the restaurant and what action can they take? 3. How would you A/B test whether pausing orders actually improves customer satisfaction? 4. What customer-facing message do you show when a restaurant is temporarily paused?
What DoorDash Looks for in Data Scientist Interviews
DoorDash Data Scientist interviews are calibrated against the level and scope expected of the role. Across 6+ verified candidate reports on LeakCode, the consistent signals interviewers look for: clear problem decomposition before coding, explicit complexity reasoning, structured handling of edge cases, and the ability to articulate trade-offs between two reasonable approaches.
The discriminator between candidates who advance and candidates who do not is rarely the final correctness of the solution. It is the path to the solution: did you ask clarifying questions, did you state your approach before coding, did you handle edge cases without prompting, and did you communicate your reasoning throughout. Reports tagged "no hire" frequently cite a working solution with poor communication; reports tagged "strong hire" cite clear thinking even when the final solution was incomplete.
How To Use This Question Set
Real interview reports are a calibration tool, not a memorization target. Companies update their question pools every 2-4 months; memorizing exact problems risks misleading you when the interviewer uses a variant. The high-leverage use: identify the patterns that appear repeatedly in DoorDash Data Scientist reports, practice those patterns on similar (not identical) problems, and use the reports to understand the interviewer's typical follow-up depth.
Filter the questions below by round type, difficulty, and recency. Focus first on reports from the past 6-12 months; older reports may reference questions that have since rotated out of DoorDash's pool. Reports tagged with quantified difficulty (e.g., "medium-hard") are higher-signal than reports without difficulty tags.
Round-by-Round Expectations
DoorDash Data Scientist loops typically span 4-6 rounds across phone screens and on-site or virtual on-site interviews. The structure varies by company: some run 1 recruiter screen + 1 technical phone + 3-4 on-site rounds; others run 1 recruiter screen + 1 OA + 4-5 on-site rounds. The recruiter screen is logistics and culture-light; the technical phone screen is medium-difficulty coding; the on-site loop covers coding, system design (at L4+ levels), and behavioral rounds.
Each round is designed to surface a specific signal. Coding rounds: correctness, code quality, complexity reasoning, communication. System design rounds: requirements clarification, design judgment, operational thinking. Behavioral rounds: ownership scope, leadership, ambiguity tolerance, conflict navigation. Strong candidates explicitly hit each signal dimension out loud during the round; weak candidates focus only on solving the prompt.
Common Interview Mistakes At This Combination
Reports tagged "no hire" at DoorDash Data Scientist commonly cite: jumping into code without clarifying requirements, coding silently for 10+ minutes without verbalizing approach, missing edge cases (empty input, single element, very large input, overflow), and producing a working solution that the candidate cannot explain or refactor when probed. Strong candidates avoid these patterns by following a consistent template: clarify, verbalize approach, code with narration, test with examples.
Behavioral and design rounds have their own failure modes. Behavioral: stories that use "we" instead of "I" diluting individual signal, stories with no quantified outcome, defensiveness when probed about failure. Design: not asking clarifying questions, not stating requirements out loud, designing for a single server when the prompt clearly implies scale, ignoring operational concerns (deployment, monitoring, rollback). These show up in roughly half of DoorDash Data Scientist interview retrospectives on LeakCode.
See All 6 DoorDash Data Scientist Questions
Full question text, answer context, and frequency data for subscribers.
Get Access