Applied Intuition Software Engineer Interview Questions
17+ questions from real Applied Intuition Software Engineer interviews, reported by candidates.
Round Types
Top Topics
Questions
## Problem You are asked to design a system that ingests 10 TB of application logs per day, processes them for anomaly detection and aggregation, and serves dashboards with sub-second query latency. Walk through the architecture. ``` Requirements: - Ingest: 10 TB/day, 100k events/sec peak - Processing: real-time aggregation + batch anomaly detection - Query: dashboard queries over last 7 days, p99 < 500ms - Retention: 90 days hot, 3 years cold - Fault tolerance: no data loss ``` **Proposed architecture:** ``` Producers -> Kafka (partitioned by service) -+-> Flink (real-time agg) | -> Redis (hot counters) +-> S3 (raw parquet, partitioned by date) -> Spark (batch anomaly, daily) -> Trino/Presto (ad-hoc SQL) Dashboard -> Druid (pre-agg OLAP) ``` ## Follow-ups 1. Why partition Kafka by service rather than by timestamp? What are the tradeoffs? 2. How do you handle late-arriving events in the Flink streaming layer? 3. What compaction and partitioning strategy on S3 enables fast Trino queries? 4. How would you implement exactly-once semantics end-to-end from Kafka to the database?
## Problem You manage a campground with `n` campsites. Reservations are stored as `(site_id, start_day, end_day)` (inclusive). Given a requested stay `(start, end)`, return the list of available site IDs sorted in ascending order. ```python def available_campsites( n: int, reservations: list[tuple[int, int, int]], # (site_id, start, end) request_start: int, request_end: int ) -> list[int]: pass ``` ``` Input: n = 4 reservations = [(1,1,5),(1,8,10),(2,3,7),(3,1,3)] request_start = 4, request_end = 6 Output: [1, 4] # Site 1: occupied days 1-5 overlaps [4,6] -> not available # Site 2: occupied days 3-7 overlaps [4,6] -> not available # Site 3: occupied days 1-3, no overlap with [4,6] -> available # Site 4: no reservations -> available # Sorted: [3, 4] Corrected Output: [3, 4] ``` ## Follow-ups 1. What is the interval overlap condition? Prove your predicate handles edge cases (adjacent days). 2. If `n` and reservations are large, what index structure speeds up availability queries? 3. Extend to return the site with the fewest total reserved days among available sites. 4. How would you write this as a SQL query using NOT EXISTS or a LEFT JOIN approach?
## Problem Encode a string using run-length encoding or a custom compression scheme. ## Likely LeetCode equivalent Similar to LC 443 String Compression. ## Tags coding, strings, encoding, phone
## Problem Parse and evaluate a mathematical formula string, handling operator precedence and parentheses. ## Likely LeetCode equivalent Similar to LC 224 Basic Calculator. ## Tags coding, stack, parsing, math, phone
## Problem You are given a list of 2D integer coordinates. Group them into clusters such that all points within a cluster are within Euclidean distance `r` of at least one other point in the same cluster. Return each cluster as a sorted list of point indices, with clusters sorted by their smallest index. ```python def group_coordinates( points: list[tuple[int, int]], r: float ) -> list[list[int]]: pass ``` ``` Input: points = [(0,0),(1,1),(10,10),(11,11),(0,1)] r = 2.0 Output: [[0,1,4], [2,3]] # (0,0),(1,1),(0,1) are all within r=2 of each other -> cluster # (10,10),(11,11) -> cluster Input: points = [(0,0),(100,100)] r = 1.0 Output: [[0],[1]] ``` ## Follow-ups 1. This is essentially finding connected components in a graph where edges connect points within distance `r`. What graph traversal do you use? 2. The naive approach is O(n^2) for edge construction. How does a k-d tree reduce this? 3. How does this compare to DBSCAN clustering? What is the role of `r` vs DBSCAN's epsilon? 4. Extend to 3D coordinates — what changes in your approach?
## Problem Design a `JobMonitor` that tracks the lifecycle of asynchronous jobs. Jobs can be registered, updated with status, and queried. Alert (log to a provided callback) when a job fails or has not completed within a timeout period. ```python from typing import Callable import time class JobMonitor: def __init__(self, alert_callback: Callable[[str, str], None]): """alert_callback(job_id, reason) is called on failure/timeout.""" pass def register(self, job_id: str, timeout_sec: int) -> None: """Register a new job with a timeout.""" pass def update_status(self, job_id: str, status: str) -> None: """status: 'running', 'completed', 'failed'""" pass def check_timeouts(self, current_time: float | None = None) -> None: """Call periodically; alerts on any job past its timeout.""" pass def get_status(self, job_id: str) -> dict: pass ``` ``` monitor = JobMonitor(alert_callback=lambda jid, reason: print(f"{jid}: {reason}")) monitor.register("job1", timeout_sec=30) monitor.update_status("job1", "running") # ... 31 seconds later ... monitor.check_timeouts() # -> prints "job1: timeout" ``` ## Follow-ups 1. How do you efficiently find all timed-out jobs without scanning all registered jobs every check? 2. How would you persist job state so the monitor survives process restarts? 3. Extend to support job dependencies: job B should not start until job A completes. 4. How would you expose this as a REST API with endpoints for registration, status updates, and querying?
## Problem Implement a key-value store with get, set, and optionally delete operations. ## Likely LeetCode equivalent Similar to LC 146 LRU Cache design. ## Tags coding, hash_table, design, onsite
## Problem You are given a sequence of 2D waypoints that define a road lane as a polyline. Given a query point `P`, determine which segment of the polyline it belongs to (i.e., which consecutive pair of waypoints `(W[i], W[i+1])` is closest to `P`), and return the index `i`. ```python def find_lane_segment(waypoints: list[tuple[float, float]], point: tuple[float, float]) -> int: # waypoints: [(x0,y0), (x1,y1), ...], len >= 2 # Returns index i such that segment (waypoints[i], waypoints[i+1]) is closest to point pass ``` **Example:** ``` waypoints = [(0,0), (10,0), (10,10), (20,10)] point = (11, 3) -> 1 # closest to segment (10,0)-(10,10) point = (5, 2) -> 0 # closest to segment (0,0)-(10,0) ``` ## Follow-ups 1. How do you compute the perpendicular (orthogonal) distance from a point to a finite line segment vs. an infinite line? 2. What if multiple segments are equidistant? How do you break ties? 3. How would you extend this to 3D waypoints for autonomous vehicle path tracking? 4. If you need to handle thousands of query points per second, what spatial index structure would you use to speed up segment lookup?
## Problem Rotate the layers of a 2D matrix by a specified number of positions. ## Likely LeetCode equivalent Similar to LC 48 Rotate Image. ## Tags coding, matrix, simulation, phone
## Problem You are given a raw message string from a custom text protocol. Each message consists of fields separated by `|`, where each field is a `key=value` pair. Some values may themselves contain escaped pipes `\|`. Parse the message and return a dictionary of field names to values, with escape sequences resolved. ```python def parse_message(raw: str) -> dict[str, str]: # Returns {field_name: value, ...} pass ``` **Example:** ``` raw = "type=ORDER|id=42|note=buy\|sell|qty=10" -> {"type": "ORDER", "id": "42", "note": "buy|sell", "qty": "10"} raw = "status=OK|msg=all good" -> {"status": "OK", "msg": "all good"} raw = "a=1||b=2" # empty field between double-pipe -> {"a": "1", "": "", "b": "2"} # or raise, based on spec ``` ## Follow-ups 1. How would you handle duplicate keys? Last-write-wins, first-write-wins, or collect as list? 2. What changes if values can also contain escaped equals signs `\=`? 3. How would you validate that required fields (e.g., `type`, `id`) are always present and raise a descriptive error otherwise? 4. Extend the parser to support nested messages, where a value is itself a `key=value|...` block wrapped in `{}`.
## Problem Two players alternate turns on an N x M grid. Each cell is either empty (`.`) or a mine (`*`). A player places their token on the grid and must move it exactly one step (up, down, left, right) on their turn. A player loses if they step on a mine or have no valid moves. Both play optimally. Given the starting position and whose turn it is, determine whether the first player wins or loses. ```python def mine_game(grid: list[str], start_r: int, start_c: int) -> str: # Returns "WIN" if first player wins, "LOSE" otherwise pass ``` **Example:** ``` grid = [ "...", ".*.", "..." ] start_r, start_c = 0, 0 -> "WIN" grid = [".*", ".."], start_r=0, start_c=0 -> "WIN" # first player can avoid the mine ``` ## Follow-ups 1. What game-theory concept underpins this problem, and what is the time complexity of the minimax approach? 2. How would you use memoization to avoid recomputing states you have already visited? 3. What if a player can move 1 OR 2 steps in a single turn? How does the state space change? 4. Can you detect cycles (infinite games) and handle them as a draw rather than a win/loss?
## Problem You have a table of geographic points. Write a SQL query to return all points within a given radius (in kilometers) of a target latitude/longitude, ordered by distance ascending. ```sql -- Table: points -- id INT, name VARCHAR, lat FLOAT, lon FLOAT -- Use the Haversine approximation formula: -- distance_km = 6371 * acos( -- cos(radians(:target_lat)) * cos(radians(lat)) -- * cos(radians(lon) - radians(:target_lon)) -- + sin(radians(:target_lat)) * sin(radians(lat)) -- ) ``` **Example query parameters:** `target_lat=37.7749, target_lon=-122.4194, radius_km=10` **Expected columns:** `id, name, lat, lon, distance_km` ```sql SELECT id, name, lat, lon, 6371 * acos(...) AS distance_km FROM points HAVING distance_km <= :radius_km ORDER BY distance_km ASC; ``` ## Follow-ups 1. Why must you use `HAVING` instead of `WHERE` when filtering on the computed `distance_km` alias? 2. At what scale would this full-table scan become a bottleneck, and what index type (e.g., spatial index, geohash) would you add? 3. How does the Haversine formula break down near the poles, and what formula is more accurate? 4. Rewrite this query to also return a bounding box check first as a pre-filter before the expensive `acos` computation.
## Problem Given an array of 2D points and an integer `k`, return the `k` points closest to the origin `(0, 0)`. Distance is Euclidean. The result can be in any order. ```python def k_closest(points: list[list[int]], k: int) -> list[list[int]]: pass ``` **Example:** ``` points = [[1,3],[-2,2],[5,8],[0,1]], k = 2 -> [[-2,2],[0,1]] # distances: sqrt(10), sqrt(8), sqrt(89), 1 points = [[3,3],[5,-1],[-2,4]], k = 2 -> [[3,3],[-2,4]] # distances: sqrt(18), sqrt(26), sqrt(20) ``` ## Round 1 - Coding Implement a solution using a max-heap of size `k`. ## Follow-ups 1. What is the time complexity of the heap approach vs. a full sort? When does each become preferable? 2. How would you use the Quickselect algorithm to achieve average O(n) time? 3. If points arrive as a stream and you must always return the current `k` closest, how do you maintain the heap efficiently? 4. How does the solution change if distance is Manhattan distance instead of Euclidean?
## Problem You are implementing a software UART (Universal Asynchronous Receiver-Transmitter) receiver in C. Given a stream of raw bit samples (sampled at 8x the baud rate), decode the data bytes. A UART frame consists of: 1 start bit (0), 8 data bits (LSB first), 1 stop bit (1). Framing errors should be flagged. ```c typedef struct { uint8_t data; bool framing_error; } UartFrame; int uart_decode(const uint8_t* samples, int n_samples, UartFrame* out_frames, int max_frames); // Returns number of frames decoded ``` **Example:** ``` // Baud rate = 9600, sample rate = 76800 (8x oversampling) // Bit 0 sampled at positions 4,12,20,28... (center of each bit window) // Frame for byte 0xA5 (10100101): // START=0, bits=1,0,1,0,0,1,0,1, STOP=1 // -> {data=0xA5, framing_error=false} ``` ## Follow-ups 1. Why do you sample at the center of the bit window rather than the edge, and how does oversampling reduce bit-error rate? 2. How do you detect and recover from framing errors (missing stop bit) without losing subsequent frames? 3. What changes if you add parity support (even/odd parity bits)? 4. How would you implement the transmit (TX) side to produce the same bit stream at a target baud rate?
## Problem Simulate vehicle collision detection based on positions and velocities along a road. ## Likely LeetCode equivalent Similar to LC 2211 Count Collisions on a Road. ## Tags coding, arrays, simulation, phone
## Problem You are given a list of GPS readings for a vehicle: each reading has a timestamp (in seconds) and a position `(x, y)` in meters. Compute the instantaneous velocity (m/s) at each interior point using central differences, and return the average speed over the full trace. ```python from dataclasses import dataclass @dataclass class GPSReading: t: float # seconds x: float y: float def compute_velocities(readings: list[GPSReading]) -> tuple[list[float], float]: # Returns (instantaneous_speeds, average_speed) # instantaneous_speeds[0] and [-1] should use forward/backward differences pass ``` **Example:** ``` readings = [ GPSReading(0, 0, 0), GPSReading(1, 3, 4), # distance from prev = 5m GPSReading(3, 9, 12), # distance from prev = 10m over 2s ] -> speeds = [5.0, 5.0, 5.0], avg = 5.0 ``` ## Follow-ups 1. What is the difference between central, forward, and backward finite-difference approximations for velocity? 2. GPS readings often have noise. What smoothing technique would you apply before computing velocities? 3. How would you compute acceleration from the velocity array? 4. If timestamps are not uniformly spaced, how does that affect the finite-difference formula?
## Problem You are given a list of 3D vertices and a face list (triangles as index triples). Two vertices are considered duplicates if their Euclidean distance is below a threshold `eps`. Merge all duplicates into a single representative vertex, update the face list accordingly, and return the compressed mesh. ```python def compress_vertices( vertices: list[tuple[float,float,float]], faces: list[tuple[int,int,int]], eps: float ) -> tuple[list[tuple[float,float,float]], list[tuple[int,int,int]]]: # Returns (new_vertices, new_faces) pass ``` **Example:** ``` vertices = [(0,0,0),(0,0,0.0001),(1,0,0),(1,0,0)] faces = [(0,1,2),(1,2,3)] eps = 0.001 -> new_vertices = [(0,0,0), (1,0,0)] new_faces = [(0,0,1), (0,1,1)] ``` ## Follow-ups 1. What data structure would you use to make the O(n^2) pairwise comparison more efficient (e.g., k-d tree, spatial hash grid)? 2. After merging vertices, degenerate faces (all three indices the same) may appear. How do you detect and remove them? 3. How would you handle vertex attributes (normals, UVs) when merging -- average, take first, or discard? 4. What is weld tolerance and why is it important in game engine asset pipelines?
What Applied Intuition Looks for in Software Engineer Interviews
Applied Intuition Software Engineer interviews are calibrated against the level and scope expected of the role. Across 17+ verified candidate reports on LeakCode, the consistent signals interviewers look for: clear problem decomposition before coding, explicit complexity reasoning, structured handling of edge cases, and the ability to articulate trade-offs between two reasonable approaches.
The discriminator between candidates who advance and candidates who do not is rarely the final correctness of the solution. It is the path to the solution: did you ask clarifying questions, did you state your approach before coding, did you handle edge cases without prompting, and did you communicate your reasoning throughout. Reports tagged "no hire" frequently cite a working solution with poor communication; reports tagged "strong hire" cite clear thinking even when the final solution was incomplete.
How To Use This Question Set
Real interview reports are a calibration tool, not a memorization target. Companies update their question pools every 2-4 months; memorizing exact problems risks misleading you when the interviewer uses a variant. The high-leverage use: identify the patterns that appear repeatedly in Applied Intuition Software Engineer reports, practice those patterns on similar (not identical) problems, and use the reports to understand the interviewer's typical follow-up depth.
Filter the questions below by round type, difficulty, and recency. Focus first on reports from the past 6-12 months; older reports may reference questions that have since rotated out of Applied Intuition's pool. Reports tagged with quantified difficulty (e.g., "medium-hard") are higher-signal than reports without difficulty tags.
Round-by-Round Expectations
Applied Intuition Software Engineer loops typically span 4-6 rounds across phone screens and on-site or virtual on-site interviews. The structure varies by company: some run 1 recruiter screen + 1 technical phone + 3-4 on-site rounds; others run 1 recruiter screen + 1 OA + 4-5 on-site rounds. The recruiter screen is logistics and culture-light; the technical phone screen is medium-difficulty coding; the on-site loop covers coding, system design (at L4+ levels), and behavioral rounds.
Each round is designed to surface a specific signal. Coding rounds: correctness, code quality, complexity reasoning, communication. System design rounds: requirements clarification, design judgment, operational thinking. Behavioral rounds: ownership scope, leadership, ambiguity tolerance, conflict navigation. Strong candidates explicitly hit each signal dimension out loud during the round; weak candidates focus only on solving the prompt.
Common Interview Mistakes At This Combination
Reports tagged "no hire" at Applied Intuition Software Engineer commonly cite: jumping into code without clarifying requirements, coding silently for 10+ minutes without verbalizing approach, missing edge cases (empty input, single element, very large input, overflow), and producing a working solution that the candidate cannot explain or refactor when probed. Strong candidates avoid these patterns by following a consistent template: clarify, verbalize approach, code with narration, test with examples.
Behavioral and design rounds have their own failure modes. Behavioral: stories that use "we" instead of "I" diluting individual signal, stories with no quantified outcome, defensiveness when probed about failure. Design: not asking clarifying questions, not stating requirements out loud, designing for a single server when the prompt clearly implies scale, ignoring operational concerns (deployment, monitoring, rollback). These show up in roughly half of Applied Intuition Software Engineer interview retrospectives on LeakCode.
See All 17 Applied Intuition Software Engineer Questions
Full question text, answer context, and frequency data for subscribers.
Get Access