Frequently Asked at Uber — This problem has been reported multiple times in recent Uber L5A interviews.
Alien Dictionary ​
Problem Statement ​
You are given a sorted list of words from an alien language that uses lowercase English letters. The ordering of the alphabet is unknown to you, but it is consistent with the lexicographic order of the given words. Derive any valid ordering of the alien alphabet. If no valid ordering exists (a cycle is implied, or the input is invalid), return the empty string.
Example 1:
Input: words = ["wrt", "wrf", "er", "ett", "rftt"]
Output: "wertf"
Explanation: Comparing adjacent words gives edges t -> f, w -> e, r -> t, e -> r.Example 2:
Input: words = ["abc", "ab"]
Output: ""
Explanation: A longer word cannot come before its own prefix — invalid input.Constraints:
1 <= words.length <= 1001 <= words[i].length <= 100- All characters are lowercase English letters.
Difficulty: Hard
Pattern Recognition ​
- What is the structure of the input? A list of strings whose ordering encodes pairwise character precedence.
- What are we searching for? A linear ordering of the characters consistent with all pairwise constraints.
- What pattern does this suggest? Topological sort on a directed graph where nodes are characters and edges are "a before b" constraints.
- Why does this pattern fit? Each adjacent word pair yields at most one precedence edge (from the first differing character). The final ordering must be a topological order of that graph. If a cycle exists, no valid order exists.
Approach ​
Brute Force — Permutations ​
Try every permutation of the characters that appear in the input and check whether the given list is sorted under that permutation.
Time: O(26! * total_chars) — permutations dominate. Space: O(26). Why insufficient: 26! is astronomical. Even with only 10 distinct characters, this explodes.
Optimal — Topological Sort ​
Key Insight: You only learn one precedence fact from each adjacent word pair — the position of the first differing character. Collect those facts into a graph, then run a topological sort.
Step-by-step:
- Collect every character that appears anywhere — these are your nodes.
- For each adjacent pair
(w1, w2):- If
w1is longer thanw2andw2is a prefix ofw1(likeabcbeforeab), the input is invalid — return"". - Otherwise scan until you find the first differing character at index
j. Add edgew1[j] -> w2[j]and increment the in-degree ofw2[j]. Stop scanning (no further info from this pair).
- If
- Kahn's BFS: push every zero-in-degree node. Pop one at a time, append to result, decrement in-degree of its neighbors, push any that reach zero.
- If the result contains every node, return it. Otherwise a cycle existed — return
"".
Time: O(C) where C is the total number of characters across all words. Space: O(1) for the graph (at most 26 nodes and 26^2 edges), O(C) for the initial character scan.
Walkthrough ​
Using words = ["wrt", "wrf", "er", "ett", "rftt"].
Collect nodes: {w, r, t, f, e}, all with initial in-degree 0.
Compare adjacent pairs:
| Pair | First diff | Edge added | In-degrees so far |
|---|---|---|---|
| wrt, wrf | index 2: t vs f | t -> f | f:1 |
| wrf, er | index 0: w vs e | w -> e | e:1, f:1 |
| er, ett | index 1: r vs t | r -> t | e:1, f:1, t:1 |
| ett, rftt | index 0: e vs r | e -> r | e:1, f:1, t:1, r:1 |
Kahn's BFS: Only w has in-degree 0. Start queue with [w].
| Step | Pop | Result | Decrement | Queue |
|---|---|---|---|---|
| 1 | w | "w" | e -> 0 | [e] |
| 2 | e | "we" | r -> 0 | [r] |
| 3 | r | "wer" | t -> 0 | [t] |
| 4 | t | "wert" | f -> 0 | [f] |
| 5 | f | "wertf" | — | [] |
Result size equals number of nodes — return "wertf".
Implementation ​
function alienOrder(words: string[]): string {
const graph = new Map<string, Set<string>>();
const inDegree = new Map<string, number>();
for (const word of words) {
for (const ch of word) {
if (!graph.has(ch)) {
graph.set(ch, new Set());
inDegree.set(ch, 0);
}
}
}
for (let i = 0; i < words.length - 1; i++) {
const w1 = words[i];
const w2 = words[i + 1];
if (w1.length > w2.length && w1.startsWith(w2)) return "";
for (let j = 0; j < Math.min(w1.length, w2.length); j++) {
const a = w1[j];
const b = w2[j];
if (a !== b) {
if (!graph.get(a)!.has(b)) {
graph.get(a)!.add(b);
inDegree.set(b, inDegree.get(b)! + 1);
}
break;
}
}
}
const queue: string[] = [];
for (const [ch, deg] of inDegree) {
if (deg === 0) queue.push(ch);
}
let result = "";
while (queue.length > 0) {
const ch = queue.shift()!;
result += ch;
for (const next of graph.get(ch)!) {
inDegree.set(next, inDegree.get(next)! - 1);
if (inDegree.get(next) === 0) queue.push(next);
}
}
return result.length === inDegree.size ? result : "";
}class Solution {
public:
string alienOrder(vector<string>& words) {
unordered_map<char, unordered_set<char>> graph;
unordered_map<char, int> inDegree;
for (const auto& w : words) {
for (char c : w) {
if (!inDegree.count(c)) inDegree[c] = 0;
}
}
for (int i = 0; i + 1 < (int)words.size(); ++i) {
const string& a = words[i];
const string& b = words[i + 1];
if (a.size() > b.size() && a.substr(0, b.size()) == b) return "";
for (int j = 0; j < (int)min(a.size(), b.size()); ++j) {
if (a[j] != b[j]) {
if (!graph[a[j]].count(b[j])) {
graph[a[j]].insert(b[j]);
inDegree[b[j]]++;
}
break;
}
}
}
queue<char> q;
for (auto& [c, d] : inDegree) if (d == 0) q.push(c);
string result;
while (!q.empty()) {
char c = q.front(); q.pop();
result += c;
for (char nxt : graph[c]) {
if (--inDegree[nxt] == 0) q.push(nxt);
}
}
return result.size() == inDegree.size() ? result : "";
}
};Watch out: Forgetting to add characters that never appear in a differing pair. If
"z"only shows up inside one word, it still needs to be in the output — otherwise the final length check misfires.
Watch out: The prefix check
["abc", "ab"]. If you skip it you silently miss an invalid input and return a bogus ordering.
Watch out: Adding duplicate edges. Use a
Setper source node so the in-degree count stays correct.
Complexity Analysis ​
| Time | Space | |
|---|---|---|
| Brute Force (permutations) | O(26! * C) | O(26) |
| Optimal (topological sort) | O(C) | O(1) for graph, O(C) for scan |
Bottleneck: You must touch every character at least once to know which nodes exist. Beyond that, the graph has at most 26 nodes and 26*25 edges, so topological sort is effectively constant work.
Edge Cases ​
- Single word — No pairs to compare. Return the word's unique characters in any order.
- Prefix violation
["abc", "ab"]— Return""immediately. - Cycle
["a", "b", "a"]— Topological sort cannot consume all nodes. Return"". - Duplicate words — First differing character never fires; no edge added. That is correct.
- Characters that appear in only one word — Must still show up in the output. Your initial node-collection loop handles this.
- All identical words — No edges at all. Output is any permutation of the characters present.
Related Problems ​
- LC 207 — Course Schedule: Same topological sort machinery, detect cycles on an explicit edge list.
- LC 210 — Course Schedule II: Return the full topological order.
- LC 444 — Sequence Reconstruction: Check whether the topological order is unique.
- LC 1203 — Sort Items by Groups Respecting Dependencies: Nested topological sort across two graphs.
What Is Expected at Each Level? ​
Junior / New Grad: Recognize that this is a graph problem from the precedence constraints. Get to building the character set and a naive comparison of adjacent words. Might need a nudge to see that only the first differing character matters.
Mid-level: Identify topological sort on sight. Implement Kahn's BFS cleanly, handle the prefix-violation edge case, and explain why the in-degree check correctly detects cycles.
Senior: Write both Kahn and DFS cycle-detection variants. Discuss how to detect uniqueness of the ordering, how duplicate edges inflate in-degree if you are not careful, and how to generalize to arbitrary alphabets. Propose follow-ups like "return all valid orderings" and discuss why that is exponential.