Word Break I & II
Problem Statement
Word Break I (LC 139)
Given a string s and a dictionary of words wordDict, return true if s can be segmented into a space-separated sequence of one or more dictionary words.
Example:
Input: s = "leetcode", wordDict = ["leet", "code"]
Output: true
Explanation: "leetcode" = "leet" + "code"

Word Break II (LC 140)
Given the same input, return all possible sentences formed by segmenting s into dictionary words.
Example:
Input: s = "catsanddog", wordDict = ["cat","cats","and","sand","dog"]
Output: ["cats and dog", "cat sand dog"]

Constraints:
- 1 <= s.length <= 300 (Word Break I); 1 <= s.length <= 20 (Word Break II)
- 1 <= wordDict.length <= 1000
- 1 <= wordDict[i].length <= 20
- All strings consist of lowercase English letters.
Difficulty: Medium (I), Hard (II)
Pattern Recognition
- What is the input structure? A string to partition, with a set of valid words.
- What are we optimizing? Part I: can the string be segmented? (decision problem). Part II: enumerate all valid segmentations.
- What pattern does this fit? Dynamic Programming (Part I) and Backtracking with Memoization (Part II).
- Why does this pattern fit? Whether s[0..i] can be segmented depends on whether some prefix s[0..j] can be segmented AND s[j..i] is a word. This is textbook overlapping subproblems.
Approach
Part I — Word Break
Brute Force — Recursion
Try every possible first word, recursively check the rest.
```text
function wordBreak(s, wordDict):
    if s is empty: return true
    for word in wordDict:
        if s starts with word:
            if wordBreak(s[len(word):], wordDict): return true
    return false
```

Time: O(2^n) — exponential branching. Space: O(n) recursion stack.

Why insufficient: Repeated subproblems. "aaaa" with dict ["a", "aa"] explores the same suffixes many times.
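The repeated subproblems can be eliminated by caching the answer per suffix start index. A minimal sketch of this top-down memoized variant (the name `wordBreakMemo` is ours, not from the original problem):

```typescript
// Top-down recursion with memoization on the suffix start index.
// memo.get(start) answers: can s.slice(start) be segmented?
function wordBreakMemo(s: string, wordDict: string[]): boolean {
  const wordSet = new Set(wordDict);
  const memo = new Map<number, boolean>();

  function canBreak(start: number): boolean {
    if (start === s.length) return true; // empty suffix is segmentable
    if (memo.has(start)) return memo.get(start)!;
    let result = false;
    for (const word of wordSet) {
      // startsWith with a position argument avoids building substrings
      if (s.startsWith(word, start) && canBreak(start + word.length)) {
        result = true;
        break;
      }
    }
    memo.set(start, result);
    return result;
  }
  return canBreak(0);
}
```

Each start index is solved once, so the exponential blowup of the plain recursion disappears; this is the same recurrence the bottom-up DP below computes iteratively.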
Optimal — Bottom-Up DP
Key Insight: Define dp[i] = true if s[0..i-1] (the first i characters) can be segmented. For each position i, check all possible last words: if dp[j] is true and s[j..i] is in the dictionary, then dp[i] is true.
Step-by-step:
- Create a dp array of size n+1, all false. Set dp[0] = True (empty string).
- Convert wordDict to a set for O(1) lookup.
- For each i from 1 to n:
  - For each j from 0 to i-1:
    - If dp[j] is true and s[j:i] is in the word set, set dp[i] = True and break.
- Return dp[n].
Time: O(n^2 * m) where m is the average word length (for substring creation and hashing). Space: O(n).
Part II — Word Break II
Optimal — Backtracking with Memoization
Key Insight: First check if the string can be segmented at all (use Part I as a pruning step). Then use backtracking to enumerate all valid segmentations, memoizing the results for each suffix to avoid recomputation.
Step-by-step:
- Build a memoization cache: memo[i] = list of all valid sentences for s[i:].
- For each position i, try every word that matches starting at i.
- Recursively get all sentences for the remainder, and prepend the current word.
- Cache and return.
Time: O(2^n * n) in the worst case (exponential number of valid segmentations). Space: O(2^n * n) for storing all sentences.
Walkthrough
Part I — s = "applepenapple", wordDict = ["apple", "pen"]
| i | j checked | s[j:i] | dp[j]? | In dict? | dp[i] |
|---|---|---|---|---|---|
| 1 | j=0 | "a" | dp[0]=T | No | F |
| 2 | j=0,1 | "ap","p" | — | No | F |
| 3 | j=0..2 | "app","pp","p" | — | No | F |
| 4 | j=0..3 | "appl","ppl","pl","l" | — | No | F |
| 5 | j=0 | "apple" | dp[0]=T | Yes | T |
| 6 | j=0..5 | — | No match | — | F |
| 7 | j=0..6 | — | No match | — | F |
| 8 | j=5 | "pen" | dp[5]=T | Yes | T |
| 9-12 | — | — | — | — | F |
| 13 | j=8 | "apple" | dp[8]=T | Yes | T |
dp[13] = True — "apple" + "pen" + "apple".
Part II — s = "catsanddog", wordDict = ["cat","cats","and","sand","dog"]
Backtracking from index 0:
- "cat" matches → recurse on "sanddog"
  - "sand" matches → recurse on "dog"
    - "dog" matches → recurse on "" → base case → sentence: "cat sand dog"
- "cats" matches → recurse on "anddog"
  - "and" matches → recurse on "dog"
    - "dog" matches → sentence: "cats and dog"

Output: ["cat sand dog", "cats and dog"]
Implementation
Part I — Word Break
```typescript
function wordBreak(s: string, wordDict: string[]): boolean {
  const wordSet = new Set(wordDict);
  const n = s.length;
  // dp[i] = true if s[0..i-1] can be segmented
  const dp = new Array<boolean>(n + 1).fill(false);
  dp[0] = true; // Empty string is trivially segmentable
  for (let i = 1; i <= n; i++) {
    for (let j = 0; j < i; j++) {
      if (dp[j] && wordSet.has(s.slice(j, i))) {
        dp[i] = true;
        break; // No need to check more j values
      }
    }
  }
  return dp[n];
}
```

```cpp
bool wordBreak(string s, vector<string>& wordDict) {
  unordered_set<string> wordSet(wordDict.begin(), wordDict.end());
  int n = s.size();
  // dp[i] = true if s[0..i-1] can be segmented
  vector<bool> dp(n + 1, false);
  dp[0] = true; // Empty string is trivially segmentable
  for (int i = 1; i <= n; i++) {
    for (int j = 0; j < i; j++) {
      if (dp[j] && wordSet.count(s.substr(j, i - j))) {
        dp[i] = true;
        break; // No need to check more j values
      }
    }
  }
  return dp[n];
}
```

Optimized — limit j range by max word length:
```typescript
function wordBreakOptimized(s: string, wordDict: string[]): boolean {
  const wordSet = new Set(wordDict);
  const maxWordLen = Math.max(...wordDict.map(w => w.length));
  const n = s.length;
  const dp = new Array<boolean>(n + 1).fill(false);
  dp[0] = true;
  for (let i = 1; i <= n; i++) {
    // Only check j values where the word length is valid
    for (let j = Math.max(0, i - maxWordLen); j < i; j++) {
      if (dp[j] && wordSet.has(s.slice(j, i))) {
        dp[i] = true;
        break;
      }
    }
  }
  return dp[n];
}
```

```cpp
bool wordBreakOptimized(string s, vector<string>& wordDict) {
  unordered_set<string> wordSet(wordDict.begin(), wordDict.end());
  int maxWordLen = 0;
  for (auto& w : wordDict) maxWordLen = max(maxWordLen, (int)w.size());
  int n = s.size();
  vector<bool> dp(n + 1, false);
  dp[0] = true;
  for (int i = 1; i <= n; i++) {
    // Only check j values where the word length is valid
    for (int j = max(0, i - maxWordLen); j < i; j++) {
      if (dp[j] && wordSet.count(s.substr(j, i - j))) {
        dp[i] = true;
        break;
      }
    }
  }
  return dp[n];
}
```

Part II — Word Break II
```typescript
function wordBreakII(s: string, wordDict: string[]): string[] {
  const wordSet = new Set(wordDict);
  // First check if segmentation is even possible (prune)
  if (!wordBreak(s, wordDict)) return [];
  const memo = new Map<number, string[]>();

  function backtrack(start: number): string[] {
    if (memo.has(start)) return memo.get(start)!;
    if (start === s.length) return [""]; // Base case: empty tail
    const sentences: string[] = [];
    for (let end = start + 1; end <= s.length; end++) {
      const word = s.slice(start, end);
      if (wordSet.has(word)) {
        const restSentences = backtrack(end);
        for (const rest of restSentences) {
          sentences.push(rest ? word + " " + rest : word);
        }
      }
    }
    memo.set(start, sentences);
    return sentences;
  }
  return backtrack(0);
}
```

```cpp
class Solution {
  unordered_set<string> wordSet;
  unordered_map<int, vector<string>> memo;

  vector<string> backtrack(const string& s, int start) {
    if (memo.count(start)) return memo[start];
    if (start == (int)s.size()) return {""}; // Base case: empty tail
    vector<string> sentences;
    for (int end = start + 1; end <= (int)s.size(); end++) {
      string word = s.substr(start, end - start);
      if (wordSet.count(word)) {
        auto restSentences = backtrack(s, end);
        for (auto& rest : restSentences) {
          sentences.push_back(rest.empty() ? word : word + " " + rest);
        }
      }
    }
    memo[start] = sentences;
    return sentences;
  }

public:
  vector<string> wordBreakII(string s, vector<string>& wordDict) {
    wordSet = unordered_set<string>(wordDict.begin(), wordDict.end());
    memo.clear();
    // First check if segmentation is even possible (prune)
    // using the DP approach from Part I
    int n = s.size();
    vector<bool> dp(n + 1, false);
    dp[0] = true;
    for (int i = 1; i <= n; i++) {
      for (int j = 0; j < i; j++) {
        if (dp[j] && wordSet.count(s.substr(j, i - j))) {
          dp[i] = true;
          break;
        }
      }
    }
    if (!dp[n]) return {};
    return backtrack(s, 0);
  }
};
```

Watch out:
- Part I: not breaking early. Once dp[i] is set to True, stop checking more j values. Forgetting the break does not affect correctness but degrades performance.
- Part II: not pruning with the Part I check. For inputs like s = "aaaa...a" with wordDict = ["a", "aa", ..., "aaa...a"], there are exponentially many segmentations (this is why Part II caps s.length at 20). The Part I check detects impossible cases in O(n^2), avoiding wasted backtracking.
- Part II: string concatenation in recursion. Building sentences by prepending/appending strings in each recursive call can be inefficient. Keeping a list of words and joining once at the end is more efficient.
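The word-list idea from the last point can be sketched as plain backtracking that maintains a path of words and performs a single join per completed sentence (memoization omitted for brevity; `wordBreakIIJoin` and `path` are our names):

```typescript
function wordBreakIIJoin(s: string, wordDict: string[]): string[] {
  const wordSet = new Set(wordDict);
  const results: string[] = [];
  const path: string[] = []; // words chosen so far

  function backtrack(start: number): void {
    if (start === s.length) {
      results.push(path.join(" ")); // one join per complete sentence
      return;
    }
    for (let end = start + 1; end <= s.length; end++) {
      const word = s.slice(start, end);
      if (wordSet.has(word)) {
        path.push(word);       // choose
        backtrack(end);        // explore
        path.pop();            // un-choose
      }
    }
  }
  backtrack(0);
  return results;
}
```

Pushing and popping on a shared array avoids building intermediate sentence strings at every level of the recursion; only complete segmentations pay the cost of a join.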
Complexity Analysis
| Approach | Time | Space |
|---|---|---|
| Part I — Brute Force | O(2^n) | O(n) |
| Part I — DP | O(n^2 * m) | O(n) |
| Part II — Backtracking + Memo | O(2^n * n) worst case | O(2^n * n) |
Bottleneck: Part I is bounded by O(n^2) iterations with O(m) substring hashing, giving O(n^2 * m). Part II can have exponentially many valid segmentations (e.g., "aaa...a" with dict ["a", "aa"]), so the output itself is exponential. The memoization avoids recomputing subproblems but cannot reduce the output size.
Edge Cases
- Single character string — Check if it is in the dictionary.
- Word dictionary contains the entire string — dp[n] is set directly from dp[0].
- Empty dictionary — No word can match. Return false / empty list.
- String with repeated characters, dict with various lengths — e.g., s = "aaa", dict = ["a", "aa"]. Part I returns true. Part II returns ["a a a", "aa a", "a aa"].
- No valid segmentation — Part I returns false. Part II should return []. The pruning step prevents wasted computation.
- Very long string for Part I (n=300) — DP handles this comfortably: the O(n^2) double loop performs at most about n^2/2 ≈ 45,000 substring checks.
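The edge cases above can be checked directly against a compact restatement of the Part I DP (the helper name `canSegment` is ours):

```typescript
// Part I DP, restated compactly for edge-case checks.
function canSegment(s: string, wordDict: string[]): boolean {
  const wordSet = new Set(wordDict);
  const dp = new Array<boolean>(s.length + 1).fill(false);
  dp[0] = true; // empty prefix
  for (let i = 1; i <= s.length; i++) {
    for (let j = 0; j < i; j++) {
      if (dp[j] && wordSet.has(s.slice(j, i))) { dp[i] = true; break; }
    }
  }
  return dp[s.length];
}

// Single character string
console.assert(canSegment("a", ["a"]) === true);
// Dictionary contains the entire string: dp[n] set directly from dp[0]
console.assert(canSegment("apple", ["apple"]) === true);
// Empty dictionary: nothing can match
console.assert(canSegment("a", []) === false);
// No valid segmentation
console.assert(canSegment("catsandog", ["cats", "dog", "sand", "and", "cat"]) === false);
```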
Related Problems
- LC 472 — Concatenated Words — Find all words that can be built from other words in the list. Uses Word Break I as a subroutine.
- LC 2707 — Extra Characters in a String — Minimize leftover characters after segmenting. Same DP structure, but minimizing instead of checking feasibility.
- LC 1048 — Longest String Chain — DP on words with predecessor relationship.
- LC 91 — Decode Ways — Similar DP: can the string be decoded? Each position depends on whether a prefix is valid.
What is Expected at Each Level
Junior: Solve Part I with the brute-force recursive approach. Recognize overlapping subproblems and convert to DP with guidance. May not attempt Part II.
Mid-level: Write the DP solution for Part I cleanly. Optimize with max word length. For Part II, implement backtracking with memoization. Discuss the exponential nature of Part II output and why memoization helps.
Senior: Solve both parts quickly. Discuss Trie-based optimization for word lookup (avoids repeated substring creation). Explain why Part II is inherently exponential — the output itself can be exponential in size, so no algorithm can do better than the output. Discuss the BFS approach for Part I as an alternative. Mention practical applications: NLP word segmentation, autocomplete.
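The BFS alternative mentioned above treats each index of s as a graph node, with an edge from start to end whenever s.slice(start, end) is a dictionary word; Part I then asks whether index n is reachable from index 0. A minimal sketch (the name `wordBreakBFS` is ours):

```typescript
// BFS over start indices: reaching index n means s can be segmented.
function wordBreakBFS(s: string, wordDict: string[]): boolean {
  const wordSet = new Set(wordDict);
  const n = s.length;
  const visited = new Array<boolean>(n + 1).fill(false);
  const queue: number[] = [0];
  visited[0] = true;
  while (queue.length > 0) {
    const start = queue.shift()!;
    if (start === n) return true; // consumed the whole string
    for (let end = start + 1; end <= n; end++) {
      // Visit each index at most once — this is what bounds the work
      if (!visited[end] && wordSet.has(s.slice(start, end))) {
        visited[end] = true;
        queue.push(end);
      }
    }
  }
  return false;
}
```

The visited array plays the same role as the dp array in the bottom-up solution: each index is expanded once, so the work stays polynomial.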