Word Break I & II
Problem Statement
Word Break I (LC 139)
Given a string s and a dictionary of words wordDict, return true if s can be segmented into a space-separated sequence of one or more dictionary words.
Example:
Input: s = "leetcode", wordDict = ["leet", "code"]
Output: true
Explanation: "leetcode" = "leet" + "code"

Word Break II (LC 140)
Given the same input, return all possible sentences formed by segmenting s into dictionary words.
Example:
Input: s = "catsanddog", wordDict = ["cat","cats","and","sand","dog"]
Output: ["cats and dog", "cat sand dog"]

Constraints:
- 1 <= s.length <= 300 (Word Break I); 1 <= s.length <= 20 (Word Break II)
- 1 <= wordDict.length <= 1000
- 1 <= wordDict[i].length <= 20
- All strings consist of lowercase English letters.
Difficulty: Medium (I), Hard (II)
Pattern Recognition
- What is the input structure? A string to partition, with a set of valid words.
- What are we optimizing? Part I: can the string be segmented? (decision problem). Part II: enumerate all valid segmentations.
- What pattern does this fit? Dynamic Programming (Part I) and Backtracking with Memoization (Part II).
- Why does this pattern fit? Whether s[0..i] can be segmented depends on whether some prefix s[0..j] can be segmented AND s[j..i] is a word. This is textbook overlapping subproblems.
Approach
Part I — Word Break
Brute Force — Recursion
Try every possible first word, recursively check the rest.
```text
function wordBreak(s, wordDict):
    if s is empty: return true
    for word in wordDict:
        if s starts with word:
            if wordBreak(s[len(word):], wordDict): return true
    return false
```

Time: O(2^n) — exponential branching. Space: O(n) recursion stack.

Why insufficient: Repeated subproblems. "aaaa" with dict ["a", "aa"] explores the same suffixes many times.
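The repeated subproblems can be eliminated by caching the answer per suffix start index. A minimal sketch of this top-down memoized variant (the name `wordBreakMemo` is ours, not from the original problem):

```typescript
// Top-down recursion with memoization on the suffix start index.
// memo.get(start) answers: can s.slice(start) be segmented?
function wordBreakMemo(s: string, wordDict: string[]): boolean {
  const wordSet = new Set(wordDict);
  const memo = new Map<number, boolean>();

  function canBreak(start: number): boolean {
    if (start === s.length) return true; // empty suffix is segmentable
    if (memo.has(start)) return memo.get(start)!;
    let result = false;
    for (const word of wordSet) {
      // startsWith with a position argument avoids building substrings
      if (s.startsWith(word, start) && canBreak(start + word.length)) {
        result = true;
        break;
      }
    }
    memo.set(start, result);
    return result;
  }
  return canBreak(0);
}
```

Each start index is solved once, so the exponential blowup of the plain recursion disappears; this is the same recurrence the bottom-up DP below computes iteratively.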
Optimal — Bottom-Up DP
Key Insight: Define dp[i] = true if s[0..i-1] (the first i characters) can be segmented. For each position i, check all possible last words: if dp[j] is true and s[j..i] is in the dictionary, then dp[i] is true.
Step-by-step:
- Create a dp array of size n+1, all false. Set dp[0] = True (empty string).
- Convert wordDict to a set for O(1) lookup.
- For each i from 1 to n:
  - For each j from 0 to i-1:
    - If dp[j] is true and s[j:i] is in the word set, set dp[i] = True and break.
- Return dp[n].
Time: O(n^2 * m) where m is the average word length (for substring creation and hashing). Space: O(n).
Part II — Word Break II
Optimal — Backtracking with Memoization
Key Insight: First check if the string can be segmented at all (use Part I as a pruning step). Then use backtracking to enumerate all valid segmentations, memoizing the results for each suffix to avoid recomputation.
Step-by-step:
- Build a memoization cache: memo[i] = list of all valid sentences for s[i:].
- For each position i, try every word that matches starting at i.
- Recursively get all sentences for the remainder, and prepend the current word.
- Cache and return.
Time: O(2^n * n) in the worst case (exponential number of valid segmentations). Space: O(2^n * n) for storing all sentences.
Walkthrough
Part I — s = "applepenapple", wordDict = ["apple", "pen"]
| i | j checked | s[j:i] | dp[j]? | In dict? | dp[i] |
|---|---|---|---|---|---|
| 1 | j=0 | "a" | dp[0]=T | No | F |
| 2 | j=0,1 | "ap","p" | — | No | F |
| 3 | j=0..2 | "app","pp","p" | — | No | F |
| 4 | j=0..3 | "appl","ppl","pl","l" | — | No | F |
| 5 | j=0 | "apple" | dp[0]=T | Yes | T |
| 6 | j=0..5 | — | No match | — | F |
| 7 | j=0..6 | — | No match | — | F |
| 8 | j=5 | "pen" | dp[5]=T | Yes | T |
| 9-12 | — | — | — | — | F |
| 13 | j=8 | "apple" | dp[8]=T | Yes | T |
dp[13] = True — "apple" + "pen" + "apple".
Part II — s = "catsanddog", wordDict = ["cat","cats","and","sand","dog"]
Backtracking from index 0:
- "cat" matches → recurse on "sanddog"
  - "sand" matches → recurse on "dog"
    - "dog" matches → recurse on "" → base case → sentence: "cat sand dog"
- "cats" matches → recurse on "anddog"
  - "and" matches → recurse on "dog"
    - "dog" matches → sentence: "cats and dog"

Output: ["cat sand dog", "cats and dog"]
Implementation
Part I — Word Break
```typescript
function wordBreak(s: string, wordDict: string[]): boolean {
  const wordSet = new Set(wordDict);
  const n = s.length;
  // dp[i] = true if s[0..i-1] can be segmented
  const dp = new Array<boolean>(n + 1).fill(false);
  dp[0] = true; // Empty string is trivially segmentable
  for (let i = 1; i <= n; i++) {
    for (let j = 0; j < i; j++) {
      if (dp[j] && wordSet.has(s.slice(j, i))) {
        dp[i] = true;
        break; // No need to check more j values
      }
    }
  }
  return dp[n];
}
```

```cpp
bool wordBreak(string s, vector<string>& wordDict) {
  unordered_set<string> wordSet(wordDict.begin(), wordDict.end());
  int n = s.size();
  // dp[i] = true if s[0..i-1] can be segmented
  vector<bool> dp(n + 1, false);
  dp[0] = true; // Empty string is trivially segmentable
  for (int i = 1; i <= n; i++) {
    for (int j = 0; j < i; j++) {
      if (dp[j] && wordSet.count(s.substr(j, i - j))) {
        dp[i] = true;
        break; // No need to check more j values
      }
    }
  }
  return dp[n];
}
```

Optimized — limit j range by max word length:
```typescript
function wordBreakOptimized(s: string, wordDict: string[]): boolean {
  const wordSet = new Set(wordDict);
  const maxWordLen = Math.max(...wordDict.map(w => w.length));
  const n = s.length;
  const dp = new Array<boolean>(n + 1).fill(false);
  dp[0] = true;
  for (let i = 1; i <= n; i++) {
    // Only check j values where the word length is valid
    for (let j = Math.max(0, i - maxWordLen); j < i; j++) {
      if (dp[j] && wordSet.has(s.slice(j, i))) {
        dp[i] = true;
        break;
      }
    }
  }
  return dp[n];
}
```

```cpp
bool wordBreakOptimized(string s, vector<string>& wordDict) {
  unordered_set<string> wordSet(wordDict.begin(), wordDict.end());
  int maxWordLen = 0;
  for (auto& w : wordDict) maxWordLen = max(maxWordLen, (int)w.size());
  int n = s.size();
  vector<bool> dp(n + 1, false);
  dp[0] = true;
  for (int i = 1; i <= n; i++) {
    // Only check j values where the word length is valid
    for (int j = max(0, i - maxWordLen); j < i; j++) {
      if (dp[j] && wordSet.count(s.substr(j, i - j))) {
        dp[i] = true;
        break;
      }
    }
  }
  return dp[n];
}
```

Part II — Word Break II
```typescript
function wordBreakII(s: string, wordDict: string[]): string[] {
  const wordSet = new Set(wordDict);
  // First check if segmentation is even possible (prune)
  if (!wordBreak(s, wordDict)) return [];
  const memo = new Map<number, string[]>();

  function backtrack(start: number): string[] {
    if (memo.has(start)) return memo.get(start)!;
    if (start === s.length) return [""]; // Base case: empty tail
    const sentences: string[] = [];
    for (let end = start + 1; end <= s.length; end++) {
      const word = s.slice(start, end);
      if (wordSet.has(word)) {
        const restSentences = backtrack(end);
        for (const rest of restSentences) {
          sentences.push(rest ? word + " " + rest : word);
        }
      }
    }
    memo.set(start, sentences);
    return sentences;
  }
  return backtrack(0);
}
```

```cpp
class Solution {
  unordered_set<string> wordSet;
  unordered_map<int, vector<string>> memo;

  vector<string> backtrack(const string& s, int start) {
    if (memo.count(start)) return memo[start];
    if (start == (int)s.size()) return {""}; // Base case: empty tail
    vector<string> sentences;
    for (int end = start + 1; end <= (int)s.size(); end++) {
      string word = s.substr(start, end - start);
      if (wordSet.count(word)) {
        auto restSentences = backtrack(s, end);
        for (auto& rest : restSentences) {
          sentences.push_back(rest.empty() ? word : word + " " + rest);
        }
      }
    }
    memo[start] = sentences;
    return sentences;
  }

public:
  vector<string> wordBreakII(string s, vector<string>& wordDict) {
    wordSet = unordered_set<string>(wordDict.begin(), wordDict.end());
    memo.clear();
    // First check if segmentation is even possible (prune)
    // using the DP approach from Part I
    int n = s.size();
    vector<bool> dp(n + 1, false);
    dp[0] = true;
    for (int i = 1; i <= n; i++) {
      for (int j = 0; j < i; j++) {
        if (dp[j] && wordSet.count(s.substr(j, i - j))) {
          dp[i] = true;
          break;
        }
      }
    }
    if (!dp[n]) return {};
    return backtrack(s, 0);
  }
};
```

Watch out:
- Part I: not breaking early. Once dp[i] is set to True, stop checking more j values. Forgetting the break does not affect correctness but degrades performance.
- Part II: not pruning with the Part I check. For inputs like s = "aaaa...a" with wordDict = ["a", "aa", ..., "aaa...a"], there are exponentially many segmentations (this is why Part II caps s.length at 20). The Part I check detects impossible cases in O(n^2), avoiding wasted backtracking.
- Part II: string concatenation in recursion. Building sentences by prepending/appending strings in each recursive call can be inefficient. Keeping a list of words and joining once at the end is more efficient.
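The word-list idea from the last point can be sketched as plain backtracking that maintains a path of words and performs a single join per completed sentence (memoization omitted for brevity; `wordBreakIIJoin` and `path` are our names):

```typescript
function wordBreakIIJoin(s: string, wordDict: string[]): string[] {
  const wordSet = new Set(wordDict);
  const results: string[] = [];
  const path: string[] = []; // words chosen so far

  function backtrack(start: number): void {
    if (start === s.length) {
      results.push(path.join(" ")); // one join per complete sentence
      return;
    }
    for (let end = start + 1; end <= s.length; end++) {
      const word = s.slice(start, end);
      if (wordSet.has(word)) {
        path.push(word);       // choose
        backtrack(end);        // explore
        path.pop();            // un-choose
      }
    }
  }
  backtrack(0);
  return results;
}
```

Pushing and popping on a shared array avoids building intermediate sentence strings at every level of the recursion; only complete segmentations pay the cost of a join.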
Complexity Analysis
| Approach | Time | Space |
|---|---|---|
| Part I — Brute Force | O(2^n) | O(n) |
| Part I — DP | O(n^2 * m) | O(n) |
| Part II — Backtracking + Memo | O(2^n * n) worst case | O(2^n * n) |
Bottleneck: Part I is bounded by O(n^2) iterations with O(m) substring hashing, giving O(n^2 * m). Part II can have exponentially many valid segmentations (e.g., "aaa...a" with dict ["a", "aa"]), so the output itself is exponential. The memoization avoids recomputing subproblems but cannot reduce the output size.
Edge Cases
- Single character string — Check if it is in the dictionary.
- Word dictionary contains the entire string — dp[n] is set directly from dp[0].
- Empty dictionary — No word can match. Return false / empty list.
- String with repeated characters, dict with various lengths — e.g., s = "aaa", dict = ["a", "aa"]. Part I returns true. Part II returns ["a a a", "aa a", "a aa"].
- No valid segmentation — Part I returns false. Part II should return []. The pruning step prevents wasted computation.
- Very long string for Part I (n=300) — DP handles this comfortably: the O(n^2) double loop performs at most about n^2/2 ≈ 45,000 substring checks.
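The edge cases above can be checked directly against a compact restatement of the Part I DP (the helper name `canSegment` is ours):

```typescript
// Part I DP, restated compactly for edge-case checks.
function canSegment(s: string, wordDict: string[]): boolean {
  const wordSet = new Set(wordDict);
  const dp = new Array<boolean>(s.length + 1).fill(false);
  dp[0] = true; // empty prefix
  for (let i = 1; i <= s.length; i++) {
    for (let j = 0; j < i; j++) {
      if (dp[j] && wordSet.has(s.slice(j, i))) { dp[i] = true; break; }
    }
  }
  return dp[s.length];
}

// Single character string
console.assert(canSegment("a", ["a"]) === true);
// Dictionary contains the entire string: dp[n] set directly from dp[0]
console.assert(canSegment("apple", ["apple"]) === true);
// Empty dictionary: nothing can match
console.assert(canSegment("a", []) === false);
// No valid segmentation
console.assert(canSegment("catsandog", ["cats", "dog", "sand", "and", "cat"]) === false);
```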
Related Problems
- LC 472 — Concatenated Words — Find all words that can be built from other words in the list. Uses Word Break I as a subroutine.
- LC 2707 — Extra Characters in a String — Minimize leftover characters after segmenting. Same DP structure, but minimizing instead of checking feasibility.
- LC 1048 — Longest String Chain — DP on words with predecessor relationship.
- LC 91 — Decode Ways — Similar DP: can the string be decoded? Each position depends on whether a prefix is valid.
What is Expected at Each Level
Junior: Solve Part I with the brute-force recursive approach. Recognize overlapping subproblems and convert to DP with guidance. May not attempt Part II.
Mid-level: Write the DP solution for Part I cleanly. Optimize with max word length. For Part II, implement backtracking with memoization. Discuss the exponential nature of Part II output and why memoization helps.
Senior: Solve both parts quickly. Discuss Trie-based optimization for word lookup (avoids repeated substring creation). Explain why Part II is inherently exponential — the output itself can be exponential in size, so no algorithm can do better than the output. Discuss the BFS approach for Part I as an alternative. Mention practical applications: NLP word segmentation, autocomplete.
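The BFS alternative mentioned above treats each index of s as a graph node, with an edge from start to end whenever s.slice(start, end) is a dictionary word; Part I then asks whether index n is reachable from index 0. A minimal sketch (the name `wordBreakBFS` is ours):

```typescript
// BFS over start indices: reaching index n means s can be segmented.
function wordBreakBFS(s: string, wordDict: string[]): boolean {
  const wordSet = new Set(wordDict);
  const n = s.length;
  const visited = new Array<boolean>(n + 1).fill(false);
  const queue: number[] = [0];
  visited[0] = true;
  while (queue.length > 0) {
    const start = queue.shift()!;
    if (start === n) return true; // consumed the whole string
    for (let end = start + 1; end <= n; end++) {
      // Visit each index at most once — this is what bounds the work
      if (!visited[end] && wordSet.has(s.slice(start, end))) {
        visited[end] = true;
        queue.push(end);
      }
    }
  }
  return false;
}
```

The visited array plays the same role as the dp array in the bottom-up solution: each index is expanded once, so the work stays polynomial.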