String break

Problem description:

Given a String and a dictionary of words, write a program that returns true if the given string can be formed by concatenating one or more of the words in the dictionary, eg.

Given the string S = “algorithmstuff”

and the dictionary dic = {“algorithm”, “stuff”}

Our function must return true since S can be formed by concatenating “algorithm” + “stuff”

but given S = “algorithmstuffisgood”

and dic = {“algorithms”, “good”, “stuff”} must return false since there’s no way to build S from the strings in the dictionary.


This is a very good problem for a programming interview since it has one “lazy solution” which is very inefficient but it can be converted to a very fast algorithm with just a simple modification, so you could think about the most obvious solution even if it’s slow and once you got a working solution you can think about how to improve it.

This problem is clearly a good candidate for a recursive solution, for each recursive call the idea is to start checking if the first character is in our dictionary, if so, we call the function recursively but now with string S starting with the next character, otherwise we check if the first two characters form a valid word and so on, for instance:

S = “hithere”

dic = { “there”, “hi”}

We start with the string “hithere” at index 0, so we check if the substring “h” is a valid word, since it’s not we continue now with the next character which is “i”, now we check if the substring “hi” is a valid word in the dictionary, it is so we know that “hi” can be formed by concatenating one or more words of the dictionary so we now call the function recursively but starting with the next index which is 2, so now we are gonna have the same function with the substring “there”, we are gonna stop when we reach the end of the string S.

This is basically a DFS algorithm ( , here the code:

public boolean stringBreak(String s, Set<String> dict) {
    return dfs(s, 0, dict);
boolean dfs(String s, int i, Set<String> dict)
        return true;

    for(int j = i; j < s.length(); ++j)
        if(dict.contains(s.substring(i, j + 1)))
            if(dfs(s, j + 1, dict, set)) return true;

    return false;

The solution above works! but it’s too slow since there are many overlapping calls (it does the same thing many times), let me show you with an example:

let’s say we have the string S = “helloworld”

and dict = {“he”, “e”, “h”}

Our recursive function is gonna build a tree like this:


                   /                \

           h-elloworld           he-lloworld



As you can see the substring “lloworld” is gonna be computed two times and the result is gonna be the same, the problem here is that we could end up with too many overlapping calls which would be a waste of time. So, how can we avoid overlapping calls?

The solution is to have a record of which substrings we have already computed so that when our function is called again for the same substring we already know the answer, this technique is called MEMOIZATION (, Here is the code that shows this optimization.

public boolean stringBreak(String s, Set<String> dict) {
    HashSet<Integer> memo = new HashSet<Integer>(); // We just need to keep track of those indexes that we have already computed and the result is false
    return dfs(s, 0, dict, memo);
 boolean dfs(String s, int i, Set<String> dict, HashSet<Integer> memo)
         return true;
     if(memo.contains(i)) // if we have already computed the result for this substring we just return the answer
         return false;
     //otherwise, compute the answer for this substring
     for(int j = i; j < s.length(); ++j)
         if(dict.contains(s.substring(i, j + 1)))
             if(dfs(s, j + 1, dict, set)) return true;

     //we just store the results for substrings which result is false
     return false;


9 thoughts on “String break

  1. What about using REGEX?

    Just iterate trough all the words in your dictionary and put them in a regex. i.e.

    Considering the dictionary {“algorithms”, “good”, “stuff”}

    build the regex ^(algorithms|good|stuff)+$

    where ^ indicates the begining of the string
    | to add a different option
    () to group the options
    + because you are expecting to match at least one option ONCE at a time
    $ to define the end of the line or the string

    so a java code would be like this one

    String pattern = “^(algorithms|good|stuff)+$”;

    String s0=”algorithmsstuffgood”;
    String s1=”algorithmstuffisgood”;
    String s2=”algorithmsalgorithmsalgorithms”;
    String s3=”stuffgood”;
    String s4=”hellostuff”;
    String s5=”stuff”;
    String s6=” good”;
    String s7=”\t stuff”;

    System.out.println(s0.matches(pattern)); //true
    System.out.println(s1.matches(pattern)); //false
    System.out.println(s2.matches(pattern)); //true
    System.out.println(s3.matches(pattern)); //true
    System.out.println(s4.matches(pattern)); //false
    System.out.println(s5.matches(pattern)); //true
    System.out.println(s6.matches(pattern)); //false
    System.out.println(s7.matches(pattern)); //false

    • Hi men,

      Thanks for sharing your solution, your approach is simple, easy to write and also easy to understand, although the key fact here is to understand how REGEX works, as far as I know REGEX would need to construct some kind of data structure (maybe a graph) to store all the possible transitions from one state to another (since REGEX algorithm is based on a Non deterministic Finite Automata) and it would be very expensive talking about time and space considering that in this case our pattern string is gonna be too long (remember the dictionary could contain millions of words).

    • Try your solution with the following test case:

      dictionary = { “abc”, “cde” }
      input = “abcde”

      I think (haven’t run it yet) will output TRUE.

      The right answer is FALSE.

      • Its the same case as S0 and s1

        String s0=”algorithmsstuffgood”; //true
        String s1=”algorithmstuffisgood”; //false

    • (couldn’t reply to your nested comment)

      with the test case:

      dictionary = { “abc”, “cde” }
      input = “abcde”

      I am pretty sure (not completely, a test run will give me more confidence) your code will output TRUE, while the right answer is FALSE.

      you state s0 and s1 are equivalent test cases, but s1 is marked as FALSE because of the “is” in s1=”algorithmstuff>iss<tuffisgood”.

      Try with s1=”algorithmstuffgood” and almost sure it will output TRUE.

      • Sorry, it’s:


      • Ok I see, I have run your test case and it returns the correct answer which is FALSE, as you said. You can try it by yourself and check it if you want.

        The reason, as far as I know, it looks for the entire word matching in the String in this case, “algorithms” and keep testing the rest of the String “tuffgood” where there are no more matches from the begining of this new string.

  2. My solution in C#
    public static bool isPossible(string a, string[] strs)
    foreach (string s in strs)
    a = a.Replace(s, “”);

    if (a == String.Empty)
    return true;

    return false;

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s