What the heck is a trie? Right, well that’s what this whole post is about.

From https://www.topcoder.com/community/data-science/data-science-tutorials/using-tries/:

  • The word trie is an infix of the word “retrieval” because the trie can find a single word in a dictionary with only a prefix of the word.
  • The trie is a tree where each vertex represents a single word or a prefix.
  • The root represents an empty string (“”), the vertexes that are direct sons of the root represent prefixes of length 1, the vertexes that are 2 edges of distance from the root represent prefixes of length 2, the vertexes that are 3 edges of distance from the root represent prefixes of length 3 and so on. In other words, a vertex that are k edges of distance of the root have an associated prefix of length k.
  • Let v and w be two vertexes of the trie, and assume that v is a direct father of w, then v must have an associated prefix of w.

I’ve used their definition because it was the clearest one I could find and / or come up with on my own. Let’s see what this looks like in a graphic:

Screen Shot 2016-02-06 at 12.32.17 AM

 

As you can see, if you wanted retrieve the word ‘many’, you would traverse the tree as follows:

M -> A -> N -> Y

And since the word exists in our trie, we can return true when we run retrieve(“MANY”). Trie’s have a time complexity of O(W*L), where W is the number of words and L is the average length of all words stored. If you’ve been following along here, you’ve now seen binary search trees as well as tries. So what’s the difference?

From quroa:

binary search tree is a dynamic version of what happens during quicksort. The root represents an arbitrary (but hopefully not too far off from the median) pivot element from the collection. The left subtree is then everything less than the root, and the right subtree is everything greater than the root. The left and right collections are then again ordered in the same manner, i.e. the data structure is defined recursively.

trie is a dynamic version of what happens during radix sort. You look at the first bit or digit of a number (or first letter of a string) to determine which subtree the value belongs in. You then repeat the procedure recursively using the next character or digit to determine which of the subtree’s children it belongs in, and so on.

Hopefully at this point, tries make total sense to you. Because right now, we’re going to create one in Java!

We start off by creating the Node:

class Node {
    // this is the character that is stored
    char c;

    // this HashMap holds the children, mapping characters to Nodes
    HashMap<Character, TrieNode> children = new HashMap<Character, TrieNode>();

    // this tells us if the Node is at the bottom of the trie
    boolean isLeaf;

    public TrieNode() {
        
        // this will represent our empty Node
        this.c = '$';

    }

    // initialize a Node with a character
    public TrieNode(char c){
        this.c = c;
    }
}

Now let’s actually use our Node definition to build our very own TRIE! You can find the original code here.

 

class Trie{

    // this is the root element of the trie
    Node root;

    // constructor
    public Trie(){

        // creates a root Node that is empty
        root = new Node();

    }

    public void insert(String word){

        // create a new HashMap and set it equal to root.children
        HashMap<Character, Node> children = root.children;

        for(int = 0; i < word.length(); i++){
            
            // grab a character iteratively
            char c = word.charAt(i);

            Node focusNode;

            // if the key exists, set the focusNode to the child
            if(children.containsKey(c)){

                focusNode = children.get(c);

            } else {

                // otherwise, create a new Node and put it
                // in the HashMap
                focusNode = new Node(c);
                children.put(c, t);

            }
 
            // set the children equal to the focusNode's 
            // children so that we can move down the Trie
            children = focusNode.children;

            // if the iterator and length are equal
            // we know that the focusNode is a Leaf
            if(i+1 == word.length())
                focusNode.isLeaf = true;

        }

    }

    // Returns if the word is in the trie.
    public boolean search(String word) {
        Node focusNode = searchNode(word);

        if(focusNode != null && focusNode.isLeaf)
            return true;
        else
            return false;
    }

    // Returns if there is any word in the trie
    // that starts with the given prefix.
    public boolean startsWith(String prefix) {
        if(searchNode(prefix) == null)
            return false;
        else
            return true;
    }

    public TrieNode searchNode(String str){
        Map<Character, Node> children = root.children;
        Node t = null;
        for(int i=0; i < str.length(); i++){
            char c = str.charAt(i);
            if(children.containsKey(c)){
                t = children.get(c);
                children = t.children;
            }else{
                return null;
            }
        }

        return t;
    }

}

And that’s all there is to Tries! In essence, it’s a clever way to map out data such that each node builds on the other to create a dynamic array structure. It has a quick lookup time and is greatly flexible, especially when it comes to managing String data – such as a lookup dictionary.

Advertisements