Tries
-
A trie is a tree-like data structure that implements the dictionary ADT.
-
The word “trie” was suggested by Edward Fredkin and is short for “retrieval tree”. Most people pronounce it as “try”, except for Edward Fredkin, who pronounced it as “tree”.[1]
-
In a trie, keys are usually strings from some alphabet of fixed size: . Unlike in a tree, nodes in a trie, does not store keys. Rather, nodes represent single characters that can be part of the keys. We can therefore start at the root of a trie and follow the edges through the nodes representing . When we reach the leaf, we will get the string . Thus, we can think of a trie as a tree whose leaf nodes are strings. We can think of every internal node of a trie as standing for some prefix of all the strings in the leaves below it: specifically, an internal node at level stands for the first characters of each string below it.
-
For example, suppose the alphabet consists of letters A–Z. We could use a trie to store names like so:
In this example, is used to indicate the end of a name. Notice how a name can be a prefix of another name (e.g., “Ron” and “Ronald”).
-
Tries are made of nodes. A common way to implement a node is to have an array of pointers to other nodes and one other field that provides more information about the node and indicates if that node represents a key.
Here is how we can define a trie node in C++:
struct TrieNode { <SomeType> data; TrieNode *children[ALPHABET_SIZE]; }
The
data
field could be abool
to indicate if a word exists (or is valid). Or thedata
field could be a value of some other type, such as a string representing the definition of the word that is the key.children
is an array of some fixed alphabet size. The alphabet does not have to be restricted to letters A–Z. For example, the alphabet could be numbers 0–9 or even a wider range of ASCII characters.With this representation, the same trie we described above would look like this:
-
As you can see, tries use large amounts of memory, but trade this for more efficient running time of
find
,insert
andremove
operations. Their worst case running time is where is the length of the key. We often describe it as , since we ignore the length of the key (e.g., during hashing). -
Unlike a hash table, there are no collisions in a trie. However, duplicate keys cannot be allowed in a trie, so no two keys have the same path.
-
You can imagine how we could reduce the amount of memory the trie is using by having a dynamic hash table instead of an array for storing pointers to children at each node. Furthermore, we could compress the paths by skipping multiple levels of nodes that have only one child. The specifics of optimizing tries depends on their application.
-
The main appeal of tries is their ability to support rapid prefix matching and approximate matching. Applications of a trie include validation of words, autocorrect feature, and the search feature in the Contacts app (when the user just starts typing the first few letters of someone’s name).