Trier Tree distilled
If you walk into any interview that deals with text processing. You will inevitably be asked a question that involves a Trier Tree (TT). A TT is a special tree with its edge marked by identification symbol and its vertex (node) marked by success/failure of predetermined condition.
Take the typical definition of a TT whose edge is a alphabet letter and vertex indicating if a word can be found in a dictionary connecting all edge letters from root to here. This is the most common use of a TT. Another scenario involves having edges marked by numeric letter and node indicating if a telephone number can be found in a phone book connecting all edge numbers from root to here.
It can be summarized that a TT is a tree to describe a structure of composition from elemental building blocks to some known super structures. Symbolically, let E{x | x belongs to E} denote elemental building blocks and S{y | y belongs to S} denote super structures. All edge symbols of TT belongs to E and connecting edge symbols along a path results in c that either belongs to S {y} or not.
A easy way to understand TT is to think of it as if it's composed of many paths, some path terminates with a success symbol and some path terminates with a failure symbol.
Take this example, banana, we know that ban and banana are both valid words by looking up a dictionary. R is root, F is failure, S is success, ---(alpha)--- is a edge. According to the dictionary we construct a TT such as following:
R---b--->F---a--->F---n--->S---a--->F---n--->F---a--->S
Now given the challedge 'is ba a word?', by following such a TT, we know that it ends up with a vertex symbol F indicating 'ba' is not a word.
More complicated TT can be built by following such a principle, first there is a dictionary of all super structures S that can be broken down into elemental building blocks E. Start from root R, create a path for each y in S using its building blocks x in E as edge symbols, mark each vertex on its path F if it's a new vertex and mark its ending vertex S. After creating such a TT, then it can be used to verify any y in S in O(length(y)) time.
Take the typical definition of a TT whose edge is a alphabet letter and vertex indicating if a word can be found in a dictionary connecting all edge letters from root to here. This is the most common use of a TT. Another scenario involves having edges marked by numeric letter and node indicating if a telephone number can be found in a phone book connecting all edge numbers from root to here.
It can be summarized that a TT is a tree to describe a structure of composition from elemental building blocks to some known super structures. Symbolically, let E{x | x belongs to E} denote elemental building blocks and S{y | y belongs to S} denote super structures. All edge symbols of TT belongs to E and connecting edge symbols along a path results in c that either belongs to S {y} or not.
A easy way to understand TT is to think of it as if it's composed of many paths, some path terminates with a success symbol and some path terminates with a failure symbol.
Take this example, banana, we know that ban and banana are both valid words by looking up a dictionary. R is root, F is failure, S is success, ---(alpha)--- is a edge. According to the dictionary we construct a TT such as following:
R---b--->F---a--->F---n--->S---a--->F---n--->F---a--->S
Now given the challedge 'is ba a word?', by following such a TT, we know that it ends up with a vertex symbol F indicating 'ba' is not a word.
More complicated TT can be built by following such a principle, first there is a dictionary of all super structures S that can be broken down into elemental building blocks E. Start from root R, create a path for each y in S using its building blocks x in E as edge symbols, mark each vertex on its path F if it's a new vertex and mark its ending vertex S. After creating such a TT, then it can be used to verify any y in S in O(length(y)) time.
<< Home