From 1ef2fa3924d7ae9e5b7c6f6d5902721b503add91 Mon Sep 17 00:00:00 2001
From: Aakash Panchal <51417248+Aakash-Panchal27@users.noreply.github.com>
Date: Tue, 2 Jun 2020 13:48:02 +0530
Subject: [PATCH] Update Trie.md

---
 articles/Akash Articles/md/Trie.md | 200 +++++++++++++++++++++++++++--
 1 file changed, 186 insertions(+), 14 deletions(-)

diff --git a/articles/Akash Articles/md/Trie.md b/articles/Akash Articles/md/Trie.md
index c449669..59cb2d0 100644
--- a/articles/Akash Articles/md/Trie.md
+++ b/articles/Akash Articles/md/Trie.md
@@ -1,8 +1,24 @@
+
+
+
+
+
+
 Do you know how the "auto-completion feature" provided by different software like IDEs, Search Engines, command-line interpreters, text editors, etc works?
 
 ![enter image description here](https://github.com/KingsGambitLab/Lecture_Notes/blob/master/articles/Akash%20Articles/md/Images/Trie/1.png)
 
+Below is an input box, which has an autocomplete feature for "country names". Try it out!
+
+
+
+
+
 The basic data structure behind all these scenes is **Trie**.
 
 ![enter image description here](https://github.com/KingsGambitLab/Lecture_Notes/blob/master/articles/Akash%20Articles/md/Images/Trie/2.png)
@@ -14,17 +30,30 @@ String processing is widely used across real-world applications, for example dat
 
 Trie is a very useful and special kind of data structure for string processing.
 
+Below is a very simple representation of a trie containing the strings `"cat"`, `"bat"`, and `"dog"`.
+
+![enter image description here](https://github.com/KingsGambitLab/Lecture_Notes/blob/master/articles/Akash%20Articles/md/Images/Trie/lastadd.jpg)
+
+Now, suppose we are given an array of strings and asked to check whether the string `"cat"` is present in it. We can check by brute force, comparing `"cat"` with every string in the array, which takes $O(N*length("cat"))$ time in the worst case, where $N$ is the number of strings in the array.
+
+Now, if you build a trie from all the strings in the array, you can check this in $O(length("cat"))$ time by traversing the trie (confused? we will see how soon). This is why a trie is an efficient information retrieval data structure.
+
 ## Introduction
 
 Trie is a tree of nodes, where the specifications of a node can be given as below:
 
 Each node has,
-1. An array of size of the alphabet(see the note below).
+1. An array of the size of the alphabet (see the notes below) to store links to other nodes.
 2. A boolean variable.
 
-**Note:** For an easy understanding purpose, we are assuming that all strings contain lowercase alphabet letters, i.e. `alphabet_size` is $26$. **We can convert characters to a number by using `c-'a'`, `c` is a lowercase character.**
+**Notes**
 
-**We will see usages of these two variables soon.**
+1. For ease of understanding, we assume that all strings contain only lowercase letters, i.e. `alphabet_size` is $26$.
+2. We will discuss the traditional implementation here, although each node could instead use another data structure, such as a hash table.
+
+![enter image description here](https://github.com/KingsGambitLab/Lecture_Notes/blob/master/articles/Akash%20Articles/md/Images/Trie/3.png)
+
+**We will see why we need these two variables soon.**
 
 ```cpp
 struct trie_node
@@ -43,9 +72,7 @@ struct trie_node
 };
 ```
 
-![enter image description here](https://github.com/KingsGambitLab/Lecture_Notes/blob/master/articles/Akash%20Articles/md/Images/Trie/3.png)
-
-Now, we have seen how a trie node looks like. Let's see how we are going to store strings in a trie using this kind of node.
+Now we have seen what a trie node looks like. Let's see **how we are going to store strings in a trie using this kind of node.**
 
 ## How to insert a string in a trie?
 
@@ -55,11 +82,8 @@ Look at the image below, which represents a string "act" stored in a trie. Obser
 
 **Note: Empty places in the array have null values(`nullptr` in c++).**
 
-What did you observe?
-
-Observations:
-1. **Other than the root node, each node in trie represents a single character.**
-2. **We set isEndofString to true in the node at which the string ends.**
+1. **Other than the root node, each node in a trie represents a single character.** In the above image, the $2^{nd}$, $3^{rd}$, and $4^{th}$ nodes represent `'a'`, `'c'`, and `'t'` respectively.
+2. **In the node at which the string ends, we set isEndofString to true.** See the last node in the image above.
 
 Therefore, now for the shake of ease we are going to represent the nodes of trie as below.
 
@@ -85,7 +109,7 @@ A common prefix of `"ace"` and `"act"` is `"ac"` and therefore we are having the
 
 Therefore, we are not creating any new node until we need one and **Trie is a very efficient data storage, when we have a large list of strings sharing common prefixes.** It is also known as **prefix tree**.
 
-Now, observe the trie below, which contains three strings `"act"`, `"ace"` and `"cat"`.
+Now, look at the trie below, which contains the three strings `"act"`, `"ace"` and `"cat"`.
 
 ![enter image description here](https://github.com/KingsGambitLab/Lecture_Notes/blob/master/articles/Akash%20Articles/md/Images/Trie/9(1).jpg)
 
@@ -426,7 +450,7 @@ Hashtable can be used to implement a dictionary. After precomputation of hash fo
 
 But as the dictionary is very large there will be collisions between two or more words. Still, you can design a hash table to have efficient look-ups.
 
-But space usages is very high, as we simply store each word. But what if we design it using a trie?
+But a hash table has very high space usage, as we simply store each word and its attached data. But what if we design it using a trie?
 
 As in a dictionary we have many common-prefix words, trie will save a substantial amount of memory consumption. Trie supports look-up in $O(\text{word length})$, which is higher than a very efficient hash table.
@@ -435,7 +459,7 @@ Other advantages of the trie are as below:
 2. It also supports ordered traversal of words with given prefix
 3. No need for complex hash functions
 
-So, if you want some of the above features then using trie is good for you. Also, we don't have to deal with collisions.
+So, if you want some of the above features, then using a trie is good. Also, we don't have to deal with collisions.
 
 Note that in the dictionary along with a word, we have explanations or meanings of that word. That can be handled by separately maintaining an array that stores all those extra stuff. Then store one integer in the `TrieNode` structure to store the index of the corresponding data in the array.
 
@@ -454,3 +478,151 @@ struct trie_node
 The below image shows a typical trie structure for the dictionary.
 
 ![enter image description here](https://github.com/KingsGambitLab/Lecture_Notes/blob/master/articles/Akash%20Articles/md/Images/Trie/13.jpg)
+
+
+
+
+
+
+
+
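
Note on the patch: the added text says a lookup costs $O(length("cat"))$ because it walks the trie one character at a time. A minimal C++ sketch of that idea follows. It is not the article's implementation: only `trie_node`, `isEndofString`, and `alphabet_size` appear in the article, while the member `children` and the functions `insert` and `contains` are names assumed here for illustration. Like the article, it assumes lowercase-only strings.

```cpp
#include <array>
#include <cassert>
#include <memory>
#include <string>

// Assumption (as in the article): strings use only 'a'..'z'.
constexpr int alphabet_size = 26;

struct trie_node {
    // children[i] links to the node for character 'a' + i; nullptr if absent.
    // `children` is a hypothetical member name chosen for this sketch.
    std::array<std::unique_ptr<trie_node>, alphabet_size> children{};
    // True if some inserted string ends at this node.
    bool isEndofString = false;
};

// Insert `word`, creating a node only where the path does not exist yet,
// so strings with a common prefix share the nodes of that prefix.
void insert(trie_node& root, const std::string& word) {
    trie_node* cur = &root;
    for (char c : word) {
        assert(c >= 'a' && c <= 'z');  // precondition of this sketch
        auto& next = cur->children[c - 'a'];
        if (!next) next = std::make_unique<trie_node>();
        cur = next.get();
    }
    cur->isEndofString = true;
}

// Return true only if `word` was inserted as a whole string.
// Visits at most one node per character of `word`.
bool contains(const trie_node& root, const std::string& word) {
    const trie_node* cur = &root;
    for (char c : word) {
        if (c < 'a' || c > 'z') return false;  // outside the assumed alphabet
        const auto& next = cur->children[c - 'a'];
        if (!next) return false;               // path breaks: word is absent
        cur = next.get();
    }
    return cur->isEndofString;
}
```

With `"act"`, `"ace"`, and `"cat"` inserted, `contains(root, "ac")` is false: the path for `"ac"` exists as a shared prefix, but no string ends at that node.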