mirror of
https://github.com/dholerobin/Lecture_Notes.git
synced 2025-07-01 13:06:29 +00:00
Update Trie.md
This commit is contained in:
parent
99b27b0733
commit
e5fe2c4725
@ -1,4 +1,5 @@
|
|||||||
Do you know how autocompletion provided by different softwares like IDEs, Search Engines, command-line interpreters, text editors, etc works?
|
|
||||||
|
Do you know how the "auto-completion feature" provided by different software such as IDEs, search engines, command-line interpreters, text editors, etc. works?
|
||||||
|
|
||||||

|

|
||||||
|
|
||||||
@ -9,41 +10,41 @@ The basic data structure behind all these scenes is **Trie**.
|
|||||||
Spell checkers can also be designed using **Trie**.
|
Spell checkers can also be designed using **Trie**.
|
||||||
|
|
||||||
# Trie
|
# Trie
|
||||||
String processing is widely used across real world applications, for example data analytics, search engines, bioinformatics, plagiarism detection, etc.
|
String processing is widely used across real-world applications, for example data analytics, search engines, bioinformatics, plagiarism detection, etc.
|
||||||
|
|
||||||
Trie is very useful and special kind of data structure for string processing.
|
Trie is a very useful and special kind of data structure for string processing.
|
||||||
|
|
||||||
## Introduction
|
## Introduction
|
||||||
|
|
||||||
Trie is basically a tree of nodes, where specification of a node can be given as below:
|
Trie is a tree of nodes, where the specifications of a node can be given as below:
|
||||||
|
|
||||||
Each node has,
|
Each node has,
|
||||||
1. An array of datatype node and of size of alphabet.
|
1. An array of datatype `node` having the size of the alphabet (see the note below).
|
||||||
2. A boolean value(We will see why it is needed).
|
2. A boolean variable.
|
||||||
|
|
||||||
We will see usages of these two variables soon.
|
**We will see usages of these two variables soon.**
|
||||||
|
|
||||||
```cpp
|
```cpp
|
||||||
struct trie_node
|
struct trie_node
|
||||||
{
|
{
|
||||||
// Array of pointers of type
|
// Array of pointers of type
|
||||||
// trie_node
|
// trie_node
|
||||||
vector<trie_node*> link;
|
vector<trie_node*> link;
|
||||||
bool isEndofString;
|
bool isEndofString;
|
||||||
|
|
||||||
trie_node(bool end = false)
|
trie_node(bool end = false)
|
||||||
{
|
{
|
||||||
links.assign(alphabet_size, nullptr);
|
links.assign(alphabet_size, nullptr);
|
||||||
isEndofString = end;
|
isEndofString = end;
|
||||||
}
|
}
|
||||||
};
|
};
|
||||||
```
|
```
|
||||||
|
|
||||||
**Note:** For easy understanding purpose, we are assuming that all strings contain lowercase alphabet letters that is alphabet size is $26$. **We can convert characters to a number by using `c-'a'`, `c` is a lowercase character.**
|
**Note:** For ease of understanding, we assume that all strings contain only lowercase letters, i.e. `alphabet_size` is $26$. **We can convert a lowercase character `c` to an array index by using `c-'a'`.**
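
The index mapping from the note can be sketched as two tiny helpers. The names `charToIndex` and `indexToChar` are hypothetical and only used here for illustration; the sketch assumes the alphabet is exactly the 26 lowercase English letters.

```cpp
#include <cassert>

// Assumption: the alphabet is the 26 lowercase English letters.
const int alphabet_size = 26;

// Maps 'a'..'z' to 0..25; this is the c-'a' conversion from the note.
int charToIndex(char c)
{
    assert(c >= 'a' && c <= 'z');
    return c - 'a';
}

// Inverse mapping: 0..25 back to 'a'..'z'.
char indexToChar(int i)
{
    assert(i >= 0 && i < alphabet_size);
    return static_cast<char>('a' + i);
}
```

For example, `charToIndex('c')` gives the position in the `link` array that belongs to the character `c`.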
|
||||||
|
|
||||||

|

|
||||||
|
|
||||||
Now, we have seen how trie node looks like. Let's see how we are going to store strings in a trie using these kind of nodes.
|
Now we have seen what a trie node looks like. Let's see how we are going to store strings in a trie using this kind of node.
|
||||||
|
|
||||||
## How to insert a string in a trie?
|
## How to insert a string in a trie?
|
||||||
|
|
||||||
@ -51,18 +52,18 @@ Look at the image below, which represents a string "act" stored in a trie. Obser
|
|||||||
|
|
||||||

|

|
||||||
|
|
||||||
|
**Note: Empty places in the array have null values (`nullptr` in C++).**
|
||||||
|
|
||||||
What did you observe?
|
What did you observe?
|
||||||
|
|
||||||
Observations:
|
Observations:
|
||||||
1. **Other root node, each node in trie represents a single character.**
|
1. **Other than the root node, each node in trie represents a single character.**
|
||||||
2. **We set isEndofString to true in the node at which the string ends.**
|
2. **We set isEndofString to true in the node at which the string ends.**
|
||||||
|
|
||||||
Therefore, for the sake of simplicity, we are going to represent the nodes of a trie as below.
|
Therefore, for the sake of simplicity, we are going to represent the nodes of a trie as below.
|
||||||
|
|
||||||

|

|
||||||
|
|
||||||
**Note: Empty places in array have null values.**
|
|
||||||
|
|
||||||
And therefore representation of trie containing string "act" will be as below.
|
And therefore representation of trie containing string "act" will be as below.
|
||||||
|
|
||||||
.jpg)
|
.jpg)
|
||||||
@ -73,13 +74,13 @@ Now, observe the trie below, which contains two strings "act" and "ace".
|
|||||||
|
|
||||||
.jpg)
|
.jpg)
|
||||||
|
|
||||||
Note that the node representing character `c` in the above trie, in magnified sense would look as below:
|
Note that the node representing character `c` in the above trie, in a magnified sense would look as below:
|
||||||
|
|
||||||

|

|
||||||
|
|
||||||
What did you observe?
|
What did you observe?
|
||||||
|
|
||||||
Common prefix of `"ace"` and `"act"` is `"ac"` and therefore we are having same nodes until we traverse `"ac"` and then we create a new node for character `e`.
|
A common prefix of `"ace"` and `"act"` is `"ac"` and therefore we are having the same nodes until we traverse `"ac"` and then we create a new node for character `e`.
|
||||||
|
|
||||||
Therefore, we are not creating any new node until we need one and **Trie is a very efficient data storage, when we have a large list of strings sharing common prefixes.** It is also known as **prefix tree**.
|
Therefore, we are not creating any new node until we need one and **Trie is a very efficient data storage, when we have a large list of strings sharing common prefixes.** It is also known as **prefix tree**.
|
||||||
|
|
||||||
@ -87,24 +88,24 @@ Now, observe the trie below, which contains three strings `"act"`, `"ace"` and `
|
|||||||
|
|
||||||
.jpg)
|
.jpg)
|
||||||
|
|
||||||
Let's see proper algorithm to insert a string in a trie.
|
Let's see a proper algorithm to insert a string in a trie.
|
||||||
|
|
||||||
1. Starting from the root, if there is already a node representing corresponding character of a string, then simply traverse.
|
1. Starting from the root, if there is already a node representing the corresponding character of a string, then simply traverse.
|
||||||
2. Otherwise, create a new node representing corresponding character.
|
2. Otherwise, create a new node representing the corresponding character.
|
||||||
3. At the end of string, set `isEndofString` to true in the last ending node.
|
3. At the end of the string, set `isEndofString` to true in the last ending node.
|
||||||
|
|
||||||
```cpp
|
```cpp
|
||||||
void insert(trie_node* root, string s)
|
void insert(trie_node* root, string s)
|
||||||
{
|
{
|
||||||
trie_node* temp = root;
|
trie_node* temp = root;
|
||||||
int n = s.size();
|
int n = s.size();
|
||||||
for(int i = 0; i < n; i++){
|
for(int i = 0; i < n; i++){
|
||||||
if(temp->link[s[i]-'a'] == nullptr)
|
if(temp->link[s[i]-'a'] == nullptr)
|
||||||
temp->link[s[i]-'a'] = new trie_node();
|
temp->link[s[i]-'a'] = new trie_node();
|
||||||
// Traverse using link
|
// Traverse using link
|
||||||
temp = temp->link[s[i]-'a'];
|
temp = temp->link[s[i]-'a'];
|
||||||
}
|
}
|
||||||
temp->isEndofString = true;
|
temp->isEndofString = true;
|
||||||
}
|
}
|
||||||
```
|
```
|
||||||
|
|
||||||
@ -124,28 +125,54 @@ Observe the trie given below and try to search whether `"on"` is present or not.
|
|||||||
|
|
||||||
.jpg)
|
.jpg)
|
||||||
|
|
||||||
If you don't have `isEndofString` variable, then you will not be able to correctly check whether `on` is present or not. Because it is prefix of `once`.
|
If you don't have the `isEndofString` variable, you will not be able to correctly check whether `on` is present, because it is a prefix of `once`.
|
||||||
|
|
||||||
**Algorithm**:
|
**Algorithm**:
|
||||||
|
|
||||||
1. Starting from the root, try to traverse corresponding character of the string. If a link is present, then go ahead.
|
1. Starting from the root, try to traverse the corresponding character of the string. If a link is present, then go ahead.
|
||||||
2. Otherwise, the given string is not present in the trie.
|
2. Otherwise, the given string is not present in the trie.
|
||||||
3. If you are successfully able to traverse according to the string, then check whether the query string is really present or not via `isEndofString` variable of a last node.
|
3. If you are able to traverse all the characters of the string, then check whether the query string is present via the `isEndofString` variable of the last node.
|
||||||
|
|
||||||
```cpp
|
```cpp
|
||||||
bool search(trie_node* root, string s)
|
bool search(trie_node* root, string s)
|
||||||
{
|
{
|
||||||
trie_node* temp = root;
|
trie_node* temp = root;
|
||||||
int n = s.size();
|
int n = s.size();
|
||||||
for(int i = 0; i < n; i++){
|
for(int i = 0; i < n; i++){
|
||||||
// There is no further link
|
// There is no further link
|
||||||
if(temp->link[s[i]-'a'] == nullptr)
|
if(temp->link[s[i]-'a'] == nullptr)
|
||||||
return false;
|
return false;
|
||||||
temp = temp->link[s[i]-'a'];
|
temp = temp->link[s[i]-'a'];
|
||||||
}
|
}
|
||||||
return temp->isEndofString;
|
return temp->isEndofString;
|
||||||
}
|
}
|
||||||
```
|
```
|
||||||
|
Can you write a recursive version of the above function?
|
||||||
|
|
||||||
|
**Recursive version:**
|
||||||
|
```cpp
|
||||||
|
// @param: root -> root of the trie
|
||||||
|
// @param: s -> the string we are searching for
|
||||||
|
// @param: i -> index of s currently reached via recursive traversal
|
||||||
|
bool Rec_search(trie_node* root, string& s, int i = 0)
|
||||||
|
{
|
||||||
|
// No link present
|
||||||
|
// so string is not present
|
||||||
|
if(root == nullptr)
|
||||||
|
return false;
|
||||||
|
if(i == s.size()) {
|
||||||
|
// present
|
||||||
|
if(root->isEndofString)
|
||||||
|
return true;
|
||||||
|
else
|
||||||
|
return false;
|
||||||
|
}
|
||||||
|
// Recursively traverse using links
|
||||||
|
return Rec_search(root->link[s[i]-'a'], s, i+1);
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
**Time Complexity:** $O(N)$, where $N$ is the length of the string we are searching for.
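
To see why `isEndofString` matters, here is a minimal self-contained sketch of insertion and search. It assumes lowercase input only; the helpers `walk` and `startsWith` are hypothetical names added for illustration and are not part of the functions shown above.

```cpp
#include <cassert>
#include <string>
#include <vector>
using std::string;
using std::vector;

// Assumption: only lowercase letters 'a'..'z' are stored.
const int alphabet_size = 26;

struct trie_node
{
    vector<trie_node*> link;
    bool isEndofString;
    trie_node() : link(alphabet_size, nullptr), isEndofString(false) {}
};

void insert(trie_node* root, const string& s)
{
    trie_node* temp = root;
    for(char c : s) {
        assert(c >= 'a' && c <= 'z');
        if(temp->link[c-'a'] == nullptr)
            temp->link[c-'a'] = new trie_node();
        temp = temp->link[c-'a'];
    }
    temp->isEndofString = true;
}

// Hypothetical helper: follows s from the root and returns the node
// reached, or nullptr as soon as a link is missing.
trie_node* walk(trie_node* root, const string& s)
{
    trie_node* temp = root;
    for(char c : s) {
        if(temp->link[c-'a'] == nullptr)
            return nullptr;
        temp = temp->link[c-'a'];
    }
    return temp;
}

// A word is present only if the traversal succeeds AND ends on a marked node.
bool search(trie_node* root, const string& s)
{
    trie_node* last = walk(root, s);
    return last != nullptr && last->isEndofString;
}

// Hypothetical helper: true if some inserted word starts with s.
bool startsWith(trie_node* root, const string& s)
{
    return walk(root, s) != nullptr;
}
```

After inserting only `"once"`, `startsWith(root, "on")` succeeds because the path exists, but `search(root, "on")` fails because the node for `n` is not marked as the end of a string.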
|
||||||
|
|
||||||
## Delete
|
## Delete
|
||||||
|
|
||||||
@ -157,126 +184,126 @@ Things to take care about while you are deleting a string from the trie,
|
|||||||
1. It should not affect any other string present in the trie.
|
1. It should not affect any other string present in the trie.
|
||||||
2. Therefore, we are only going to delete **the nodes which are present only due to the presence of the given string**. And no other string is passing through them.
|
2. Therefore, we are only going to delete **the nodes which are present only due to the presence of the given string**. And no other string is passing through them.
|
||||||
|
|
||||||
We are going to use recursive procedure. If the string is not present, then we will return `false` and `true` otherwise.
|
We are going to use a recursive procedure. If the string is not present, then we will return `false` and `true` otherwise. **Recursive procedure for delete is a modified version of the recursive search procedure** and therefore make sure you understand that.
|
||||||
|
|
||||||
1. We are traversing trie via the given string recursively.
|
Can you figure it out on your own?
|
||||||
2. While traversing, if we find that no link is present(`nullptr`) for the current character, then string is not present in the trie and return `false`.
|
|
||||||
3. If we are successfully able to traverse the string(`i==s.size())`, then finally check `isEndofString` of the last node. If the string is really present, then return `true`. Otherwise return `false`.
|
**Procedure:**
|
||||||
4. Now, while backtracking stage of recursion, delete nodes if it is no longer needed after deletion of the given string.
|
|
||||||
|
1. We are traversing the trie recursively, the same way as in `Rec_search()` procedure.
|
||||||
|
2. While traversing, if we find that no link is present(`root == nullptr`) for the current character, then the string is not present in the trie and return `false`.
|
||||||
|
3. If we are successfully able to traverse the whole string until `i==s.size()`, then finally check `isEndofString` of the last node. If the string is present (`isEndofString == true`), then set it to `false` and return `true`. Otherwise, return `false` (not present).
|
||||||
|
4. Now, during the backtracking stage of the recursion, delete each node that is no longer needed after deletion of the given string.
|
||||||
|
|
||||||
|
Now, go through the code below with very intuitive comments.
|
||||||
|
|
||||||
Now, Go through the code below, very intuitive comments are written.
|
|
||||||
```cpp
|
```cpp
|
||||||
// Checks whether any link is present
|
// Checks whether any link is present
|
||||||
bool isEmptyNode(trie_node* node)
|
bool isEmptyNode(trie_node* node)
|
||||||
{
|
{
|
||||||
for(auto i:node->link)
|
for(auto i:node->link)
|
||||||
if(i != nullptr)
|
if(i != nullptr)
|
||||||
return false;
|
return false;
|
||||||
return true;
|
return true;
|
||||||
}
|
}
|
||||||
|
|
||||||
// Returns true, if the string is successfully deleted
|
// Returns true if the string is successfully deleted
|
||||||
// And if the string is not present in the trie then returns false.
|
// And if the string is not present in the trie then returns false.
|
||||||
// @param: root -> root of the trie
|
|
||||||
// @param: s -> string we are deleting
|
|
||||||
// @param: i -> index of @s currently reached via recursive traversal
|
|
||||||
bool deleteString(trie_node* root, string& s, int i = 0)
|
bool deleteString(trie_node* root, string& s, int i = 0)
|
||||||
{
|
{
|
||||||
// Means string is not present
|
if(root == nullptr)
|
||||||
if(root == nullptr)
|
return false;
|
||||||
return false;
|
|
||||||
|
if(i == s.size()) {
|
||||||
// Successfully traversed the whole string
|
// present
|
||||||
if(i == s.size()) {
|
if(root->isEndofString) {
|
||||||
|
// delete it
|
||||||
// Check whether the string is really present
|
root->isEndofString = false;
|
||||||
// by checking `isEndofString` variable of the last node
|
return true;
|
||||||
if(root->isEndofString) {
|
}
|
||||||
root->isEndofString = false;
|
else
|
||||||
return true;
|
return false;
|
||||||
}
|
}
|
||||||
else
|
|
||||||
return false;
|
bool ans = deleteString(root->link[s[i]-'a'], s, i+1);
|
||||||
}
|
|
||||||
|
// String is present
|
||||||
bool ans = deleteString(root->link[s[i]-'a'], s, i+1);
|
if(ans) {
|
||||||
|
// Check whether any other string
|
||||||
// String is present
|
// passes through this link node
|
||||||
if(ans) {
|
// If not passing, then delete it
|
||||||
|
// Also keep the node if another string ends at it
if(isEmptyNode(root->link[s[i]-'a']) && !root->link[s[i]-'a']->isEndofString) {
|
||||||
|
|
||||||
// Check whether any other string
|
// Deallocate used memory
|
||||||
// passes through this node
|
delete root->link[s[i]-'a'];
|
||||||
// Not passing, then delete this node
|
root->link[s[i]-'a'] = nullptr;
|
||||||
if(isEmptyNode(root->link[s[i]-'a'])) {
|
}
|
||||||
|
return true;
|
||||||
// Deallocate used memory
|
}
|
||||||
delete root->link[s[i]-'a'];
|
|
||||||
root->link[s[i]-'a'] = nullptr;
|
// Not present, so return false
|
||||||
}
|
return false;
|
||||||
return true;
|
|
||||||
}
|
|
||||||
|
|
||||||
// Not present the return false
|
|
||||||
return false;
|
|
||||||
}
|
}
|
||||||
```
|
```
|
||||||
|
|
||||||
|
**Time Complexity:** $O(N)$, where $N$ is the length of the string we are deleting.
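
The procedure can be tried out with the self-contained sketch below. It is one possible variant, not the only way to write it: it assumes lowercase input, and when backtracking it frees a node only if the node has no outgoing links and no shorter string ends at it, so deleting `"once"` does not remove `"on"`.

```cpp
#include <cassert>
#include <string>
#include <vector>
using std::string;
using std::vector;

const int alphabet_size = 26; // assumption: lowercase letters only

struct trie_node
{
    vector<trie_node*> link;
    bool isEndofString;
    trie_node() : link(alphabet_size, nullptr), isEndofString(false) {}
};

void insert(trie_node* root, const string& s)
{
    trie_node* temp = root;
    for(char c : s) {
        assert(c >= 'a' && c <= 'z');
        if(temp->link[c-'a'] == nullptr)
            temp->link[c-'a'] = new trie_node();
        temp = temp->link[c-'a'];
    }
    temp->isEndofString = true;
}

bool search(trie_node* root, const string& s)
{
    trie_node* temp = root;
    for(char c : s) {
        if(temp->link[c-'a'] == nullptr)
            return false;
        temp = temp->link[c-'a'];
    }
    return temp->isEndofString;
}

// True if the node has no outgoing links.
bool isEmptyNode(trie_node* node)
{
    for(auto child : node->link)
        if(child != nullptr)
            return false;
    return true;
}

// Returns true if s was present and has been deleted, false otherwise.
bool deleteString(trie_node* root, const string& s, int i = 0)
{
    if(root == nullptr)
        return false;
    if(i == (int)s.size()) {
        if(!root->isEndofString)
            return false;
        root->isEndofString = false;
        return true;
    }
    trie_node*& child = root->link[s[i]-'a'];
    if(!deleteString(child, s, i+1))
        return false;
    // Free the child only if nothing passes through it or ends at it.
    if(isEmptyNode(child) && !child->isEndofString) {
        delete child;
        child = nullptr;
    }
    return true;
}
```

With `"on"` and `"once"` stored, deleting `"once"` frees the nodes for `e` and `c` but keeps the node for `n`, because `"on"` still ends there.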
|
||||||
|
|
||||||
## Trie as an array
|
## Trie as an array
|
||||||
|
|
||||||
Availability of dynamic arrays allow use to create Trie without using pointers.
|
The availability of dynamic arrays allows us to create Trie without using pointers.
|
||||||
|
|
||||||
Now, we are going to store trie as a dynamic array of `TrieNodes`. In this implementation, we are going to use an array of integers instead of pointers in `TrieNode` and as a link, we are going to store index of a node rather than address of a node in the former case.
|
Now, we are going to store trie as a dynamic array of `TrieNodes`. In this implementation, we are going to use an array of integers instead of pointers in `TrieNode` and as a link, we are going to store the index of a node rather than the address of a node in the former case.
|
||||||
|
|
||||||

|

|
||||||
|
|
||||||
See the below implementation of trie as an array, which is quite similar and intuitive as previous implementation.
|
See the implementation of trie as an array below, which is quite similar to the previous implementation and just as intuitive.
|
||||||
|
|
||||||
```cpp
|
```cpp
|
||||||
struct TrieNode
|
struct TrieNode
|
||||||
{
|
{
|
||||||
vector<int> id_link;
|
vector<int> id_link;
|
||||||
bool isEndofString;
|
bool isEndofString;
|
||||||
|
|
||||||
TrieNode(bool end = false)
|
TrieNode(bool end = false)
|
||||||
{
|
{
|
||||||
isEndofString = end;
|
isEndofString = end;
|
||||||
id_link.assign(26,-1);
|
id_link.assign(26,-1);
|
||||||
}
|
}
|
||||||
};
|
};
|
||||||
|
|
||||||
void insert(vector<TrieNode>& trie, string s)
|
void insert(vector<TrieNode>& trie, string s)
|
||||||
{
|
{
|
||||||
int temp = 0;
|
int temp = 0;
|
||||||
int n = s.size();
|
int n = s.size();
|
||||||
for(int i = 0; i < n; i++) {
|
for(int i = 0; i < n; i++) {
|
||||||
if(trie[temp].id_link[s[i]-'a'] == -1) {
|
if(trie[temp].id_link[s[i]-'a'] == -1) {
|
||||||
trie[temp].id_link[s[i]-'a'] = (int)trie.size();
|
trie[temp].id_link[s[i]-'a'] = (int)trie.size();
|
||||||
trie.push_back(TrieNode());
|
trie.push_back(TrieNode());
|
||||||
}
|
}
|
||||||
temp = trie[temp].id_link[s[i]-'a'];
|
temp = trie[temp].id_link[s[i]-'a'];
|
||||||
}
|
}
|
||||||
trie[temp].isEndofString = true;
|
trie[temp].isEndofString = true;
|
||||||
}
|
}
|
||||||
|
|
||||||
bool search(vector<TrieNode>& trie, string s)
|
bool search(vector<TrieNode>& trie, string s)
|
||||||
{
|
{
|
||||||
int temp = 0;
|
int temp = 0;
|
||||||
int n = s.size();
|
int n = s.size();
|
||||||
for(int i = 0; i < n; i++) {
|
for(int i = 0; i < n; i++) {
|
||||||
if(trie[temp].id_link[s[i]-'a'] == -1)
|
if(trie[temp].id_link[s[i]-'a'] == -1)
|
||||||
return false;
|
return false;
|
||||||
temp = trie[temp].id_link[s[i]-'a'];
|
temp = trie[temp].id_link[s[i]-'a'];
|
||||||
}
|
}
|
||||||
return trie[temp].isEndofString;
|
return trie[temp].isEndofString;
|
||||||
}
|
}
|
||||||
```
|
```
|
||||||
But it has a downside that you can not delete strings present in the trie. Why?
|
But it has a downside that you can not generally delete strings present in the trie. Why?
|
||||||
|
|
||||||
Try deleting a single node, you will realize that indexes of each subsequent node will change and moreover deleting in an array has a very bad performance.
|
Try deleting a single node (other than the last one): you will realize that the indexes of all subsequent nodes change, and deleting from the middle of an array also has very bad performance.
|
||||||
|
|
||||||
It is easy implemention, but with single downside. Therefore, use as per the requirement.
|
It is an easy implementation, but with a single downside. Therefore, use as per the requirement.
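
If occasional deletions are still needed with the array version, one common workaround is lazy deletion: only clear the `isEndofString` flag and leave the nodes in place, so no index changes. This is a different technique from the node-freeing delete shown earlier, and the function name `eraseLazy` is hypothetical. A minimal sketch, assuming lowercase input and a root node stored at index $0$:

```cpp
#include <cassert>
#include <string>
#include <vector>
using std::string;
using std::vector;

struct TrieNode
{
    vector<int> id_link;
    bool isEndofString;
    TrieNode(bool end = false) : id_link(26, -1), isEndofString(end) {}
};

// Assumption: trie[0] already exists and is the root.
void insert(vector<TrieNode>& trie, const string& s)
{
    int temp = 0;
    for(char c : s) {
        assert(c >= 'a' && c <= 'z');
        if(trie[temp].id_link[c-'a'] == -1) {
            trie[temp].id_link[c-'a'] = (int)trie.size();
            trie.push_back(TrieNode());
        }
        temp = trie[temp].id_link[c-'a'];
    }
    trie[temp].isEndofString = true;
}

bool search(const vector<TrieNode>& trie, const string& s)
{
    int temp = 0;
    for(char c : s) {
        if(trie[temp].id_link[c-'a'] == -1)
            return false;
        temp = trie[temp].id_link[c-'a'];
    }
    return trie[temp].isEndofString;
}

// Lazy deletion: unmark the word, keep every node where it is.
// Returns false if the word was not present.
bool eraseLazy(vector<TrieNode>& trie, const string& s)
{
    int temp = 0;
    for(char c : s) {
        if(trie[temp].id_link[c-'a'] == -1)
            return false;
        temp = trie[temp].id_link[c-'a'];
    }
    if(!trie[temp].isEndofString)
        return false;
    trie[temp].isEndofString = false;
    return true;
}
```

The trade-off is that memory is never reclaimed: after `eraseLazy()` the array keeps its size, which is acceptable only when deletions are rare.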
|
||||||
|
|
||||||
## Count total number of words present in a Trie
|
## Count the total number of words present in a Trie
|
||||||
|
|
||||||
How will you find the number of words(strings) present in the trie below?
|
How will you find the number of words(strings) present in the trie below?
|
||||||
|
|
||||||
@ -284,7 +311,7 @@ How will you find the number of words(strings) present in the trie below?
|
|||||||
|
|
||||||
Ultimately, it means finding the total number of nodes having `isEndofString` set to `true`, which can easily be done using a recursive traversal of all the nodes present in the trie.
|
Ultimately, it means finding the total number of nodes having `isEndofString` set to `true`, which can easily be done using a recursive traversal of all the nodes present in the trie.
|
||||||
|
|
||||||
The basic idea of recursive procedure is as follow:
|
The basic idea of the recursive procedure is as follows:
|
||||||
|
|
||||||
Start from the $\text{root}$ node and go through all $26$ positions of the `link` array. For each not-null link, recursively call `countWords()` considering that linked node as a $\text{root}$. And therefore formula will be as below:
|
Start from the $\text{root}$ node and go through all $26$ positions of the `link` array. For each not-null link, recursively call `countWords()` considering that linked node as a $\text{root}$. And therefore formula will be as below:
|
||||||
|
|
||||||
@ -295,135 +322,134 @@ Finally, add $1$ to $\text{TotalWords}$ if the current node has `isEndofString =
|
|||||||
```cpp
|
```cpp
|
||||||
int countWords(trie_node* root)
|
int countWords(trie_node* root)
|
||||||
{
|
{
|
||||||
int total = 0;
|
int total = 0;
|
||||||
if(root == nullptr)
|
if(root == nullptr)
|
||||||
return 0;
|
return 0;
|
||||||
for(auto i:root->link)
|
for(auto i:root->link)
|
||||||
if(i != nullptr)
|
if(i != nullptr)
|
||||||
total += countWords(i);
|
total += countWords(i);
|
||||||
total += root->isEndofString;
|
total += root->isEndofString;
|
||||||
return total;
|
return total;
|
||||||
}
|
}
|
||||||
```
|
```
|
||||||
**Time complexity:** $O(\text{Number of nodes present in the trie})$, as we are visiting each and every node.
|
**Time complexity:** $O(\text{Number of nodes present in the trie})$, as we are visiting each and every node. <br>
|
||||||
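**Space complexity:** $O(\text{Length of the longest word})$ for the recursion stack; no other extra memory is used.
|
**Space complexity:** $O(\text{Length of the longest word})$ for the recursion stack; no other extra memory is used.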
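
Note that the recursive `countWords()` uses call-stack space proportional to the depth of the trie. The same count can also be obtained without recursion by keeping the nodes still to be visited in an explicit stack. The sketch below is such an iterative variant; the name `countWordsIterative` is hypothetical, and lowercase input is assumed.

```cpp
#include <cassert>
#include <string>
#include <vector>
using std::string;
using std::vector;

const int alphabet_size = 26; // assumption: lowercase letters only

struct trie_node
{
    vector<trie_node*> link;
    bool isEndofString;
    trie_node() : link(alphabet_size, nullptr), isEndofString(false) {}
};

void insert(trie_node* root, const string& s)
{
    trie_node* temp = root;
    for(char c : s) {
        assert(c >= 'a' && c <= 'z');
        if(temp->link[c-'a'] == nullptr)
            temp->link[c-'a'] = new trie_node();
        temp = temp->link[c-'a'];
    }
    temp->isEndofString = true;
}

// Visits every node once and counts the ones that end a string.
int countWordsIterative(trie_node* root)
{
    if(root == nullptr)
        return 0;
    int total = 0;
    vector<trie_node*> pending{root}; // nodes not visited yet
    while(!pending.empty()) {
        trie_node* node = pending.back();
        pending.pop_back();
        total += node->isEndofString;
        for(auto child : node->link)
            if(child != nullptr)
                pending.push_back(child);
    }
    return total;
}
```

Inserting the same word twice does not change the result, because the word only marks a single node.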
|
||||||
|
|
||||||
## Print all words stored in Trie
|
## Print all words stored in Trie
|
||||||
|
|
||||||
It is similar to finding total number of words but instead of adding $1$ for each `isEndofString`'s true value, we are going to store the word representing that particular end.
|
It is similar to finding the total number of words but instead of adding $1$ for each `isEndofString`'s true value, we are going to store the word representing that particular end.
|
||||||
|
|
||||||
The code is similar as finding total number of words.
|
The code is similar to finding the total number of words.
|
||||||
|
|
||||||
```cpp
|
```cpp
|
||||||
void printAllWords(trie_node* root, vector<string>& ans, string s="")
|
void printAllWords(trie_node* root, vector<string>& ans, string s="")
|
||||||
{
|
{
|
||||||
if(root == nullptr)
|
if(root == nullptr)
|
||||||
return;
|
return;
|
||||||
for(int i = 0; i < alphabet_size; i++) {
|
for(int i = 0; i < alphabet_size; i++) {
|
||||||
if(root->link[i] != nullptr) {
|
if(root->link[i] != nullptr) {
|
||||||
char c = 'a' + i;
|
char c = 'a' + i;
|
||||||
string temp = s;
|
string temp = s;
|
||||||
temp += c;
|
temp += c;
|
||||||
printAllWords(root->link[i], ans, temp);
|
printAllWords(root->link[i], ans, temp);
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
if(root->isEndofString)
|
if(root->isEndofString)
|
||||||
ans.push_back(s);
|
ans.push_back(s);
|
||||||
}
|
}
|
||||||
```
|
```
|
||||||
**Time complexity:** $O(\text{Number of nodes present in the trie})$, as we are visiting each and every node.
|
**Time complexity:** $O(\text{Number of nodes present in the trie})$, as we are visiting each and every node. <br>
|
||||||
**Space Complexity:** $O(\text{Total length of all words present in the trie})$
|
**Space Complexity:** $O(\text{Total length of all words present in the trie})$
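
A small variation of the same traversal is sketched below. It records a word before visiting its children, so the words come out in alphabetical order, and it reuses one string buffer instead of copying the prefix at every step. The name `collectWords` is hypothetical, and lowercase input is assumed.

```cpp
#include <cassert>
#include <string>
#include <vector>
using std::string;
using std::vector;

const int alphabet_size = 26; // assumption: lowercase letters only

struct trie_node
{
    vector<trie_node*> link;
    bool isEndofString;
    trie_node() : link(alphabet_size, nullptr), isEndofString(false) {}
};

void insert(trie_node* root, const string& s)
{
    trie_node* temp = root;
    for(char c : s) {
        assert(c >= 'a' && c <= 'z');
        if(temp->link[c-'a'] == nullptr)
            temp->link[c-'a'] = new trie_node();
        temp = temp->link[c-'a'];
    }
    temp->isEndofString = true;
}

// prefix holds the characters on the path from the root to node.
void collectWords(trie_node* node, string& prefix, vector<string>& ans)
{
    if(node == nullptr)
        return;
    // Record the word first so that "ac" is listed before "ace".
    if(node->isEndofString)
        ans.push_back(prefix);
    for(int i = 0; i < alphabet_size; i++) {
        if(node->link[i] == nullptr)
            continue;
        prefix.push_back(static_cast<char>('a' + i));
        collectWords(node->link[i], prefix, ans);
        prefix.pop_back(); // backtrack before trying the next letter
    }
}
```

Calling it with an empty `prefix` on the root fills `ans` with every stored word in sorted order.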
|
||||||
|
|
||||||
## Auto-suggestion features
|
## Auto-suggestion features
|
||||||
|
|
||||||
How will you design autocompletion feature using Trie?
|
How will you design the autocompletion feature using Trie?
|
||||||
|
|
||||||
For example, we have stored C++ keywords in a trie. Now, when you type `"n"` it should show all keywords starting from `"n"`. For simplicity only keywords starting from `"n"` are shown in the trie below,
|
For example, we have stored C++ keywords in a trie. Now, when you type `"n"` it should show all keywords starting from `"n"`. For simplicity, only keywords starting from `"n"` are shown in the trie below,
|
||||||
|
|
||||||
.jpg)
|
.jpg)
|
||||||
|
|
||||||
How will you print all keywords starting from `"n"`? OR how will you print all keywords having `"n"` as prefix?
|
How will you print all keywords starting from `"n"`? OR how will you print all keywords having `"n"` as a prefix?
|
||||||
|
|
||||||
Simply use `printAllWords()` on node `n`, and problem is solved!
|
Simply use `printAllWords()` on node `n`, and the problem is solved!
|
||||||
|
|
||||||
Common procedure is as below:
|
A common procedure is as below:
|
||||||
|
|
||||||
1. Traverse nodes in trie according to the given uncomplete string `s`. If we are successfully able to traverse `s`, then there are keywords having prefix of `s`. Otherwise, there will be nothing to suggest.
|
1. Traverse nodes in the trie according to the given incomplete string `s`. If we are successfully able to traverse `s`, then there are keywords having `s` as a prefix. Otherwise, there will be nothing to suggest.
|
||||||
|
|
||||||
2. Now, use `printAllWords()` considering the last node(after traversal of trie according to `s`) as a root.
|
2. Now, use `printAllWords()` considering the last node(after traversal of trie according to `s`) as a root.
|
||||||
|
|
||||||
```cpp
|
```cpp
|
||||||
void autocomplete(trie_node* root, string s)
|
void autocomplete(trie_node* root, string s)
|
||||||
{
|
{
|
||||||
int n = s.size();
|
int n = s.size();
|
||||||
trie_node* temp = root;
|
trie_node* temp = root;
|
||||||
for(int i = 0; i < n; i++) {
|
for(int i = 0; i < n; i++) {
|
||||||
if(temp->link[s[i]-'a'] == nullptr)
|
if(temp->link[s[i]-'a'] == nullptr)
|
||||||
return;
|
return;
|
||||||
temp = temp->link[s[i]-'a'];
|
temp = temp->link[s[i]-'a'];
|
||||||
}
|
}
|
||||||
vector<string> suggest;
|
vector<string> suggest;
|
||||||
printAllWords(temp, suggest, s);
|
printAllWords(temp, suggest, s);
|
||||||
for(auto i:suggest)
|
for(auto i:suggest)
|
||||||
cout << i << endl;
|
cout << i << endl;
|
||||||
/*
|
/*
|
||||||
OR
|
OR
|
||||||
printAllWords(temp, suggest);
|
printAllWords(temp, suggest);
|
||||||
for(auto i:suggest)
|
for(auto i:suggest)
|
||||||
cout << s << i << endl;
|
cout << s << i << endl;
|
||||||
*/
|
*/
|
||||||
}
|
}
|
||||||
```
|
```
|
||||||
|
|
||||||
|
|
||||||
**Time complexity:** $O(\text{Length of S + Total length of all suggestions excluding common prefix(S) from all})$, where `s` is the string you want suggestions for.
|
**Time complexity:** $O(\text{Length of S + Total length of all suggestions excluding common prefix(S) from all})$, where `s` is the string you want suggestions for. <br>
|
||||||
**Space complexity:** $O(\text{Total length of all possible suggestions})$
|
**Space complexity:** $O(\text{Total length of all possible suggestions})$
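
The two steps can also be packaged so that the suggestions are returned instead of printed, which makes them easy to check. The sketch below is self-contained and assumes lowercase keywords only; the helper `collect` is a hypothetical name, and this `autocomplete` returns a vector rather than writing to `cout`.

```cpp
#include <cassert>
#include <string>
#include <vector>
using std::string;
using std::vector;

const int alphabet_size = 26; // assumption: lowercase letters only

struct trie_node
{
    vector<trie_node*> link;
    bool isEndofString;
    trie_node() : link(alphabet_size, nullptr), isEndofString(false) {}
};

void insert(trie_node* root, const string& s)
{
    trie_node* temp = root;
    for(char c : s) {
        assert(c >= 'a' && c <= 'z');
        if(temp->link[c-'a'] == nullptr)
            temp->link[c-'a'] = new trie_node();
        temp = temp->link[c-'a'];
    }
    temp->isEndofString = true;
}

// Collects every word below node; prefix is the path walked so far.
void collect(trie_node* node, string& prefix, vector<string>& out)
{
    if(node->isEndofString)
        out.push_back(prefix);
    for(int i = 0; i < alphabet_size; i++) {
        if(node->link[i] == nullptr)
            continue;
        prefix.push_back(static_cast<char>('a' + i));
        collect(node->link[i], prefix, out);
        prefix.pop_back();
    }
}

// Returns all stored words that have s as a prefix, in sorted order.
vector<string> autocomplete(trie_node* root, string s)
{
    trie_node* temp = root;
    // Step 1: walk down the trie along s
    for(char c : s) {
        if(temp->link[c-'a'] == nullptr)
            return {}; // nothing to suggest
        temp = temp->link[c-'a'];
    }
    // Step 2: gather every word in the subtree of the last node
    vector<string> suggest;
    collect(temp, s, suggest);
    return suggest;
}
```

With a few C++ keywords stored, `autocomplete(root, "ne")` returns only `"new"`, while `autocomplete(root, "n")` returns every keyword starting with `n`.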
|
||||||
|
|
||||||
It is widely used feature, as discussed at the start of the article.
|
It is a widely used feature, as discussed at the start of the article.
|
||||||
|
|
||||||
There is also something called **"Ternary Search Tree"**. When each node in the trie has most of its links used(having many similar prefixe words), trie is substantially more space efficient and time efficient than ternary search tree.
|
There is also something called **"Ternary Search Tree"**. When each node in the trie has most of its links used(having many similar prefix words), a trie is substantially more space-efficient and time-efficient than the ternary search tree.
|
||||||
|
|
||||||
But, If each node stores few links, then ternary search tree is much more space efficient, because we are using $26$ pointers in each node of trie and many of them may be unused.
|
But if each node stores only a few links, then the ternary search tree is much more space-efficient, because we are using $26$ pointers in each node of the trie and many of them may be unused.
|
||||||
|
|
||||||
Therefore, use as per the requirements.
|
Therefore, use as per the requirements.
|
||||||
|
|
||||||
## Dictionary using Trie
|
## Dictionary using Trie
|
||||||
|
|
||||||
What are common features of an english dictionary?
|
What are the common features of an English dictionary?
|
||||||
|
|
||||||
1. Efficient Lookup of words
|
1. Efficient Lookup of words
|
||||||
2. As dictionary is very large, Less memory usages
|
2. Low memory usage, as the dictionary is very large
|
||||||
|
|
||||||
Hashtable can be used to implement dictionary. After precomputation of hash for each word in $O(M)$, where $M$ is total length of all words in the dictionary, we can have efficient lookups if we design a very efficient hashtable.
|
Hashtable can be used to implement a dictionary. After precomputation of hash for each word in $O(M)$, where $M$ is the total length of all words in the dictionary, we can have efficient lookups if we design a very efficient hashtable.
|
||||||
|
|
||||||
But as dictionary is very large there will be collisions between two or more words. But still you can design hash table to have efficient look-ups.
|
But as the dictionary is very large there will be collisions between two or more words. Still, you can design a hash table to have efficient look-ups.
|
||||||
|
|
||||||
But space usages is very high, as we simply store each words. But what if we design it using a trie?
|
But the space usage is very high, as we simply store every word in full. But what if we design it using a trie?
|
||||||
|
|
||||||
As in a dictionary we have many common-prefix words, trie will save substantial amount of memory consumption. Trie supports look-up in $O(word length)$, which is higher than a very efficient hash table.
|
As a dictionary has many words with common prefixes, a trie will save a substantial amount of memory. A trie supports look-up in $O(\text{word length})$, which can be slower than a look-up in a very efficient hash table.
|
||||||
|
|
||||||
Other advantages of trie is as below:
|
Other advantages of the trie are as below:
|
||||||
1. Auto-complete feature
|
1. Auto-complete feature
|
||||||
2. It also supports ordered traversal of words with given prefix
|
2. It also supports ordered traversal of words with given prefix
|
||||||
3. No need for complex hash functions
|
3. No need for complex hash functions
|
||||||
|
|
||||||
So, if you want some of the above features then using trie is good for you. Also, we don't have to deal with collisions.
|
So, if you want some of the above features then using trie is good for you. Also, we don't have to deal with collisions.
|
||||||
|
|
||||||
Note that in dictionary along with a word, we have explanations or meanings of that word. That can be handled by seperately maintaining an array which stores all those extra stuffs. Then store one integer in the `TrieNode` structure to store the index of the corresponding data in the array.
|
Note that in a dictionary, along with a word, we have explanations or meanings of that word. That can be handled by separately maintaining an array that stores all that extra data. Then store one integer in the `trie_node` structure to hold the index of the corresponding data in the array.
|
||||||
|
|
||||||
```cpp
|
```cpp
|
||||||
struct trie_node
|
struct trie_node
|
||||||
{
|
{
|
||||||
// Array of pointers of type
|
// Array of pointers of type
|
||||||
// trie_node
|
// trie_node
|
||||||
vector<trie_node*> link;
|
vector<trie_node*> link;
|
||||||
bool isEndofString;
|
bool isEndofString;
|
||||||
// To store id of data
|
// To store id of data
|
||||||
int idOfData;
|
int idOfData;
|
||||||
};
|
};
|
||||||
```
|
```
|
||||||
|
|
||||||
Below image shows a typical trie structure for dictionary.
|
The image below shows a typical trie structure for a dictionary.
|
||||||
|
|
||||||
|

|
||||||
.jpg)
|
|
||||||
|