mirror of
https://github.com/dholerobin/Lecture_Notes.git
synced 2025-03-15 21:59:56 +00:00
Update Z-algorithm.md
Parent: d00010175d
Commit: def62f767a
@ -1,11 +1,15 @@
Z-function and z-algorithm

Z-algorithm is a **string-matching algorithm**: it finds the positions where a pattern string occurs within a larger string. It relies on the values of the **z-function** of a given string.

Let's first see what a **z-function** is.

# Z-Algorithm

The z-function of a string $s$ of length $n$ is an array $z$ of length $n$, where $z[i]$ is the length of the longest common prefix of $s$ (i.e. $s[0,n-1]$) and the suffix of $s$ starting at $i$, i.e. $s[i,n-1]$.

**Note:** $s[l,r]$ denotes the substring of $s$ starting at index $l$ and ending at index $r$ (both inclusive). We use zero-based indices.

The value of $z[0]$ is not well defined (the whole string trivially matches itself), so we take it as $0$.

For example,

1. $z("cccc") = [0,3,2,1]$
@ -18,18 +22,18 @@ For example,
Why $z[4] = 3$?
Because $s[0,2] = s[4,6] = "aba"$.

Can you figure out how to compute the z-function?

## Trivial Algorithm

The simplest way to compute the z-function is brute force. For an index $i$, we compute $z[i]$ as follows:
```cpp
z[i] = 0;
// extend the match while the characters agree and we stay inside the string
while(i + z[i] < n && s[z[i]] == s[i + z[i]])
    z[i]++;
```

Simply repeat this for every index $i$ from $1$ to $n-1$.
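
Wrapping the loop above in a `for` over all indices gives a complete brute-force z-function. The following is a minimal sketch; the name `z_function_naive` is hypothetical (chosen so it does not clash with the author's `z_function`), and taking the string by `const` reference is just a choice to avoid copying it.

```cpp
#include <string>
#include <vector>
using namespace std;

// Hypothetical brute-force z-function.
// Worst case O(n^2), e.g. for "aaaa...a".
vector<int> z_function_naive(const string& s) {
    int n = s.size();
    vector<int> z(n, 0);  // z[0] stays 0 by convention
    for (int i = 1; i < n; i++) {
        // Extend the common prefix of s and s[i, n-1] one character at a time.
        while (i + z[i] < n && s[z[i]] == s[i + z[i]])
            z[i]++;
    }
    return z;
}
```

For example, `z_function_naive("cccc")` returns `{0, 3, 2, 1}`, matching the example at the top.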

```cpp
vector<int> z_function(string s) {
@ -60,6 +64,8 @@ We can see that $s[i,r]$ and $s[i-l,r-l]$ are equal. Now, look at $z[i-l]$ and t
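$z[i-l]$ tells us that $s[0,z[i-l]-1]$ and $s[i-l,i-l+z[i-l]-1]$ are equal. If this match lies strictly inside the segment, i.e. $i+z[i-l]-1 < r$, then $s[0,z[i-l]-1]$ and $s[i,i+z[i-l]-1]$ are equal as well, and the mismatch right after it carries over too, which means that $z[i]=z[i-l]$. Otherwise, we only know that $z[i] \ge r-i+1$ and have to extend the match further by brute force.

Confused? Go through the series of images below, which should make the whole thing clear.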
@ -79,11 +85,11 @@ while(i + z[i] < n && s[z[i]] == s[i + z[i]])
    z[i]++;
```

After that, if the match goes beyond $r$, i.e. $i+z[i]-1 > r$, we update the segment as $l = i$ and $r = i + z[i] - 1$. This maintains the **rightmost segment match**, so that previous values can be reused as much as possible for the next indices.

**Note that initially the segment $[l,r]$ is taken as $[0,0]$**. So, we start by doing brute force; in general, for an index $i$:

1. If $i \le r$, we take advantage of the previous value by starting from $z[i] = \min(r-i+1, z[i-l])$, and then continue with brute force.
2. Else if $i>r$, we directly do brute force, as we can't take advantage of any previous value.
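
The two cases above can be combined into a linear-time z-function. This is a minimal sketch under one stated assumption: $[l,r]$ is an inclusive segment, so after a longer match we set $r = i + z[i] - 1$. The name `z_function_linear` is hypothetical and the details may differ from the author's own `z_function`.

```cpp
#include <algorithm>
#include <string>
#include <vector>
using namespace std;

// Hypothetical linear-time z-function.
// Assumption: [l, r] is inclusive; s[l, r] is the rightmost segment found
// so far that matches a prefix of s.
vector<int> z_function_linear(const string& s) {
    int n = s.size();
    vector<int> z(n, 0);  // z[0] = 0 by convention
    int l = 0, r = 0;     // initial segment [0, 0]
    for (int i = 1; i < n; i++) {
        // Case 1: i lies inside [l, r], so reuse z[i - l], but never past r.
        if (i <= r)
            z[i] = min(r - i + 1, z[i - l]);
        // Both cases: extend the match by brute force.
        while (i + z[i] < n && s[z[i]] == s[i + z[i]])
            z[i]++;
        // Keep [l, r] as the rightmost segment match.
        if (i + z[i] - 1 > r) {
            l = i;
            r = i + z[i] - 1;
        }
    }
    return z;
}
```

For example, `z_function_linear("aaabaab")` returns `{0, 2, 1, 0, 2, 1, 0}`. The `min` is what keeps the reused value from claiming characters beyond $r$ that have not been checked yet.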

```cpp
@ -112,7 +118,7 @@ vector<int> z_function(string s) {
### Time complexity

$O(N)$. For every index, the `while` loop performs at most one failed comparison. Every successful comparison moves the right end $r$ of the rightmost segment match at least one step further; $r$ never decreases and its maximum possible value is $n-1$, so there are at most $n$ successful comparisons over the whole run.
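
To see the linear bound in practice, the sketch below adds a comparison counter to the linear z-function (inclusive $[l,r]$ assumed). `count_z_comparisons` is a hypothetical helper written only for this illustration. Since each index has at most one failed comparison and the successful ones are bounded by the growth of $r$, its result should stay below $2n$.

```cpp
#include <algorithm>
#include <string>
#include <vector>
using namespace std;

// Hypothetical helper: runs the linear z-function on s and returns how many
// character comparisons s[z[i]] vs s[i + z[i]] were made in total.
long long count_z_comparisons(const string& s) {
    int n = s.size();
    vector<int> z(n, 0);
    int l = 0, r = 0;
    long long comparisons = 0;
    for (int i = 1; i < n; i++) {
        if (i <= r)
            z[i] = min(r - i + 1, z[i - l]);
        while (i + z[i] < n) {
            comparisons++;                      // one character comparison
            if (s[z[i]] != s[i + z[i]]) break;  // at most one failure per i
            z[i]++;                             // each success pushes r right
        }
        if (i + z[i] - 1 > r) {
            l = i;
            r = i + z[i] - 1;
        }
    }
    return comparisons;
}
```

For a string of 1000 `'a'` characters this returns 999, while the brute-force loop would perform 499500 successful comparisons on the same input.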

## Search for a string
@ -122,7 +128,7 @@ For example, `p = "ab"` and `s = "abbbabab"`, then Z-algorithm will find us `[0,
The basic idea here is to create a new string with $p$ as a prefix and $s$ as a suffix, i.e. `new_str = p + '#' + s`.

**To make sure that the value of the Z-function does not exceed the length of $p$, we insert a separator character (here `'#'`) that never appears in string $s$**.

Now, we compute the Z-function of `new_str`.
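
One possible way to put these steps together is sketched below, assuming `'#'` never occurs in $s$. The name `find_occurrences` is hypothetical. Every index $i$ after the separator with $z[i] = |p|$ marks a full copy of $p$, which starts at position $i - |p| - 1$ in $s$.

```cpp
#include <algorithm>
#include <string>
#include <vector>
using namespace std;

// Hypothetical helper: starting positions of all occurrences of p in s.
// Assumption: the separator '#' never occurs in s.
vector<int> find_occurrences(const string& p, const string& s) {
    string t = p + '#' + s;
    int n = t.size(), m = p.size();
    vector<int> z(n, 0);
    for (int i = 1, l = 0, r = 0; i < n; i++) {
        if (i <= r) z[i] = min(r - i + 1, z[i - l]);
        while (i + z[i] < n && t[z[i]] == t[i + z[i]]) z[i]++;
        if (i + z[i] - 1 > r) { l = i; r = i + z[i] - 1; }
    }
    vector<int> positions;
    for (int i = m + 1; i < n; i++)
        if (z[i] == m)                       // a full copy of p starts at t[i]
            positions.push_back(i - m - 1);  // map back to an index in s
    return positions;
}
```

With `p = "ab"` and `s = "abbbabab"`, this returns `{0, 4, 6}`.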
@ -162,21 +168,21 @@ int main()
```

## To find the period of a string

The period of a string $s$ is the shortest length of a substring $t$ such that $s$ can be represented as a concatenation of one or more copies of $t$.

For example, `s = "ababab"` has a period of $2$, where `t = "ab"`.

Let's see how to find the period of $s$ using its z-function.

**First, note that the length $n$ of string $s$ is divisible by the period of $s$.** Therefore, we can divide $s$ into multiple blocks, each as long as the period.

We find all divisors of $n$ and the z-function of $s$. Then, we need to find the smallest divisor $i$ of $n$ for which $i+z[i] = n$, which is the period of string $s$. Why?

$z[i]$ is the length of the longest common prefix of $s[0,n-1]$ and $s[i,n-1]$. As $i$ is a divisor of $n$, we can divide the whole string into blocks of length $i$.

From $z[i] = n-i$ ($\because i+z[i]=n$), we get $s[0,n-i-1] = s[i,n-1]$, i.e. $s[j] = s[j+i]$ for every $0 \le j < n-i$. Hence the first block $s[0,i-1]$ is equal to the second block $s[i,2i-1]$, which is also equal to the third block $s[2i,3i-1]$, and similarly all blocks turn out to be equal.

Therefore, the smallest $i$ ($1 \le i < n$) such that $n \% i=0$ and $i+z[i]=n$ is the period of string $s$. If there is no such $i$, then the string is not periodic, as we cannot divide it into two or more equal blocks.
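
A minimal sketch of the period search follows (inclusive $[l,r]$ assumed for the z-function). `find_period` is a hypothetical name; returning `0` when no divisor $i < n$ works is an assumption chosen to match the `period != 0` check in the compression code below.

```cpp
#include <algorithm>
#include <string>
#include <vector>
using namespace std;

// Hypothetical helper: smallest i < n with n % i == 0 and i + z[i] == n,
// or 0 if s is not periodic.
int find_period(const string& s) {
    int n = s.size();
    vector<int> z(n, 0);
    for (int i = 1, l = 0, r = 0; i < n; i++) {
        if (i <= r) z[i] = min(r - i + 1, z[i - l]);
        while (i + z[i] < n && s[z[i]] == s[i + z[i]]) z[i]++;
        if (i + z[i] - 1 > r) { l = i; r = i + z[i] - 1; }
    }
    // Candidates are checked in increasing order, so the first hit is the smallest.
    for (int i = 1; i < n; i++)
        if (n % i == 0 && i + z[i] == n)
            return i;
    return 0;
}
```

For example, `find_period("ababab")` returns `2`, and `find_period("abcd")` returns `0`.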
@ -225,7 +231,7 @@ int main()
Now that we know how to find the period $i$ of a string, we can compress it by storing only one block of size $i$, which repeats again and again throughout $s$.

To retrieve the string back from the compressed version, we also attach its real length, i.e. the length of $s$.

```cpp
int main()
@ -246,7 +252,7 @@ int main()
    }

    if(period != 0) {
        // A way to represent a compressed string
        // Attach the real length of the string to retrieve it easily
        pair<string, int> compressed_str{s.substr(0,period), n};
    }
@ -258,22 +264,21 @@ int main()
}
```
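
The snippet above only builds the compressed pair. A minimal sketch of the full round trip is shown below; `compress` and `decompress` are hypothetical names, and using the whole string as the block when $s$ is not periodic is an assumption (the original code only builds the pair when `period != 0`).

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>
using namespace std;

// Hypothetical: compress s as {one period block, real length}.
// Assumption: if s is not periodic, the block is the whole string.
pair<string, int> compress(const string& s) {
    int n = s.size();
    vector<int> z(n, 0);
    for (int i = 1, l = 0, r = 0; i < n; i++) {
        if (i <= r) z[i] = min(r - i + 1, z[i - l]);
        while (i + z[i] < n && s[z[i]] == s[i + z[i]]) z[i]++;
        if (i + z[i] - 1 > r) { l = i; r = i + z[i] - 1; }
    }
    int period = n;  // fallback: no shorter period
    for (int i = 1; i < n; i++)
        if (n % i == 0 && i + z[i] == n) { period = i; break; }
    return {s.substr(0, period), n};
}

// Hypothetical: rebuild the original string by repeating the block
// until the stored real length is reached.
string decompress(const pair<string, int>& c) {
    const string& block = c.first;
    int n = c.second;
    string out;
    out.reserve(n);
    for (int k = 0; k < n; k++)
        out += block[k % block.size()];
    return out;
}
```

For example, `compress("abababab")` gives `{"ab", 8}`, and `decompress` turns that pair back into `"abababab"`.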

## Number of distinct substrings in a string

**Problem statement:** Find the number of unique substrings in a given string $s$.

**Brief idea:** The basic idea here is to start with an empty string $t$, add the characters of $s$ to it one by one, and, using the z-function, count how many new substrings are created by each added character.

Let's say we have already added some characters of $s$ to $t$, and $k$ is the current number of distinct substrings. Now, we add a character $c$ to $t$, i.e. $t = t+c$.

Note that appending a character to any string $t$ creates exactly as many substrings ending at the new character as the length of the new string $t+c$. **For example, appending `'d'` to `"abc"` creates 4 such substrings: `"d"`, `"cd"`, `"bcd"`, `"abcd"`.** Some of them may already have appeared earlier in $t$.

But how do we find how many of them are new unique substrings **using the z-function**?

**Hint:** Reverse $t$.

By reversing $t$, the substrings ending at the new character become prefixes of the reversed string, so our task boils down to counting the prefixes of the reversed $t$ that do not appear anywhere else in it. This can be done using the z-function of the reversed $t$.

After computing the z-function, we find its maximum value $z_{max} = \max_i z[i]$ over the reversed $t$. It is the length of the longest prefix that already appears elsewhere in $t$ as a substring, which also implies that all shorter prefixes are already present as substrings in $t$.
@ -281,7 +286,7 @@ Therefore, we will deduct this number of already present substrings i.e. $z_{max
Where $|t|$ is the length of $t$.

Finally, the number of new unique substrings created by the addition of a character turns out to be $|t|-z_{max}$.

**Note that $|t|$ is the length of $t$ after adding a character.**
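
Below is a minimal sketch of the whole procedure, assuming hypothetical helper names `z_of` and `count_distinct_substrings` (inclusive $[l,r]$ in the z-function). For each prefix $t$ of $s$ it reverses $t$, computes the z-function, and adds $|t| - z_{max}$ to the answer, which gives the $O(N^2)$ total described next.

```cpp
#include <algorithm>
#include <string>
#include <vector>
using namespace std;

// Hypothetical helper: linear z-function (z[0] = 0 by convention).
vector<int> z_of(const string& s) {
    int n = s.size();
    vector<int> z(n, 0);
    for (int i = 1, l = 0, r = 0; i < n; i++) {
        if (i <= r) z[i] = min(r - i + 1, z[i - l]);
        while (i + z[i] < n && s[z[i]] == s[i + z[i]]) z[i]++;
        if (i + z[i] - 1 > r) { l = i; r = i + z[i] - 1; }
    }
    return z;
}

// Hypothetical helper: counts distinct substrings of s in O(N^2).
// After appending each character to t, |t| - z_max new substrings appear,
// where z_max is the maximum of the z-function of reversed t.
long long count_distinct_substrings(const string& s) {
    string t;
    long long total = 0;
    for (char c : s) {
        t += c;
        string rev(t.rbegin(), t.rend());
        vector<int> z = z_of(rev);
        int z_max = *max_element(z.begin(), z.end());  // z is never empty here
        total += (long long)t.size() - z_max;
    }
    return total;
}
```

For example, `count_distinct_substrings("abab")` returns `7` (`"a"`, `"b"`, `"ab"`, `"ba"`, `"aba"`, `"bab"`, `"abab"`).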
@ -335,3 +340,6 @@ int main()
    return 0;
}
```

**Complexity**: $O(N^2)$, where $N$ is the length of $s$.

For each appended character, we compute the z-function of the reversed $t$ in $O(N)$, which gives a total time complexity of $O(N^2)$.