Lecture_Notes/articles/Akash Articles/md/Z-algorithm.md

Z-function and z-algorithm

Z-algorithm is a **string-matching algorithm**, which is used to find a place where a string is found within a larger string. For example, `p = "ab"` and `s = "abbbabab"`, then KMP will find us `[0,4,6]` because $s$ has 3 occurrences of `"ab"`. It uses the value of **z-function** to do so.

Let's first see what is a **z-function**.

# Z-Algorithm
Z-function for a given string $s$ of length $n$ returns an array $z$ of length $n$, where $z[i]$ represents the length of the longest common prefix of string $s$(i.e. $s[0,n-1]$) and suffix of $s$ starting at $i$ i.e. $s[i,n-1]$.

**Note:** $s[l,r]$ represents substring of $S$ starting at index $l$ and ending at index $r$. Here, we are taking zero based indices.

Note that the value of $z[0]$ is not properly defined so we take it as zero($0$).

For example, 
1. $z("cccc") = [0,3,2,1]$
    Why $z[1]=3$? 
    Because $s[0,2] = s[1,3] = "ccc"$.
2. $z("ababab")=[0,0,4,0,2,0]$
    Why $z[2] = 4$? 
    Because $s[0,3] = s[2,5] = "abab"$.
3. $z("abacaba") = [0,0,1,0,3,0,1]$
    Why $z[4] = 3$?
    Because $s[0,2] = s[4,6] = "aba"$.

Can you figure out how do we find the value of z-function?

## Trivial Algorithm

The basic way to find the value of z-function is to do brute force. For index - $i$, we find it following way.
```
z[i] = 0;
while(i + z[i] < n && s[z[i]] == s[i + z[i]])
    z[i]++;
```

Simply, do this for every index.

```cpp
vector<int> z_function(string s) {
    int n = (int) s.size();
    vector<int> z(n);
    for (int i = 1; i < n; ++i)
        while (i + z[i] < n && s[z[i]] == s[i + z[i]])
            z[i]++;
    return z;
}
```

## Efficient Algorithm

Now, we will take advantage of previously computed values as much as possible.

**Note:** $s[l,r]$ represents substring of $s$ starting at index $l$ and ending at index $r$.

Suppose we are given two indices $l$ and $r$ and also we are informed that $s[0,r-l]$ and $s[l,r]$ are equal. And we are finding value of z[i] such that $l<=i <= r$.

![enter image description here](https://github.com/KingsGambitLab/Lecture_Notes/blob/master/articles/Akash%20Articles/md/Images/Z-algorithm/1.jpg)

How can we take advantage of that information to find $z[i]$?

![enter image description here](https://github.com/KingsGambitLab/Lecture_Notes/blob/master/articles/Akash%20Articles/md/Images/Z-algorithm/2.jpg)

We can see that $s[i,r]$ and $s[i-l,r-l]$ are equal. Now, look at $z[i-l]$ and think how can we take advantage of it to find $z[i]$?

$z[i-l]$ tells us that $s[0,z[i-l]-1]$ and $s[i-l,i-l+z[i-l]-1]$ are equal and therefore $s[0,z[i-l]-1]$ and $s[i,i+z[i-l]-1]$ are equal, which means that $z[i]=z[i-l]$.

Confused? Go through the series of images below that will make the whole thing clear.

![enter image description here](https://github.com/KingsGambitLab/Lecture_Notes/blob/master/articles/Akash%20Articles/md/Images/Z-algorithm/3.jpg)

![enter image description here](https://github.com/KingsGambitLab/Lecture_Notes/blob/master/articles/Akash%20Articles/md/Images/Z-algorithm/4.jpg)

![enter image description here](https://github.com/KingsGambitLab/Lecture_Notes/blob/master/articles/Akash%20Articles/md/Images/Z-algorithm/5.jpg)

**But if $i+z[i-l]-1>r$, then it is ambiguous as we don't know anything about characters beyond $r$.**

And therefore we simply take $z[i]=min(z[i-l],r-i+1)$, which does not go beyond $r$.

Now, we will run brute force algorithm:

```
// As per the discussion
z[i] = min(z[i-l],r-i+1);
while(i + z[i] < n && s[z[i]] == s[i + z[i]])
    z[i]++;
``` 

After that if $i+z[i]$ is going beyond $r$, then we simply update indices $[l,r]$ as $l = i$ and $r = i + z[i]$, to maintain the **rightmost segment match** to take the advantage of previous values as much as possible for next indices as well.

**Note that initially $[l,r]$ segment is taken as $[0,0]$**. So, we start by doing brute force, or generally for an index $i$,

1. If $i<=r$, then we will take advantage of the previous value and then do brute force.
2. Else if $i>r$, we directly do brute force as we can't take advantage of any previous value.

```cpp
vector<int> z_function(string s) {
    int n = (int) s.size();
    vector<int> z(n);
    int l = 0, r = 0;
    for (int i = 1; i < n; ++i) {
        // Take advantage of previous value
        if (i <= r)
            z[i] = min (r - i + 1, z[i - l]);
        
        // Now do it usual brute-force way
        while (i + z[i] < n && s[z[i]] == s[i + z[i]])
            ++z[i];
        
        // Set new range [l,r]
        if (i + z[i] - 1 > r) {
            l = i;
            r = i + z[i] - 1;
        }
    }
    return z;
}
```

### Time complexity

$O(N)$, as at each step of the algorithm $r$ at least increases one step, and the maximum possible value of r is $n-1$.

## Z-algorithm

Z-algorithm is used to search all occurrences of pattern-string $p$ in a string $s$ in $O(N)$.

For example, `p = "ab"` and `s = "abbbabab"`, then Z-algorithm will find us `[0,4,6]` because $s$ has 3 occurrences of `"ab"`.

Basic idea here is to create a new string having $p$ as a prefix and $s$ as a suffix i.e. `new_str = p + '#' + s`.

**To make sure that the value of Z-function does not exceed the length of $p$, we add a character that is never going to appear in string $s$ like `'#'`**.

Now, we will find Z-function of `new_str`.

Let say $m$ is the length of $p$.

$Z[i] = m$, means that `new_str[0..m-1]` is equal to `new_str[i...i+m-1]`, which is bacially means $p$(=`new_str[0...m-1]`) is equal to `new_str[i...i+m-1]`.

And therefore **all indices-$i$ where the values of Z-function $Z[i]$ equals to the length of $p$ means it is an occurrence of $p$ in $s$.**

```cpp
int main()
{
    string s,p;
    s = "abbbabab";
    p = "ab";
    int n = s.size(), m = p.size();

    // To save memory concatenate
    // s in p
    p += "#";
    p += s;
    // p = "ab#abbbabab";
    vector<int> z = z_function(p);

    // p = "ab#abbbabab";
    //         ^
    //        m+1
    cout << "occurences in s at the following indices: ";
    for(int i = m + 1; i < z.size(); i++) {
        if(z[i] == m) {
            cout << i - m - 1 << " ";
        }
    }
    
    return 0;
}
```

Z-function can also be used to solve various string related problems. Let's see some applications.

## To find the period of a string

Period of a string is the shortest length such that a larger string $s$ can be represented as a concatenation of one or more copies of a substring($t$).

For example, `s = "ababab"` has a period of $2$, where `t = "ab"`.

Let's see how to find the period of $s$ using the value of z-function of $s$.

**First of all note that the length of string $s$($n$) is divisible by the period of string.** Therefore, we can divide string $s$ into multiple blocks of the same length as a period of $s$.

First of all, we will find all divisors of $n$ and value of z-function of $s$. Now, we will need to find smallest divisor of $n$ for which $i+z[i] = n$, which is period of string $s$. Why?

$z[i]$ represents length of the longest common prefix of $s[0,n-1]$ and $s[i,n-1]$. As $i$ is divisor of $n$, we can divide the whole string into blocks of length $i$.

From the value of $z[i] = n-i$($\because i+z[i]=n$), we can see that the first block($s[0,i-1]$) is equal to the second block starting at $i$ i.e. $s[i,i+i-1]$, which is also equal to third block $s[2*i,3*i-1]$ and similarly all blocks turns out to be equal.

Therefore, smallest $i$ such that $n\% i=0$ and $i+z[i]=n$, is period of string $s$. If there is no such $i$, then string is not periodic as we cannot divide string into equivalent blocks.

```cpp
vector<int> getDivisors(int n)
{
    vector<int> v; 
    for (int i=1; i<=sqrt(n); i++)
        if (n%i==0) 
        {
            v.push_back(i);
            if (n != i*i)
                v.push_back(n/i);
        }
    return v;
}

int main()
{
    string s,p;
    s = "abcabcabc";
    int n = (int) s.size();
    vector<int> divs = getDivisors(n);
    sort(divs.begin(),divs.end());
    
    vector<int> z = z_function(s);
    int period = 0;
    for(auto i:divs) {
        if(i < n && z[i] + i == n) {
            period = i;
            break;
        }
    }
    
    if(period)
        cout << period << endl;
    else
        cout << "String is not periodic" << endl;
    
    return 0;
}

```

### String compression

Now, we know how to find a period of a string and therefore we can compress string as only one block of size $i$ which repeats all over again and again in $s$.

To retrieve the string back from the compressed version, we can attach its real length i.e. length of $s$.

```cpp
int main()
{
    string s,p;
    s = "abcabcabc";
    int n = (int) s.size();
    vector<int> divs = getDivisors(n);
    sort(divs.begin(),divs.end());
    
    vector<int> z = z_function(s);
    int period = 0;
    for(auto i:divs) {
        if(i < n && z[i] + i == n) {
            period = i;
            break;
        }
    }
    
    if(period != 0) {
        // A way to represent a compressed string
        // Attatch real length of string to retrieve easily
        pair<string, int> compressed_str{s.substr(0,period), n};
    }
    else {
        cout << "can't be compressed by this method" << endl;
    }
    
    return 0;
}
```

## Number of distinct substrings in a string

**Problem statement:** Find the number of unique substrings in a given string $s$.

**Brief idea:** Basic idea here is to take an empty string $t$ and add characters one by one from string $s$ and along with that check how many new substrings are created, due to the addition of a character in $t$, using z-function.

Let say we have already added some characters to $t$ from $s$ and $k$ is the number of distinct substrings currently. Now, we are adding a character $c$ to $t$, $t = t+c$. 

Note that total number of new substrings created by appending a character to any string($t$) is equal to the length of new string($t=t+c$) created. **For example, Appending `'d'` in `"abc"` creates 4 new substrings: `"d"`, `"cd"`, `"bcd"`, `"abcd"`.**

But how to find the number of new unique substrings created by the addition of $c$ **using z-function**?

**Hint:** Reverse $t$.

By reversing $t$, our task burns down into computing how many prefixes there are that don't appear anywhere else in $t$, which can be done by finding the z-function of $t$.

After finding value of z-function, we will find maximum value $z_{max}$($z_{max} = max\{z[i]\}, \forall i$) in the z-function of reversed $t$, which shows the length of longest prefix which is already in $t$ as a substring and it also implies that all smaller prefixes are already present as substrings in $t$.

Therefore, we will deduct this number of already present substrings i.e. $z_{max}$, from the total number of new substrings i.e. $|t|$.

Where $|t|$ is the length of $t$.

Finally, the number of new unique substrings created by the addition of a character turns out to be $|t|-z_{max}$.

**Note that $|t|$ is the length of $t$ after adding a character.**

```cpp
// Returns maximum of z[i]
int z_function(string& s) {
    int n = (int) s.size();
    vector<int> z(n);
    int l = 0, r = 0;
    int mx = 0;
    for (int i = 1; i < n; ++i) {
        // Take advantage of previous value
        if (i <= r)
            z[i] = min (r - i + 1, z[i - l]);
        
        // Now do it usual brute-force way
        while (i + z[i] < n && s[z[i]] == s[i + z[i]])
            ++z[i];
            
        mx = max(z[i], mx);
        
        // Set new range [l,r]
        if (i + z[i] - 1 > r) {
            l = i;
            r = i + z[i] - 1;
        }
    }
    return mx;
}

int main()
{
    string s,p;
    s = "abc";
    int n = s.size();
    
    string t, temp;
    int unique_substr = 0;
    
    for(int i=0; i < n; i++) {
        t += s[i];
        temp = t;
        reverse(temp.begin(), temp.end());
        // |t| - mx
        unique_substr += (int)t.size() - z_function(temp);
    }

    // Total number of unique substrings    
    cout << unique_substr << endl;
    
    return 0;
}
```
**Complexity**: $O(N^2)$, where $N$ is the length of $s$.

For each character appended, we are computing z-function in $O(N)$, which gives a time complexity of $O(N^2)$ in total.
Create Z-algorithm.md 2020-06-03 22:53:47 +05:30			`Z-function and z-algorithm`

Update Z-algorithm.md 2020-06-12 22:17:35 +05:30			Z-algorithm is a string-matching algorithm, which is used to find a place where a string is found within a larger string. For example, `p = "ab"` and `s = "abbbabab"`, then KMP will find us `[0,4,6]` because $s$ has 3 occurrences of `"ab"`. It uses the value of z-function to do so.
Update Z-algorithm.md 2020-06-04 02:22:16 +05:30
			`Let's first see what is a z-function.`

Create Z-algorithm.md 2020-06-03 22:53:47 +05:30			`# Z-Algorithm`
Update Z-algorithm.md 2020-06-04 02:22:16 +05:30			`Z-function for a given string $s$ of length $n$ returns an array $z$ of length $n$, where $z[i]$ represents the length of the longest common prefix of string $s$(i.e. $s[0,n-1]$) and suffix of $s$ starting at $i$ i.e. $s[i,n-1]$.`
Create Z-algorithm.md 2020-06-03 22:53:47 +05:30
			`Note: $s[l,r]$ represents substring of $S$ starting at index $l$ and ending at index $r$. Here, we are taking zero based indices.`

Update Z-algorithm.md 2020-06-04 02:22:16 +05:30			`Note that the value of $z[0]$ is not properly defined so we take it as zero($0$).`
Create Z-algorithm.md 2020-06-03 22:53:47 +05:30
			`For example,`
			`1. $z("cccc") = [0,3,2,1]$`
Update Z-algorithm.md 2020-06-04 02:22:16 +05:30			`Why $z[1]=3$?`
			`Because $s[0,2] = s[1,3] = "ccc"$.`
Create Z-algorithm.md 2020-06-03 22:53:47 +05:30			`2. $z("ababab")=[0,0,4,0,2,0]$`
Update Z-algorithm.md 2020-06-04 02:23:34 +05:30			`Why $z[2] = 4$?`
Update Z-algorithm.md 2020-06-04 02:22:16 +05:30			`Because $s[0,3] = s[2,5] = "abab"$.`
Create Z-algorithm.md 2020-06-03 22:53:47 +05:30			`3. $z("abacaba") = [0,0,1,0,3,0,1]$`
Update Z-algorithm.md 2020-06-04 02:23:34 +05:30			`Why $z[4] = 3$?`
Update Z-algorithm.md 2020-06-04 02:22:16 +05:30			`Because $s[0,2] = s[4,6] = "aba"$.`
Create Z-algorithm.md 2020-06-03 22:53:47 +05:30
Update Z-algorithm.md 2020-06-04 02:22:16 +05:30			`Can you figure out how do we find the value of z-function?`
Create Z-algorithm.md 2020-06-03 22:53:47 +05:30
			`## Trivial Algorithm`

Update Z-algorithm.md 2020-06-04 02:22:16 +05:30			`The basic way to find the value of z-function is to do brute force. For index - $i$, we find it following way.`
Create Z-algorithm.md 2020-06-03 22:53:47 +05:30			```
			`z[i] = 0;`
			`while(i + z[i] < n && s[z[i]] == s[i + z[i]])`
Update Z-algorithm.md 2020-06-04 02:22:16 +05:30			`z[i]++;`
Create Z-algorithm.md 2020-06-03 22:53:47 +05:30			```

Update Z-algorithm.md 2020-06-04 02:22:16 +05:30			`Simply, do this for every index.`
Create Z-algorithm.md 2020-06-03 22:53:47 +05:30
			```cpp
			`vector<int> z_function(string s) {`
			`int n = (int) s.size();`
			`vector<int> z(n);`
			`for (int i = 1; i < n; ++i)`
			`while (i + z[i] < n && s[z[i]] == s[i + z[i]])`
			`z[i]++;`
			`return z;`
			`}`
			```

			`## Efficient Algorithm`

			`Now, we will take advantage of previously computed values as much as possible.`

			`Note: $s[l,r]$ represents substring of $s$ starting at index $l$ and ending at index $r$.`

			`Suppose we are given two indices $l$ and $r$ and also we are informed that $s[0,r-l]$ and $s[l,r]$ are equal. And we are finding value of z[i] such that $l<=i <= r$.`

			`![enter image description here](https://github.com/KingsGambitLab/Lecture_Notes/blob/master/articles/Akash%20Articles/md/Images/Z-algorithm/1.jpg)`

			`How can we take advantage of that information to find $z[i]$?`

			`![enter image description here](https://github.com/KingsGambitLab/Lecture_Notes/blob/master/articles/Akash%20Articles/md/Images/Z-algorithm/2.jpg)`

			`We can see that $s[i,r]$ and $s[i-l,r-l]$ are equal. Now, look at $z[i-l]$ and think how can we take advantage of it to find $z[i]$?`

			`$z[i-l]$ tells us that $s[0,z[i-l]-1]$ and $s[i-l,i-l+z[i-l]-1]$ are equal and therefore $s[0,z[i-l]-1]$ and $s[i,i+z[i-l]-1]$ are equal, which means that $z[i]=z[i-l]$.`

Update Z-algorithm.md 2020-06-04 02:22:16 +05:30			`Confused? Go through the series of images below that will make the whole thing clear.`

Create Z-algorithm.md 2020-06-03 22:53:47 +05:30			`![enter image description here](https://github.com/KingsGambitLab/Lecture_Notes/blob/master/articles/Akash%20Articles/md/Images/Z-algorithm/3.jpg)`

			`![enter image description here](https://github.com/KingsGambitLab/Lecture_Notes/blob/master/articles/Akash%20Articles/md/Images/Z-algorithm/4.jpg)`

			`![enter image description here](https://github.com/KingsGambitLab/Lecture_Notes/blob/master/articles/Akash%20Articles/md/Images/Z-algorithm/5.jpg)`

			`But if $i+z[i-l]-1>r$, then it is ambiguous as we don't know anything about characters beyond $r$.`

			`And therefore we simply take $z[i]=min(z[i-l],r-i+1)$, which does not go beyond $r$.`

			`Now, we will run brute force algorithm:`

			```
			`// As per the discussion`
			`z[i] = min(z[i-l],r-i+1);`
			`while(i + z[i] < n && s[z[i]] == s[i + z[i]])`
Update Z-algorithm.md 2020-06-04 02:22:16 +05:30			`z[i]++;`
Create Z-algorithm.md 2020-06-03 22:53:47 +05:30			```

Update Z-algorithm.md 2020-06-04 02:22:16 +05:30			`After that if $i+z[i]$ is going beyond $r$, then we simply update indices $[l,r]$ as $l = i$ and $r = i + z[i]$, to maintain the rightmost segment match to take the advantage of previous values as much as possible for next indices as well.`
Create Z-algorithm.md 2020-06-03 22:53:47 +05:30
Update Z-algorithm.md 2020-06-04 02:22:16 +05:30			`Note that initially $[l,r]$ segment is taken as $[0,0]$. So, we start by doing brute force, or generally for an index $i$,`
Create Z-algorithm.md 2020-06-03 22:53:47 +05:30
Update Z-algorithm.md 2020-06-04 02:22:16 +05:30			`1. If $i<=r$, then we will take advantage of the previous value and then do brute force.`
Create Z-algorithm.md 2020-06-03 22:53:47 +05:30			`2. Else if $i>r$, we directly do brute force as we can't take advantage of any previous value.`

			```cpp
			`vector<int> z_function(string s) {`
			`int n = (int) s.size();`
			`vector<int> z(n);`
			`int l = 0, r = 0;`
			`for (int i = 1; i < n; ++i) {`
Update Z-algorithm.md 2020-06-04 02:22:16 +05:30			`// Take advantage of previous value`
Create Z-algorithm.md 2020-06-03 22:53:47 +05:30			`if (i <= r)`
			`z[i] = min (r - i + 1, z[i - l]);`

			`// Now do it usual brute-force way`
			`while (i + z[i] < n && s[z[i]] == s[i + z[i]])`
			`++z[i];`

			`// Set new range [l,r]`
			`if (i + z[i] - 1 > r) {`
Update Z-algorithm.md 2020-06-04 02:22:16 +05:30			`l = i;`
			`r = i + z[i] - 1;`
Create Z-algorithm.md 2020-06-03 22:53:47 +05:30			`}`
			`}`
			`return z;`
			`}`
			```

			`### Time complexity`

Update Z-algorithm.md 2020-06-04 02:22:16 +05:30			`$O(N)$, as at each step of the algorithm $r$ at least increases one step, and the maximum possible value of r is $n-1$.`
Create Z-algorithm.md 2020-06-03 22:53:47 +05:30
Update Z-algorithm.md 2020-06-12 22:17:35 +05:30			`## Z-algorithm`
Create Z-algorithm.md 2020-06-03 22:53:47 +05:30
			`Z-algorithm is used to search all occurrences of pattern-string $p$ in a string $s$ in $O(N)$.`

			For example, `p = "ab"` and `s = "abbbabab"`, then Z-algorithm will find us `[0,4,6]` because $s$ has 3 occurrences of `"ab"`.

			Basic idea here is to create a new string having $p$ as a prefix and $s$ as a suffix i.e. `new_str = p + '#' + s`.

Update Z-algorithm.md 2020-06-04 02:54:13 +05:30			To make sure that the value of Z-function does not exceed the length of $p$, we add a character that is never going to appear in string $s$ like `'#'`.
Create Z-algorithm.md 2020-06-03 22:53:47 +05:30
			Now, we will find Z-function of `new_str`.

			`Let say $m$ is the length of $p$.`

			$Z[i] = m$, means that `new_str[0..m-1]` is equal to `new_str[i...i+m-1]`, which is bacially means $p$(=`new_str[0...m-1]`) is equal to `new_str[i...i+m-1]`.

			`And therefore all indices-$i$ where the values of Z-function $Z[i]$ equals to the length of $p$ means it is an occurrence of $p$ in $s$.`

			```cpp
			`int main()`
			`{`
Update Z-algorithm.md 2020-06-04 02:22:16 +05:30			`string s,p;`
			`s = "abbbabab";`
			`p = "ab";`
			`int n = s.size(), m = p.size();`

			`// To save memory concatenate`
			`// s in p`
			`p += "#";`
			`p += s;`
			`// p = "ab#abbbabab";`
			`vector<int> z = z_function(p);`

			`// p = "ab#abbbabab";`
			`// ^`
			`// m+1`
			`cout << "occurences in s at the following indices: ";`
			`for(int i = m + 1; i < z.size(); i++) {`
			`if(z[i] == m) {`
			`cout << i - m - 1 << " ";`
			`}`
			`}`

			`return 0;`
Create Z-algorithm.md 2020-06-03 22:53:47 +05:30			`}`
			```

Update Z-algorithm.md 2020-06-12 22:17:35 +05:30			`Z-function can also be used to solve various string related problems. Let's see some applications.`
Create Z-algorithm.md 2020-06-03 22:53:47 +05:30
Update Z-algorithm.md 2020-06-04 02:22:16 +05:30			`## To find the period of a string`
Create Z-algorithm.md 2020-06-03 22:53:47 +05:30
Update Z-algorithm.md 2020-06-04 02:22:16 +05:30			`Period of a string is the shortest length such that a larger string $s$ can be represented as a concatenation of one or more copies of a substring($t$).`
Create Z-algorithm.md 2020-06-03 22:53:47 +05:30
			For example, `s = "ababab"` has a period of $2$, where `t = "ab"`.

Update Z-algorithm.md 2020-06-04 02:22:16 +05:30			`Let's see how to find the period of $s$ using the value of z-function of $s$.`
Create Z-algorithm.md 2020-06-03 22:53:47 +05:30
Update Z-algorithm.md 2020-06-04 02:22:16 +05:30			`First of all note that the length of string $s$($n$) is divisible by the period of string. Therefore, we can divide string $s$ into multiple blocks of the same length as a period of $s$.`
Create Z-algorithm.md 2020-06-03 22:53:47 +05:30
			`First of all, we will find all divisors of $n$ and value of z-function of $s$. Now, we will need to find smallest divisor of $n$ for which $i+z[i] = n$, which is period of string $s$. Why?`

			`$z[i]$ represents length of the longest common prefix of $s[0,n-1]$ and $s[i,n-1]$. As $i$ is divisor of $n$, we can divide the whole string into blocks of length $i$.`

Update Z-algorithm.md 2020-06-04 02:22:16 +05:30			`From the value of $z[i] = n-i$($\because i+z[i]=n$), we can see that the first block($s[0,i-1]$) is equal to the second block starting at $i$ i.e. $s[i,i+i-1]$, which is also equal to third block $s[2i,3i-1]$ and similarly all blocks turns out to be equal.`
Create Z-algorithm.md 2020-06-03 22:53:47 +05:30
			`Therefore, smallest $i$ such that $n\% i=0$ and $i+z[i]=n$, is period of string $s$. If there is no such $i$, then string is not periodic as we cannot divide string into equivalent blocks.`

			```cpp
			`vector<int> getDivisors(int n)`
			`{`
			`vector<int> v;`
			`for (int i=1; i<=sqrt(n); i++)`
			`if (n%i==0)`
			`{`
Update Z-algorithm.md 2020-06-04 02:22:16 +05:30			`v.push_back(i);`
Create Z-algorithm.md 2020-06-03 22:53:47 +05:30			`if (n != i*i)`
Update Z-algorithm.md 2020-06-04 02:22:16 +05:30			`v.push_back(n/i);`
Create Z-algorithm.md 2020-06-03 22:53:47 +05:30			`}`
			`return v;`
			`}`

			`int main()`
			`{`
Update Z-algorithm.md 2020-06-04 02:22:16 +05:30			`string s,p;`
			`s = "abcabcabc";`
			`int n = (int) s.size();`
			`vector<int> divs = getDivisors(n);`
			`sort(divs.begin(),divs.end());`

			`vector<int> z = z_function(s);`
			`int period = 0;`
			`for(auto i:divs) {`
			`if(i < n && z[i] + i == n) {`
			`period = i;`
			`break;`
			`}`
			`}`

			`if(period)`
			`cout << period << endl;`
			`else`
			`cout << "String is not periodic" << endl;`

			`return 0;`
Create Z-algorithm.md 2020-06-03 22:53:47 +05:30			`}`

			```

			`### String compression`

			`Now, we know how to find a period of a string and therefore we can compress string as only one block of size $i$ which repeats all over again and again in $s$.`

Update Z-algorithm.md 2020-06-04 02:22:16 +05:30			`To retrieve the string back from the compressed version, we can attach its real length i.e. length of $s$.`
Create Z-algorithm.md 2020-06-03 22:53:47 +05:30
			```cpp
			`int main()`
			`{`
Update Z-algorithm.md 2020-06-04 02:22:16 +05:30			`string s,p;`
			`s = "abcabcabc";`
			`int n = (int) s.size();`
			`vector<int> divs = getDivisors(n);`
			`sort(divs.begin(),divs.end());`

			`vector<int> z = z_function(s);`
			`int period = 0;`
			`for(auto i:divs) {`
			`if(i < n && z[i] + i == n) {`
			`period = i;`
			`break;`
			`}`
			`}`

			`if(period != 0) {`
			`// A way to represent a compressed string`
			`// Attatch real length of string to retrieve easily`
			`pair<string, int> compressed_str{s.substr(0,period), n};`
			`}`
			`else {`
			`cout << "can't be compressed by this method" << endl;`
			`}`

			`return 0;`
Create Z-algorithm.md 2020-06-03 22:53:47 +05:30			`}`
			```

Update Z-algorithm.md 2020-06-04 02:22:16 +05:30			`## Number of distinct substrings in a string`
Create Z-algorithm.md 2020-06-03 22:53:47 +05:30
Update Z-algorithm.md 2020-06-04 02:22:16 +05:30			`Problem statement: Find the number of unique substrings in a given string $s$.`
Create Z-algorithm.md 2020-06-03 22:53:47 +05:30
Update Z-algorithm.md 2020-06-04 02:22:16 +05:30			`Brief idea: Basic idea here is to take an empty string $t$ and add characters one by one from string $s$ and along with that check how many new substrings are created, due to the addition of a character in $t$, using z-function.`
Create Z-algorithm.md 2020-06-03 22:53:47 +05:30
Update Z-algorithm.md 2020-06-04 02:22:16 +05:30			`Let say we have already added some characters to $t$ from $s$ and $k$ is the number of distinct substrings currently. Now, we are adding a character $c$ to $t$, $t = t+c$.`
Create Z-algorithm.md 2020-06-03 22:53:47 +05:30
			Note that total number of new substrings created by appending a character to any string($t$) is equal to the length of new string($t=t+c$) created. For example, Appending `'d'` in `"abc"` creates 4 new substrings: `"d"`, `"cd"`, `"bcd"`, `"abcd"`.

Update Z-algorithm.md 2020-06-04 02:22:16 +05:30			`But how to find the number of new unique substrings created by the addition of $c$ using z-function?`
Create Z-algorithm.md 2020-06-03 22:53:47 +05:30
			`Hint: Reverse $t$.`

Update Z-algorithm.md 2020-06-04 02:22:16 +05:30			`By reversing $t$, our task burns down into computing how many prefixes there are that don't appear anywhere else in $t$, which can be done by finding the z-function of $t$.`
Create Z-algorithm.md 2020-06-03 22:53:47 +05:30
			`After finding value of z-function, we will find maximum value $z_{max}$($z_{max} = max\{z[i]\}, \forall i$) in the z-function of reversed $t$, which shows the length of longest prefix which is already in $t$ as a substring and it also implies that all smaller prefixes are already present as substrings in $t$.`

			`Therefore, we will deduct this number of already present substrings i.e. $z_{max}$, from the total number of new substrings i.e. $\|t\|$.`

			`Where $\|t\|$ is the length of $t$.`

Update Z-algorithm.md 2020-06-04 02:22:16 +05:30			`Finally, the number of new unique substrings created by the addition of a character turns out to be $\|t\|-z_{max}$.`
Create Z-algorithm.md 2020-06-03 22:53:47 +05:30
			`Note that $\|t\|$ is the length of $t$ after adding a character.`

			```cpp
			`// Returns maximum of z[i]`
			`int z_function(string& s) {`
			`int n = (int) s.size();`
			`vector<int> z(n);`
			`int l = 0, r = 0;`
			`int mx = 0;`
			`for (int i = 1; i < n; ++i) {`
Update Z-algorithm.md 2020-06-04 02:22:16 +05:30			`// Take advantage of previous value`
Create Z-algorithm.md 2020-06-03 22:53:47 +05:30			`if (i <= r)`
			`z[i] = min (r - i + 1, z[i - l]);`

			`// Now do it usual brute-force way`
			`while (i + z[i] < n && s[z[i]] == s[i + z[i]])`
			`++z[i];`

			`mx = max(z[i], mx);`

			`// Set new range [l,r]`
			`if (i + z[i] - 1 > r) {`
Update Z-algorithm.md 2020-06-04 02:22:16 +05:30			`l = i;`
			`r = i + z[i] - 1;`
Create Z-algorithm.md 2020-06-03 22:53:47 +05:30			`}`
			`}`
			`return mx;`
			`}`

			`int main()`
			`{`
Update Z-algorithm.md 2020-06-04 02:22:16 +05:30			`string s,p;`
			`s = "abc";`
			`int n = s.size();`

			`string t, temp;`
			`int unique_substr = 0;`

			`for(int i=0; i < n; i++) {`
			`t += s[i];`
			`temp = t;`
			`reverse(temp.begin(), temp.end());`
			`// \|t\| - mx`
			`unique_substr += (int)t.size() - z_function(temp);`
			`}`

			`// Total number of unique substrings`
			`cout << unique_substr << endl;`

			`return 0;`
Create Z-algorithm.md 2020-06-03 22:53:47 +05:30			`}`
			```
Update Z-algorithm.md 2020-06-04 02:22:16 +05:30			`Complexity: $O(N^2)$, where $N$ is the length of $s$.`

			`For each character appended, we are computing z-function in $O(N)$, which gives a time complexity of $O(N^2)$ in total.`