reorganize files

Pragy Agarwal
2020-03-09 16:59:01 +05:30
parent f0ab114f3e
commit 2e0decd12a
21 changed files with 2 additions and 0 deletions


@@ -0,0 +1,442 @@
<script type="text/javascript" src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS_HTML"></script>
## Bellman-Ford Algorithm
Dijkstra's algorithm solves the single source shortest path (SSSP) problem, but it does not work when the graph has negative edges or negative cycles.
How do we solve the SSSP problem when the given graph has negative weighted edges?
The "**Bellman-Ford algorithm**" achieves this goal.
## Quiz Time
You are given a directed graph. What is the maximum number of edges on a simple path (a path that visits no vertex twice) between any two nodes?
Answer: $|V|-1$
## Brute Force
DFS finds out the correct shortest path whether the edge weights are positive or negative, because it searches all the possible paths from the source to the destination.
**Note:** If there is a negative cycle, then it is not possible to find the shortest distance to affected vertices.
## Bellman-Ford algorithm
The Bellman-Ford algorithm works on the principle that a shortest path between any two vertices can contain at most $|V|$ vertices, that is, $|V|-1$ edges (as long as there is no negative cycle).
In the Bellman-Ford Algorithm, the main step is the relaxation of the edges.
The algorithm relaxes every directed edge of the graph $|V| - 1$ times.
### Algorithm
1. Same as in Dijkstra's Algorithm, initialise all the distances to $\infty$ and all the parent vertices to some sentinel value (one that is not a valid vertex).
2. Set the distance from the source to itself to $0$ and start the algorithm.
3. Loop over all the directed edges.
4. If the relaxation condition below is satisfied, then relax that edge, i.e. set $\text{Distance}[B] = \text{Distance}[A] + \text{EdgeWeight}[A, B]$ and make $A$ the parent of $B$.
**Relaxation Condition:** For an edge from $A \to B$,
$$\text{Distance}[B] > \text{Distance}[A] + \text{EdgeWeight}[A, B]$$
5. Repeat steps 3 and 4, $|V|-1$ times in total.
This is a kind of bottom-up **Dynamic Programming**. After the $k$-th iteration of the outer loop the shortest paths having at most $k$ edges are found.
### Visualization
After the first iteration of the outer loop there will not be any more relaxations.
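This claim can be checked with a minimal sketch, assuming the six-edge example graph that the code in this article uses. The names `Edge`, `count_useful_passes` and `example_edges` are hypothetical and not part of the original code. The early exit when a pass relaxes nothing is a common optimisation, not a step of the algorithm as listed above.

```c++
#include <vector>
using namespace std;

const int INF = 100000000;

struct Edge { int from, to, weight; };

// Runs Bellman-Ford from `source` on vertices 1..n and returns how many
// passes relaxed at least one edge. `dist` receives the final distances.
int count_useful_passes(int n, const vector<Edge>& edges, int source, vector<int>& dist)
{
    dist.assign(n + 1, INF);
    dist[source] = 0;
    int useful = 0;
    for (int pass = 0; pass < n - 1; pass++) {
        bool changed = false;
        for (const Edge& e : edges) {
            if (dist[e.from] != INF && dist[e.to] > dist[e.from] + e.weight) {
                dist[e.to] = dist[e.from] + e.weight;
                changed = true;
            }
        }
        if (!changed) break;   // nothing relaxed: the distances are final
        useful++;
    }
    return useful;
}

// The example graph of this article (hypothetical helper for illustration).
vector<Edge> example_edges()
{
    return { {1, 2, 1}, {2, 3, 1}, {1, 3, 2}, {2, 4, -10}, {4, 3, 4}, {3, 5, 1} };
}
```

Calling `count_useful_passes(5, example_edges(), 1, dist)` processes the edges in the listed order. With a different edge order, more passes may be needed, which is why the algorithm allows up to $|V|-1$ of them.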
### Code
```c++
#include <bits/stdc++.h>
using namespace std;
#define MAX_DIST 100000000
// Function to print the required path
void printpath(vector<int>& parent, int vertex, int source, int destination)
{
if(vertex == source)
{
cout << source << "-->";
return;
}
printpath(parent, parent[vertex], source, destination);
cout << vertex << (vertex==destination ? "\n" : "-->");
}
// Object - Edge
struct edge{
// Edge from -> to
// having some weight
int from, to, weight;
edge(int a, int b, int w)
{
from = a;
to = b;
weight = w;
}
};
int main()
{
int no_vertices=5;
// Array of edges
vector<edge> edges;
// Distance and Parent vertex storing arrays
vector<int> distance(no_vertices+1, MAX_DIST), parent(no_vertices+1,-1);
// Edges
edges.push_back(edge(1,2,1));
edges.push_back(edge(2,3,1));
edges.push_back(edge(1,3,2));
edges.push_back(edge(2,4,-10));
edges.push_back(edge(4,3,4));
edges.push_back(edge(3,5,1));
// For the sake of example
int source = 1, destination = 5;
distance[1] = 0;
// Bellman-Ford Algorithm
for (int i = 0; i < no_vertices - 1; i++)
{
// Loop over all the edges
for(int j = 0; j < edges.size() ; j++)
{
if(distance[edges[j].from] != MAX_DIST) {
// Check for the Relaxation Condition
if(distance[edges[j].to] > distance[edges[j].from] + edges[j].weight )
{
distance[edges[j].to] = distance[edges[j].from] + edges[j].weight;
parent[edges[j].to] = edges[j].from;
}
}
}
}
// Shortest distance from source to destination
cout << distance[5] << endl;
// Shortest path
printpath(parent, 5, 1, 5);
return 0;
}
```
### Time Complexity
- $\mathcal{O}(|V| )$ time is taken by outer loop as it runs $|V|-1$ times.
- $\mathcal{O}(|E|)$ time to loop over all the edges in the inner loop.
So the total time complexity will be: $\mathcal{O}(| V | \cdot | E |)$
If there is a negative cycle in the graph, then certainly we can not find the shortest paths. But how to detect the negative cycles?
We can use Bellman-Ford algorithm to detect negative cycles in the graph. How?
## Detection of Negative Cycle
In the algorithm, we run the outer loop $|V| - 1$ times, as a shortest path between any two vertices can have at most $|V| - 1$ edges.
Now, if any more relaxations are possible after that, then there is a negative cycle reachable from the source. This is how we detect the negative cycle.
So, we will run the loop over the edges one more time to detect the negative cycle.
```c++
#include <bits/stdc++.h>
using namespace std;
#define MAX_DIST 100000000
// Function to print the required path
void printpath(vector<int>& parent, int vertex, int source, int destination)
{
if(vertex == source)
{
cout << source << "-->";
return;
}
printpath(parent, parent[vertex], source, destination);
cout << vertex << (vertex==destination ? "\n" : "-->");
}
// Object of Edge
struct edge{
// Edge from -> to
// having some weight
int from, to, weight;
edge(int a, int b, int w)
{
from = a;
to = b;
weight = w;
}
};
// Bellman-Ford Algorithm for Negative Cycle
bool Bellman_Ford_NC(vector<edge> & edges, vector<int> & distance, vector<int> & parent)
{
int no_vertices = 5;
for (int i = 0; i < no_vertices - 1; i++)
{
// Loop over all the edges
for(int j = 0; j < edges.size() ; j++)
{
if(distance[edges[j].from] != MAX_DIST)
{
// Check for the Relaxation Condition
if(distance[edges[j].to] > distance[edges[j].from] + edges[j].weight )
{
distance[edges[j].to] = distance[edges[j].from] + edges[j].weight;
parent[edges[j].to] = edges[j].from;
}
}
}
}
bool is_negative_cycle = false;
// Running the outer loop one more time
for(int j = 0; j < edges.size() ; j++)
{
// Check for the Relaxation Condition (skip vertices not reached from the source)
if(distance[edges[j].from] != MAX_DIST && distance[edges[j].to] > distance[edges[j].from] + edges[j].weight )
{
// Used when finding vertices in NC
distance[edges[j].to] = distance[edges[j].from] + edges[j].weight;
parent[edges[j].to] = edges[j].from;
// There is a negative cycle
is_negative_cycle = true;
}
}
if(is_negative_cycle)
{
cout << "There is a negative cycle in the graph." << endl;
return false;
}
return true;
}
int main()
{
int no_vertices=5;
// Array of edges
vector<edge> edges;
// Distance and Parent vertex storing arrays
vector<int> distance(no_vertices+1, MAX_DIST), parent(no_vertices+1,-1);
// Edges
edges.push_back(edge(1,2,1));
edges.push_back(edge(2,3,5));
edges.push_back(edge(3,1,2));
edges.push_back(edge(2,4,-10));
edges.push_back(edge(4,3,4));
edges.push_back(edge(3,5,1));
// For the sake of example
int source = 1, destination = 5;
distance[1] = 0;
if(Bellman_Ford_NC(edges, distance, parent))
{
// Shortest distance from source to destination
cout << distance[5] << endl;
// Shortest path
printpath(parent, 5, 1, 5);
}
return 0;
}
```
Is it possible to find the vertices involved in the negative cycle? Yes.
## Finding the Negative Cycle
If there is a unique negative cycle, then we can find the vertices involved in the cycle using the last relaxed edge and the parent vertices.
```c++
#include <bits/stdc++.h>
using namespace std;
#define MAX_DIST 100000000
// Function to print the required path
void printpath(vector<int>& parent, int vertex, int source, int destination)
{
if(vertex == source)
{
cout << source << "-->";
return;
}
printpath(parent, parent[vertex], source, destination);
cout << vertex << (vertex==destination ? "\n" : "-->");
}
// Function to print the negative cycle
void printcycle(vector<int>& parent, int vertex, int source, int destination)
{
if(vertex == source)
{
cout << source << "-->";
return;
}
printcycle(parent, parent[vertex], source, destination);
if(vertex == destination)
cout << vertex << "-->" << source << endl;
else
cout << vertex << "-->";
}
// Object - Edge
struct edge{
// Edge from -> to
// having some weight
int from, to, weight;
edge(int a, int b, int w)
{
from = a;
to = b;
weight = w;
}
};
// Bellman-Ford Algorithm
// Returns 0 if there is no negative cycle, otherwise the last relaxed vertex
int Bellman_Ford(vector<edge> & edges, vector<int> & distance, vector<int> & parent)
{
int no_vertices = 5;
for (int i = 0; i < no_vertices - 1; i++)
{
// Loop over all the edges
for(int j = 0; j < edges.size() ; j++)
{
if(distance[edges[j].from] != MAX_DIST)
{
// Check for the Relaxation Condition
if(distance[edges[j].to] > distance[edges[j].from] + edges[j].weight )
{
distance[edges[j].to] = distance[edges[j].from] + edges[j].weight;
parent[edges[j].to] = edges[j].from;
}
}
}
}
bool is_negative_cycle = false;
int last_relaxation = 0;
// Running the outer loop one more time
for(int j = 0; j < edges.size() ; j++)
{
// Check for the Relaxation Condition (skip vertices not reached from the source)
if(distance[edges[j].from] != MAX_DIST && distance[edges[j].to] > distance[edges[j].from] + edges[j].weight )
{
distance[edges[j].to] = distance[edges[j].from] + edges[j].weight;
parent[edges[j].to] = edges[j].from;
last_relaxation = edges[j].to;
is_negative_cycle = true;
}
}
if(is_negative_cycle)
{
cout << "There is a negative cycle in the graph." << endl;
return last_relaxation;
}
return 0;
}
int main()
{
int no_vertices=5;
// Array of edges
vector<edge> edges;
// Distance and Parent vertex storing arrays
vector<int> distance(no_vertices+1, MAX_DIST), parent(no_vertices+1,-1);
// Edges
edges.push_back(edge(1,2,1));
edges.push_back(edge(2,3,5));
edges.push_back(edge(3,1,2));
edges.push_back(edge(2,4,-10));
edges.push_back(edge(4,3,4));
edges.push_back(edge(3,5,1));
// For the sake of example
int source = 1, destination = 5;
distance[1] = 0;
int last_relaxation = Bellman_Ford(edges, distance, parent);
if(!last_relaxation)
{
// Shortest distance from source to destination
cout << distance[5] << endl;
// Shortest path
printpath(parent, 5, 1, 5);
}
else
{
int trapped = last_relaxation;
// To find the negative_cycle, we can
// use the last relaxation data
// and loop back over parent vertices
// for over no_vertices time, so that
// we get trapped in the negative cycle
for(int i = 0; i < no_vertices; i++)
{
trapped = parent[trapped];
}
// Printing negative_cycle
printcycle(parent, parent[trapped], trapped, parent[trapped]);
}
return 0;
}
```
## Other Shortest Path finding Algorithms
1. All pair shortest path algorithm - **Floyd Warshall Algorithm**.
2. SSSP using Dynamic Programming.
3. Dijkstra's Algorithm

Akash Articles/md/DSU.md Normal file

@@ -0,0 +1,334 @@
Suppose you are taking part in a programming contest and one of the problems is: You are given a number of vertices and a list of undirected unweighted edges between these vertices. The queries ask whether there is a path from some vertex $u$ to $v$. Note that the whole graph may not be connected. How can you solve it?
![enter image description here](https://lh3.googleusercontent.com/H_pixss3v5apcsEkUmk_hAzOhgif-O43ce8IijOw3AhCmATXdw0QpG6eQCJEnmwcLs0NYUa96_XU)
DFS, right? Start DFS from either $u$ or $v$ and check if we can reach the other vertex. Done!
But what if the graph is **dynamic**, meaning that apart from the path query, you are given another type of query, which is to add an edge to the graph? Now, how to solve it?
![](https://lh3.googleusercontent.com/AK-Y9QBBXX0mB1twpZdPPTA2gcEhPjKwAh0cOxGaXltv6S2xcup9HPF2CDpjvhBlp3v4IiS341lz)
Again DFS? Yes, you can add any number of edges and still check if there is a path between vertices $u$ and $v$. But running a DFS for every query will get you a **TLE** (Time Limit Exceeded).
Disjoint Set Union is a data structure which can do these operations very efficiently.
But what do we mean by the name **"Disjoint Set Union"**? A **Set** is a collection of distinct elements. **Disjoint sets** are non-overlapping: in the language of math, if $A$ and $B$ are two disjoint sets then $A \cap B = \emptyset$. **Union** is the operation that combines two disjoint sets.
In the problem stated above, we can consider the connected components as disjoint sets, and do a union whenever we add an edge.
For a query that checks whether there is a path from $u$ to $v$, we check whether $u$ and $v$ are in the same disjoint set. If yes, then there is a path from $u$ to $v$, otherwise not. Confusing?
Now, let's see how it actually works.
## Disjoint Set Union
Disjoint Set Union is one of the simplest and easiest to implement data structures. It is used to keep track of disjoint (non-overlapping) dynamic sets.
There are three main operations of this data structure: Make-Set, Find and Union.
1. **Make-Set**: This operation creates a disjoint set having a single element.
2. **Find**: This operation finds the unique set to which a particular element belongs.
3. **Union**: This operation unifies two disjoint sets.
There are many ways to implement this data structure: linked lists, arrays, trees. Here we will implement it using arrays and visualise it as a tree.
**Some Terminologies**
- **Parent** is the main attribute of an element: it is the element through which a particular element is connected to its disjoint set.
In the image below, $c$ is the parent of $d$ and $a$ is the parent of $c$.
**Note:** The image below is just for visualization. If you don't understand it right now, don't worry; you will understand it by the end of the article.
![enter image description here](https://lh3.googleusercontent.com/iaVsRRMUzGUK-PlEl7gFtUoDnatg9O2tBF-gMJ_qm5FyNWJSWXnCL6jAxX5siijx1L57Tg-3A0HZ)
- **Root** is an element of a set whose parent is itself. It is unique per set.
$a$ is the root element for the disjoint set above in the image.
## Operation Make-Set
The Make-Set operation creates a new set having a single element (so size = 1), identified by a unique id.
**Pseudocode:**
```
MAKE-SET(x)
{
x.parent = x;
x.size = 1;
}
```
Here $x$ is the only element in the set, so it is its own parent.
The image below represents the sets generated by this operation. Each element has an arrow pointing to itself, which shows that it is its own parent right now. Each set has a size of 1.
![enter image description here](https://lh3.googleusercontent.com/UW-R9Hbi7YaCOyrVd2F0ThzzQ9pAF1zqoASJDhGKjbBHN8P-dJJr4sZubW1csc97l6iQMo3L39Bc)
We are working with arrays, so the code to make $n$ sets is as below:
```c++
vector<int> parent,size;
void Make_sets(int n)
{
parent.resize(n);
size.resize(n);
for(int i = 0; i < n; i++)
{
parent[i] = i;
size[i] = 1;
}
}
```
**Time Complexity:** The Make-Set operation takes $O(1)$ time. So creating $N$ sets takes $O(N)$ time.
## Operation Find
$\text{Find}(X)$ finds the root element of the disjoint set to which $X$ belongs.
The root serves as a unique ID for a particular disjoint set. (Look at the code for $\text{Find}(X)$)
If we apply the $\text{Find}(d)$ or $\text{Find}(b)$ operation to the set in the image below, then it will return '$a$', which is the root element.
![enter image description here](https://lh3.googleusercontent.com/j1H9MBKoSzyQV_8ObjBOjD1W2Na57kYg8aGMrbI8dLepF2IIqbRJSKzccH7rgfWrBqgFJ3LtYzAN)
Note that the parent of the root element of any disjoint set is the root itself, i.e., $root.parent = root$.
**Algorithm**
- Traverse the tree of the disjoint set upwards until you reach the root element.
**Pseudocode:**
```
FIND(x)
    while x != x.parent
        x = x.parent
    return x
```
**Visualization**
![enter image description here](https://lh3.googleusercontent.com/ex1uHjYzU0MXu0auog7GwQsAbawGVSROIvn0COglh33LHbHYFc8sTHnkn3Qgjb1FgJIaeLOz8Qig)
---------------
### Quiz Time
Can you write a recursive implementation of the above function?
Answer:
```
FIND(x)
    if x == x.parent
        return x
    else
        return FIND(x.parent)
```
------------------
**Implementation in C++**
```c++
// Iterative implementation
int Find(int x)
{
while(x != parent[x])
x = parent[x];
return x;
}
// Recursive implementation (an alternative to the one above; keep only one)
int Find(int x)
{
if(x == parent[x])
return x;
else
return Find(parent[x]);
}
```
**Time Complexity:** This operation can take $O(N)$ time in the worst case, where $N$ is the size of the set, which can be the total number of elements at most.
This is too much, right? What else can we do?
We have a technique named **"Path compression"**. The idea of path compression is that **it re-connects every vertex on the path directly to the root vertex, rather than through a chain of parents**.
If we apply the $\text{Find}(d)$ operation with path compression, then the following will happen.
![enter image description here](https://lh3.googleusercontent.com/ltQXkpZAjEO543ibrVodpMMZp2IHXVJ7Rjxevm2ztJQAC67UnvBeMmwEoIB9qZ0_2PgpSs98nWV9)
How can we do it? It is easy, we just need a little modification in $\text{Find}(X)$.
**Pseudocode:**
```
FIND(x)
    if x == x.parent
        return x
    else
        x.parent = FIND(x.parent)
        return x.parent
```
So every time we run this function, it re-connects every vertex on the path from $x$ to the root directly to the root.
---
### Quiz Time
Can you write the iterative version of the above $\text{FIND}(X)$ function with path compression?
Answer:
```
FIND(x)
    y = x
    while y != y.parent
        y = y.parent
    while x != x.parent
        z = x.parent
        x.parent = y
        x = z
    return y
```
----
**Implementation in C++**
```c++
// Iterative Implementation
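int Find(int x)
{
// First pass: find the root
int root = x;
while(root != parent[root])
root = parent[root];
// Second pass: point every vertex on the path directly to the root
while(x != parent[x])
{
int next = parent[x];
parent[x] = root;
x = next;
}
return root;
}
// Recursive implementation (an alternative to the one above; keep only one)
int Find(int x)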
{
if(x == parent[x])
return x;
else
return parent[x] = Find(parent[x]);
}
```
**Time complexity of Find:**
1. Without path compression: $\mathcal{O}(N)$
2. With path compression (combined with union by size or rank, described below): amortized $\mathcal{O}(\log^*(N))$
**Note:**
- $\log^*(N)$ is the **iterated logarithm**, which is the number of times we have to apply $\log$ to $N$ before it becomes less than or equal to 1.
- $\mathcal{O}(\log^*(N))$ is almost constant time because $\log^*(N) \leqslant 5$ even for a number as big as $2^{65536}$.
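To get a feel for how slowly the iterated logarithm grows, here is a small sketch that counts the applications of $\log_2$. The function name `log_star` is an assumption made for illustration. A `double` cannot hold $2^{65536}$, but since $\log_2(2^{65536}) = 65536$, the value for that number is one more than the value for $65536$.

```c++
#include <cmath>

// Iterated logarithm (base 2): how many times log2 must be applied
// to n before the value drops to 1 or below.
int log_star(double n)
{
    int count = 0;
    while (n > 1.0) {
        n = std::log2(n);
        count++;
    }
    return count;
}
```

For example, $65536 \to 16 \to 4 \to 2 \to 1$ takes four steps, so $\log^*(65536) = 4$ and $\log^*(2^{65536}) = 5$.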
## Operation Union
The $\text{Union}(X,Y)$ operation first finds the root elements of the disjoint sets containing $X$ and $Y$ respectively. Then it connects the root element of one of the disjoint sets to the other.
Well, how do we decide which root will connect to which? If we choose arbitrarily, the tree height may grow up to $O(N)$, which means that the next $\text{Find}(x)$ operation can take $O(N)$ time. Can we do better?
Yes, we have two standard techniques: **By size** and **By rank**.
### By Size
The union by size technique decides it based on the sizes of the sets. Every time, the smaller set is attached to the larger set.
![enter image description here](https://lh3.googleusercontent.com/O9Q-Sbfm2LvjdbEgVoUwWSVfs4vA9MLxCgNGuzOWiyWgE_j9O2NOgTmOrhlZb5QMI_nPgG5lDfIo)
**Note:** The numbers in square brackets represent the size of the set below them.
**Pseudocode:**
```
UNION(X,Y)
    Rx = FIND(X), Ry = FIND(Y)
    if Rx == Ry
        return
    if Rx.size > Ry.size
        Ry.parent = Rx
        Rx.size = Rx.size + Ry.size
    else
        Rx.parent = Ry
        Ry.size = Ry.size + Rx.size
```
### By Rank
In the union by rank technique, the shorter tree is attached to the taller tree. Initially, the rank of each disjoint set is zero.
If both sets have the same rank, then the resulting rank will be one greater. Otherwise the resulting rank will be the larger of the two.
**Note:** In the images below, the numbers in square brackets represent the rank of the set below them.
Example 1:
![enter image description here](https://lh3.googleusercontent.com/6DjycKp_SUVdInkEVhT89_v_YcblSZAyHZaAxiAI60MN81f9ZNIZW2G0UjivAW2AIwDuH6z0EF0V)
Example 2:
![enter image description here](https://lh3.googleusercontent.com/_q-grBb90uOEu9v2EH1TXsHcHZh6QtyBRroxDiLch0vMdzLPvcvTd_YGoRa85fZYTMG_Qo_JDxOl)
**Pseudocode:**
```
UNION(X,Y)
    Rx = FIND(X), Ry = FIND(Y)
    if Rx == Ry
        return
    if Rx.rank > Ry.rank
        Ry.parent = Rx
    else
        Rx.parent = Ry
        if Rx.rank == Ry.rank
            Ry.rank = Ry.rank + 1
```
**Implementation in C++**
```c++
// By size ("union" is a reserved word in C++, so it cannot be a function name)
void Union_by_size(int x,int y)
{
int Rx = Find(x), Ry = Find(y);
if(Rx == Ry)
return;
if(size[Ry] > size[Rx])
swap(Rx,Ry);
parent[Ry] = Rx;
size[Rx] += size[Ry];
}
// By rank (assumes a global vector<int> rank with every entry initialised to 0)
void Union_by_rank(int x,int y)
{
int Rx = Find(x), Ry = Find(y);
if(Rx == Ry)
return;
if(rank[Ry] > rank[Rx])
swap(Rx,Ry);
parent[Ry] = Rx;
if(rank[Rx] == rank[Ry])
rank[Rx] += 1;
}
```
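Putting the pieces together, the following is a minimal sketch of a complete structure that answers the path queries from the introduction. The struct name `DSU` and the method names `find`, `unite` and `connected` are assumptions chosen for illustration. It uses path compression together with union by size; union by rank would work the same way.

```c++
#include <numeric>
#include <utility>
#include <vector>
using namespace std;

struct DSU {
    vector<int> parent, sz;

    // Make-Set for the elements 0..n-1
    explicit DSU(int n) : parent(n), sz(n, 1)
    {
        iota(parent.begin(), parent.end(), 0);
    }

    // Find with path compression
    int find(int x)
    {
        if (x == parent[x]) return x;
        return parent[x] = find(parent[x]);
    }

    // Union by size; returns false if x and y were already in the same set
    bool unite(int x, int y)
    {
        int rx = find(x), ry = find(y);
        if (rx == ry) return false;
        if (sz[ry] > sz[rx]) swap(rx, ry);
        parent[ry] = rx;
        sz[rx] += sz[ry];
        return true;
    }

    // Path query: is there a path between x and y?
    bool connected(int x, int y) { return find(x) == find(y); }
};
```

An "add edge" query becomes `unite(u, v)` and a "path" query becomes `connected(u, v)`, so no graph traversal is needed per query.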
### Time Complexity of Union
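Union makes two Find calls and then does a constant amount of work, so its cost is that of Find:
1. Without path compression (in Find): $\mathcal{O}(N)$ if roots are linked arbitrarily, $\mathcal{O}(\log N)$ with union by size or rank
2. With path compression and union by size or rank: amortized $\mathcal{O}(\log^*(N))$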
## Applications of DSU
1. To keep track of connected components in an undirected graph.
2. In Kruskal's and Borůvka's algorithms to find a minimum spanning tree.
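As an illustration of the second application, here is a hedged sketch of Kruskal's algorithm. The names `WeightedEdge`, `find_root` and `kruskal` are assumptions, and the union step links roots directly without size or rank to keep the example short. Edges are taken in increasing order of weight, and an edge is kept only if its endpoints are in different sets.

```c++
#include <algorithm>
#include <numeric>
#include <vector>
using namespace std;

struct WeightedEdge { int u, v, weight; };

// Find with path compression on a plain parent array
int find_root(vector<int>& parent, int x)
{
    if (parent[x] == x) return x;
    return parent[x] = find_root(parent, parent[x]);
}

// Total weight of a minimum spanning forest of the vertices 0..n-1
long long kruskal(int n, vector<WeightedEdge> edges)
{
    sort(edges.begin(), edges.end(),
         [](const WeightedEdge& a, const WeightedEdge& b) { return a.weight < b.weight; });
    vector<int> parent(n);
    iota(parent.begin(), parent.end(), 0);
    long long total = 0;
    for (const WeightedEdge& e : edges) {
        int ru = find_root(parent, e.u), rv = find_root(parent, e.v);
        if (ru == rv) continue;   // both ends already connected: edge would close a cycle
        parent[ru] = rv;          // union (size/rank omitted for brevity)
        total += e.weight;
    }
    return total;
}
```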


@@ -0,0 +1,374 @@
<script type="text/javascript" src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS_HTML"></script>
## Dijkstra's Algorithm
Have you ever used Google Maps?
Ever wondered how it works? How does it tell you the shortest path from point A to B?
Behind the scenes, they use something known as a "Shortest Path Finding algorithm". Well, what does the shortest path actually mean?
The **dark** line in the diagram below represents the shortest path from the home to the office.
![Example for shortest path](https://lh3.googleusercontent.com/GaOV_vdJ-V1G3t27g7bHQzeSbBwdl38bjsTp5xdxgFvrfTU5PAB92FDY7y7zzuPA1rT86N9eaQV6)
Do you know how to represent the roads between different places? We use a **Weighted Directed Graph**: a directed graph whose edges have weights.
## Quiz Time:
Now can you find the shortest path from the source vertex to the target vertex in the image below?
![Quiz](https://lh3.googleusercontent.com/M4ye9O11B0X1C6qp-Ox-m87G0TJpIn_qOmzi8lEwnV_AYPYoDLK3lG528zElEgEVIMADutFRvbEl "Quiz")
Answer: The dark line shows the answer.
![Answer to the quiz problem](https://lh3.googleusercontent.com/nopAw8ZspVs0cI4YPuu4duBbdRRgoaju3mSSCNb_tVuc9jPSyHsflWQOKhpbLt164llxGQ0rZVEh "Answer to the quiz problem")
-- --
Single Source Shortest Path problem (SSSP)
-----------------------------
**Statement**: Given a graph, find the shortest paths from a given vertex to all other vertices.
### How to solve this problem?
When we talk about graphs, we have two standard traversal techniques: DFS and BFS.
Can you solve this problem using these standard algorithms?
Certainly. Let's look at how to find the shortest path from the source to the destination using DFS.
### Solution using DFS:
**Algorithm:** The DFS approach is very simple.
- Start from the source vertex.
- Explore all the vertices adjacent to the source vertex.
- For each adjacent vertex, if it satisfies the relaxation condition, then update its distance and parent vertex.
- Then recurse on it, considering it as the new source.
**Relaxation** means that if you reach a vertex with a smaller distance than encountered before, you update its data. **Parent vertex** means the vertex through which a particular vertex is reached.
This way DFS will explore all the possible paths from the source vertex to the destination vertex and will find the shortest path.
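The relaxation step described above can be sketched as a small helper. The function name `relax` and its return value are assumptions made for illustration; the articles' own code inlines the same comparison.

```c++
#include <vector>
using namespace std;

// Relaxes the edge from -> to with the given weight.
// Returns true if the distance to `to` improved.
bool relax(int from, int to, int weight, vector<int>& distances, vector<int>& parent)
{
    if (distances[to] > distances[from] + weight) {
        distances[to] = distances[from] + weight;   // shorter route found
        parent[to] = from;                          // remember how we got here
        return true;
    }
    return false;
}
```

Returning whether anything changed is a design choice that lets the caller decide if it is worth exploring further from `to`.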
Here in the code, we will represent the graph using adjacency list representation:
```c++
#include <bits/stdc++.h>
using namespace std;
#define MAX_DIST 100000000
// Function to print the required path
void printpath(vector<int>& parent, int vertex, int source, int destination)
{
if(vertex == source)
{
cout << source << "-->";
return;
}
printpath(parent, parent[vertex], source, destination);
cout << vertex << (vertex==destination ? "\n" : "-->");
}
void DFS(int source, int destination, vector<vector<pair<int,int> > > &graph, vector<int> &distances, vector<int> &parent)
{
// When we reach at the destination just return
if(source == destination)
return;
// Do DFS over all the vertices connected
// with the source vertex
for(auto vertex: graph[source])
{
// Relaxation of edge:
// If the distance is less than what we
// have encountered until now then update
// Distance and Parent vertex
if(distances[vertex.second] > distances[source] + vertex.first)
{
distances[vertex.second] = distances[source] + vertex.first;
parent[vertex.second] = source;
}
// Recurse on the adjacent vertex, treating it as the new source
// (this brute force assumes the graph has no cycles)
DFS(vertex.second, destination, graph, distances, parent);
}
}
int main()
{
// Number of vertices in graph
int n = 6;
// Adjacency list representation of the
// Directed Graph
vector<vector<pair<int, int> > > graph;
graph.assign(n + 1, vector<pair<int, int> >());
// Now make the Directed Graph
// Note that edges are
// in the form (weight, vertex ID)
graph[1].push_back( make_pair(1, 2) );
graph[1].push_back( make_pair(6, 3) );
graph[2].push_back( make_pair(1, 4) );
graph[4].push_back( make_pair(1, 3) );
graph[3].push_back( make_pair(1, 5) );
graph[2].push_back( make_pair(7, 5) );
graph[4].push_back( make_pair(3, 6) );
// Array to store the distances
vector<int> distances(n+1, MAX_DIST);
// Array to store the parent vertices
vector<int> parent(n+1, -1);
int source = 1, destination = 5;
distances[source] = 0;
// Do DFS
DFS(source, destination, graph, distances, parent);
int shortest_distance = distances[destination];
// To print shortest_distance
cout << shortest_distance << endl;
// To print the path of the shortest_distance
printpath(parent, destination, source, destination);
return 0;
}
```
### Time Complexity of DFS
As DFS explores all possible paths from the given source vertex to the destination vertex, this may lead to exponential time complexity in a very dense graph.
So, our DFS approach is not efficient. What next?
We have a very interesting and elegant algorithm to solve the problem, named "**Dijkstra's Algorithm**". This is the algorithm whose variants are used by most path finding applications.
## Dijkstra's Algorithm:
Dijkstra's Algorithm solves SSSP.
Dijkstra's Algorithm is an iterative algorithm similar to **BFS**, but here we have a little twist.
**Algorithm:**
1. Set the distance between the source and all other vertices to $\infty$ and assign all the parent vertices some sentinel value (one that is not a valid vertex).
2. Maintain a set of discovered vertices, which is empty at the start of the algorithm.
3. Set the distance from the source to itself to 0 and add the source vertex to the set.
4. Find the minimum distance vertex in the set and erase it from the set. Mark it as processed. Let this vertex be $A$.
5. Explore all the edges going out of $A$.
6. If an edge leads to an unprocessed vertex and it satisfies the relaxation condition below, then relax that edge and insert the vertex into the set.
**Relaxation Condition:** Let the edge be $A \to u$,
$$\text{Distance}[u] > \text{Distance}[A] + \text{EdgeWeight}(A,u)$$
7. If the set is empty, then stop the algorithm.
8. Otherwise, repeat from step 4.
In Dijkstra's Algorithm, we explore the vertices in the way BFS does, however here we have to find the minimum distance vertex from the set.
Which data structure can help us find the minimum distance vertex from the set?
A **Priority Queue** is the data structure which achieves this goal efficiently: inserting a vertex and removing the minimum each take $\mathcal{O}(\log n)$ time.
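As a quick illustration, `std::priority_queue` is a max-heap by default, so a min-heap of (distance, vertex) pairs needs `greater` as the comparator. The sketch below uses the hypothetical names `MinQueue` and `pop_order`; it drains such a queue and records the order in which the vertices come out.

```c++
#include <functional>
#include <queue>
#include <utility>
#include <vector>
using namespace std;

// (distance, vertex) pairs; greater<> turns the default max-heap into a min-heap
typedef priority_queue<pair<int, int>, vector<pair<int, int> >,
                       greater<pair<int, int> > > MinQueue;

// Pops every entry and returns the vertices in the order they leave the queue
vector<int> pop_order(MinQueue q)
{
    vector<int> order;
    while (!q.empty()) {
        order.push_back(q.top().second);   // vertex with the smallest distance
        q.pop();
    }
    return order;
}
```

Pushing the pairs (6, 3), (1, 2), (3, 6) and (2, 4) in any order yields the vertices 2, 4, 6, 3, that is, sorted by increasing distance.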
**Visualization:**
- Here the cloud represents the processed nodes, i.e. the nodes whose final shortest distance is found.
- Solid lines represent the edges that are discovered.
- Numbers in square brackets represent the current distance of that vertex from the source vertex.
1. Start from the source vertex 1. ![enter image description here](https://lh3.googleusercontent.com/FjtCUJsZLrgfR91fhg9SnWq6mZ6huoVoP32ps_Z6V1N1saPJNb_BBCIRB_IlilwuPIK87WPul3u7)
2. Vertex 1 discovers vertices 2 and 3. It consequently does the relaxation and adds vertices 2 and 3 in the set.
![enter image description here](https://lh3.googleusercontent.com/2jJ_CHiOTouAQxCrgpeJ6qK25_ZZuFgh-kI1B19Fsl0NoB9m3rtmtupPUJv4ttC2YSg3X-LD5nnq)
3. The minimum distance vertex in the set is vertex 2. So vertex 2 will be the next source vertex.
It will discover vertices 5 and 4.
Now vertex 2 is added to the cloud as it is processed.
![enter image description here](https://lh3.googleusercontent.com/Gjkc7w3DJKA0k9McJPpSk1HOG9nENp15oWoqcQoreTxS1VGCBdgUP2qdzDDE5KFSHc90p6rNDeI6)
4. The next minimum distance vertex is vertex 4.
It relaxes both 3 and 6. Then vertex 4 will be added to the cloud.
![enter image description here](https://lh3.googleusercontent.com/M_6-KvDeY2V3dkJGfwPKwSDqe6crTNIc9S3N-2e6Zel8pTGTnMQEKp4T4O6pSAj3Ss2aNLxYSMkK)
5. Next vertex will be 3. It relaxes 5.
![enter image description here](https://lh3.googleusercontent.com/rQFxj-uiNigNUNA19aiMtd9eqv1hbIzTtv80UFgB3ij7iJchfZnWiFLuxRDM1AQ4uQMHM8ZOWaP7)
6. Now there will not be any more relaxations.
All the vertices in the priority queue will dequeue without adding any new vertex to the set.
![enter image description here](https://lh3.googleusercontent.com/Hnf3MyVP3V6jAck4trl8mWltMwfMRiP7u0NLYgtX99aQ7T5r4MzNFR3_IyHDjQi4K95kI5FC0ZO-)
Now we have found the shortest distances to all the vertices from the given source vertex.
We can use parent array to store parent of the given vertex, and then we can find out the shortest path for any vertex.
```c++
#include <bits/stdc++.h>
using namespace std;
#define MAX_DIST 100000000
// Function to print the required path
void printpath(vector<int>& parent, int vertex, int source, int destination)
{
if(vertex == source)
{
cout << source << "-->";
return;
}
printpath(parent, parent[vertex], source, destination);
cout << vertex << (vertex==destination ? "\n" : "-->");
}
// Dijkstra's Algorithm
int Dijkstra_Algo(vector<vector<pair<int, int> > >& Graph,
int src, int target, vector<int> & distances, vector<int> & parent)
{
// Minimum Priority Queue to keep track of the discovered
// vertex which has the minimum distance
priority_queue<pair<int, int>, vector<pair<int, int> >,
greater<pair<int, int> > > container;
// To check whether vertex is in the cloud
vector<bool> processed(Graph.size());
// Start with source vertex
// Push the source vertex in the
// Priority Queue
container.push(make_pair(0, src));
// Assign distance to source as 0
distances[src] = 0;
while (!container.empty()) {
// Pop the least distance vertex from the Priority Queue
pair<int, int> temp = container.top();
int current_src = temp.second;
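// Pop the minimum distance vertex
container.pop();
// Skip stale entries of vertices that are already processed
if (processed[current_src]) continue;
processed[current_src] = true;
// Explore all the edges going out of the current source vertex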
for (auto vertex : Graph[current_src]) {
// Distance of the vertex from its
// temporary source vertex
int distance = distances[current_src] + vertex.first;
// Relaxation of edge
if (!processed[vertex.second] && distance < distances[vertex.second]) {
// Updating the distance
distances[vertex.second] = distance;
// Updating the parent vertex
parent[vertex.second] = current_src;
// Adding the relaxed vertex to the priority queue
container.push(make_pair(distance, vertex.second));
}
}
}
// return the shortest distance
return distances[target];
}
int main()
{
// Number of vertices in graph
int n = 6;
// Adjacency list representation of the
// Directed Graph
vector<vector<pair<int, int> > > graph;
graph.assign(n + 1, vector<pair<int, int> >());
// Now make the Directed Graph
// Note that edges are
// in the form (weight, vertex ID)
graph[1].push_back( make_pair(1, 2) );
graph[1].push_back( make_pair(6, 3) );
graph[2].push_back( make_pair(1, 4) );
graph[4].push_back( make_pair(1, 3) );
graph[3].push_back( make_pair(1, 5) );
graph[2].push_back( make_pair(7, 5) );
graph[4].push_back( make_pair(3, 6) );
// Array to store the distances
vector<int> distances(n+1, MAX_DIST);
// Array to store the parent vertices
vector<int> parent(n+1, -1);
// For example destination is taken as 5
int source = 1, destination = 5;
distances[source] = 0;
// Dijkstra's algorithm
int shortest_distance = Dijkstra_Algo(graph, source, destination, distances, parent);
// To print shortest_distance
cout << shortest_distance << endl;
// To print the path of the shortest_distance
printpath(parent, destination, source, destination);
return 0;
}
```
## Time Complexity of Dijkstra's Algorithm:
1. $\mathcal{O}(|E|)$ time to make the Weighted Directed Graph.
2. The while loop, like BFS, handles every vertex and every edge, which gives $\mathcal{O}(|E| + |V|)$ priority queue operations.
3. $\mathcal{O}(\log |V|)$ time for each insertion into or removal from the priority queue.
So the overall complexity of the Dijkstra's Algorithm will be
$$\mathcal{O}((|E| + |V|) \times \log(|V|))$$
Here $|E|$ represents the number of edges in the graph and $|V|$ represents the number of vertices in the graph.
## Limitation of Dijkstra's Algorithm:
When there are negative weight edges in the graph, then Dijkstra's algorithm does not work.
Example:
![Negative Edge Problem in Dijkstra](https://picasaweb.google.com/110514411166133524120/6771365618877108449#6771365621395377650 "negative_edges")
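Here Dijkstra's algorithm reports a wrong shortest path from $a$ to $c$: once a vertex has been processed its distance is treated as final, so a cheaper route through a negative edge that is discovered later is ignored.
What if there is a negative cycle in the graph?
**Negative Cycle:** A cycle in which the sum of the edge weights is negative.
When there is a negative cycle in the graph, the shortest distance to the affected vertices is not even defined, because we can go around the cycle again and again and reach them at a cost approaching negative infinity. A version of Dijkstra's algorithm that re-relaxes processed vertices would run forever on such a graph.
Note that every vertex of the graph we have used is reachable from the source. If a vertex cannot be reached from the source, then there is no shortest path to it and its distance stays at infinity.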
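The negative-edge limitation described above can be reproduced with a tiny experiment. The sketch below is only an illustration: `dijkstra_distances` and `negative_edge_example` are hypothetical names, and the three-vertex graph ($a=0$, $b=1$, $c=2$) is made up for this example rather than taken from the figure. It assumes a Dijkstra variant that, like the implementation above, never relaxes a vertex again once it has been processed.

```c++
#include <cassert>
#include <functional>
#include <queue>
#include <utility>
#include <vector>
using namespace std;

const int INF = 100000000;

// Dijkstra with a "processed" flag. graph[u] holds (weight, vertex) pairs.
// Returns the distance the algorithm assigns to every vertex.
vector<int> dijkstra_distances(const vector<vector<pair<int, int> > >& graph, int source)
{
    int n = graph.size();
    vector<int> dist(n, INF);
    vector<bool> processed(n, false);
    // Min-heap ordered by distance
    priority_queue<pair<int, int>, vector<pair<int, int> >, greater<pair<int, int> > > pq;
    dist[source] = 0;
    pq.push(make_pair(0, source));
    while (!pq.empty()) {
        int u = pq.top().second;
        pq.pop();
        if (processed[u]) continue;
        processed[u] = true; // from now on dist[u] is treated as final
        for (size_t k = 0; k < graph[u].size(); k++) {
            int w = graph[u][k].first, v = graph[u][k].second;
            if (!processed[v] && dist[u] + w < dist[v]) {
                dist[v] = dist[u] + w;
                pq.push(make_pair(dist[v], v));
            }
        }
    }
    return dist;
}

// Hypothetical graph: a->c (1), a->b (2), b->c (-5).
// The true shortest distance from a to c is 2 + (-5) = -3.
vector<vector<pair<int, int> > > negative_edge_example()
{
    vector<vector<pair<int, int> > > g(3);
    g[0].push_back(make_pair(1, 2));
    g[0].push_back(make_pair(2, 1));
    g[1].push_back(make_pair(-5, 2));
    return g;
}
```

Calling `dijkstra_distances(negative_edge_example(), 0)` gives distance $1$ for $c$: vertex $c$ is processed before $b$, so the cheaper route $a \to b \to c$ with cost $-3$ is never considered.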
## **Applications of Dijkstra's algorithm:**
1. One very famous variant is the "Widest Path Finding Algorithm", which finds a path between two vertices of the graph **maximizing the weight of the minimum-weight edge in the path**.
In a practical application, the vertices of the graph are routers and each edge weight is the bandwidth between two routers. If we want to find the maximum-bandwidth path between two places in the network, then this algorithm, which is closely based on Dijkstra's algorithm, is useful.
2. The shortest path algorithm is widely used in network routing protocols.
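The widest path variant from the first application can be sketched by changing two things in Dijkstra's algorithm: the priority queue returns the largest value first, and the "distance" of a path is the minimum edge weight on it. The function name `widest_path` and the (bandwidth, vertex) edge layout are assumptions made for this illustration, and all bandwidths are assumed to be positive.

```c++
#include <algorithm>
#include <cassert>
#include <climits>
#include <queue>
#include <utility>
#include <vector>
using namespace std;

// graph[u] holds (bandwidth, vertex) pairs for the directed edges leaving u.
// Returns the largest possible bottleneck bandwidth from source to target,
// or 0 if target cannot be reached.
int widest_path(const vector<vector<pair<int, int>>>& graph, int source, int target)
{
    int n = graph.size();
    vector<int> width(n, 0);              // best bottleneck found so far
    priority_queue<pair<int, int>> pq;    // max-heap on (width, vertex)
    width[source] = INT_MAX;              // the empty path has no bottleneck
    pq.push({width[source], source});
    while (!pq.empty()) {
        int w = pq.top().first, u = pq.top().second;
        pq.pop();
        if (w < width[u]) continue;       // stale queue entry
        for (const auto& edge : graph[u]) {
            // Bottleneck of the path that reaches edge.second through u
            int cand = min(w, edge.first);
            if (cand > width[edge.second]) {
                width[edge.second] = cand;
                pq.push({cand, edge.second});
            }
        }
    }
    return width[target];
}
```

For example, with edges $0 \to 1$ (5), $1 \to 3$ (3), $0 \to 2$ (4) and $2 \to 3$ (4), the route through vertex $2$ wins with a bottleneck of $4$, even though the first edge of the other route is wider.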
## Other Shortest Path Finding Algorithms:
1. Bellman-Ford Algorithm
2. Using Dynamic Programming
3. All Pair Shortest Path Algorithm - **Floyd-Warshall Algorithm**, which finds the shortest path between every pair of vertices.

View File

@@ -0,0 +1,291 @@
## All Pair Shortest Path Problem
Statement: Given a graph, find the shortest path between every pair of vertices.
## Brute Force
We have seen Dijkstra's algorithm and the Bellman-Ford algorithm, which solve the SSSP problem.
We can run one of these algorithms from every vertex in turn, which solves the given problem.
This gives a complexity of $\mathcal{O}((|V| + |E|) \cdot |V| \cdot \log |V|)$ in the case of Dijkstra's algorithm and $\mathcal{O}(|V|^2 \cdot |E|)$ in the case of the Bellman-Ford algorithm.
The **Floyd-Warshall Algorithm** solves the all pair shortest path problem in an efficient manner.
## Floyd-Warshall Algorithm
This algorithm is an example of Dynamic Programming. We split the process of finding the shortest paths into several phases.
-- --
### Quiz Time
You are given that the shortest path between two vertices $A$ and $B$ passes through $C$, i.e. $C$ is the intermediate vertex in the shortest path from $A$ to $B$. What can you conclude from it?
Answer: $SD(A, B) = SD(A,C) + SD(C,B)$
$SD$ stands for Shortest Distance.
-- --
The above answer can be proved by contradiction: if the part of the path from $A$ to $C$ (or from $C$ to $B$) were not itself a shortest path, we could replace it with a shorter one and get a shorter path from $A$ to $B$.
The Floyd-Warshall algorithm is based on this observation.
Before the $i^{th}$ phase, the algorithm has found the shortest path between every pair of vertices that uses intermediate vertices only from the set $\{1,2,\ldots,i-1\}$. In every phase of the algorithm, we add one more vertex to the set.
In the $i^{th}$ phase, we update the $distance$ matrix considering the two cases below:
For an entry $distance[a][b]$,
1. If the shortest path between $a$ and $b$ which uses intermediate vertices from the set $\{1,2,\ldots,i-1\}$ is longer than the one that passes through vertex $i$, then update the shortest distance as below:
$distance[a][b] = distance[a][i]+distance[i][b]$
Here the values on the right-hand side are the shortest distances that use intermediate vertices only from the set $\{1,2,\ldots,i-1\}$.
2. Otherwise, $distance[a][b]$ will remain unchanged.
### Algorithm
1. Initialize an $N \times N$ distance matrix with infinity.
2. Set the distance from every vertex to itself to zero, i.e. set every diagonal element of the distance matrix to zero.
3. For every pair of vertices, if there is an edge from $A$ to $B$, then set $distance[A][B]$ to $EdgeWeight(A,B)$.
4. Start from the first vertex, say phase $i=1$.
5. For every pair of vertices $(a, b)$, set $distance[a][b]$ to $distance[a][i]+distance[i][b]$ if the condition below holds:
$distance[a][b] > distance[a][i]+distance[i][b]$
6. If $i$ is equal to the number of vertices, then stop; otherwise increment $i$ and repeat step $5$.
```c++
#include <bits/stdc++.h>
using namespace std;
#define MAX_DIST 100000000
void Floyd_Warshall(int no_vertices, vector<vector<int> > & distances)
{
// i shows the phase
for(int i = 1; i <= no_vertices; i++)
{
// Update distance matrix for all pairs
for(int a = 1; a <= no_vertices; a++)
{
for(int b = 1; b <= no_vertices; b++)
{
if(distances[a][i] < MAX_DIST && distances[i][b] < MAX_DIST)
distances[a][b] = min(distances[a][b], distances[a][i]+distances[i][b]);
}
}
}
}
int main()
{
int no_vertices=5;
// N*N array to store the distances between every pair
vector<vector<int> > distances(no_vertices+1, vector<int> (no_vertices+1, MAX_DIST));
// Adding the edges virtually by
// updating distance matrix
distances[1][2]=1;
distances[1][3]=-3;
distances[2][4]=2;
distances[2][5]=1;
distances[3][4]=-1;
distances[3][5]=2;
distances[4][5]=1;
for(int i = 1; i <= no_vertices; i++)
distances[i][i] = 0;
Floyd_Warshall(no_vertices, distances);
// Printing the distance matrix
for(int i=1;i<=no_vertices;i++)
{
for(int j=1;j<=no_vertices;j++)
{
if(distances[i][j]!=MAX_DIST)
cout << setw(5) << right << distances[i][j] << " ";
else
cout << setw(5) << right << "INF" << " ";
}
cout << endl;
}
return 0;
}
```
### Time Complexity
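As there are three nested loops, each running once per vertex, the complexity is $\mathcal{O}(|V|^3)$.
For dense graphs, where $|E|$ is close to $|V|^2$, this is better than the brute force approach, which costs $\mathcal{O}(|V|^3 \log |V|)$ with Dijkstra's algorithm and $\mathcal{O}(|V|^4)$ with the Bellman-Ford algorithm.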
### Path Reconstruction
To reconstruct the path, we have to store the next vertex for each pair and whenever we update the shortest distance we have to update the next vertex as well.
```c++
#include <bits/stdc++.h>
using namespace std;
#define MAX_DIST 100000000
void Floyd_Warshall(int no_vertices, vector<vector<int> > & distances, vector<vector<int> > & next)
{
// i shows the phase
for(int i = 1; i <= no_vertices; i++)
{
// Update distance matrix for all pairs
for(int a = 1; a <= no_vertices; a++)
for(int b = 1; b <= no_vertices; b++)
if( distances[a][b] > distances[a][i]+distances[i][b] &&
distances[a][i] < MAX_DIST && distances[i][b] < MAX_DIST )
{
distances[a][b] = distances[a][i]+distances[i][b];
next[a][b] = next[a][i];
}
}
}
int main()
{
int no_vertices=5;
// N*N array to store the distances between every pair
vector<vector<int> > distances(no_vertices+1, vector<int> (no_vertices+1, MAX_DIST));
// N*N array to store the next vertex for every pair
vector<vector<int> > next(no_vertices+1, vector<int> (no_vertices+1, -1));
// Adding the edges virtually by
// updating distance matrix and
// next vertex matrix
distances[1][2]=1, next[1][2] = 2;
distances[1][3]=-3, next[1][3] = 3;
distances[2][4]=2, next[2][4] = 4;
distances[2][5]=1, next[2][5] = 5;
distances[3][4]=-1, next[3][4] = 4;
distances[3][5]=2, next[3][5] = 5;
distances[4][5]=1, next[4][5] = 5;
// Update all the diagonal elements
for(int i = 1; i <= no_vertices; i++)
{
distances[i][i] = 0;
next[i][i] = i;
}
Floyd_Warshall(no_vertices, distances, next);
// Example of path reconstruction
int source = 1, destination = 5;
while(source != destination)
{
cout << source << " ";
source = next[source][destination];
}
cout << destination << endl;
return 0;
}
```
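The reconstruction loop above assumes that a path from the source to the destination exists. A slightly more defensive variant is sketched below; `reconstruct_path` is a hypothetical helper, not part of the original code. It assumes the same `next` matrix convention as above ($-1$ means "no path recorded") and a graph without negative cycles.

```c++
#include <cassert>
#include <vector>
using namespace std;

// Returns the vertices on the stored shortest path from source to destination.
// Returns an empty vector when no path was recorded (next entry is -1).
vector<int> reconstruct_path(const vector<vector<int> >& next, int source, int destination)
{
    vector<int> path;
    if (next[source][destination] == -1)
        return path; // destination is unreachable from source
    path.push_back(source);
    while (source != destination) {
        // Step to the next vertex on the shortest path towards destination
        source = next[source][destination];
        path.push_back(source);
    }
    return path;
}
```

Returning the path as a vector keeps printing separate from reconstruction, and the early return avoids indexing the matrix with $-1$ when the destination is unreachable.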
### Negative Cycle Case
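We can use the Floyd-Warshall algorithm to detect a negative cycle and to find the pairs of vertices affected by it.
Every vertex starts at distance zero from itself. If $distance[v][v]$ turns out to be negative at the end of the algorithm, then vertex $v$ lies on a negative cycle. Any pair $(a, b)$ for which some path from $a$ to $b$ passes through such a vertex has no well-defined shortest distance, so the code marks these pairs with `-MAX_DIST`, which stands for $-\infty$.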
```c++
#include <bits/stdc++.h>
using namespace std;
#define MAX_DIST 100000000
void Floyd_Warshall_with_NC(int no_vertices, vector<vector<int> > & distances)
{
// i shows the phase
for(int i = 1; i <= no_vertices; i++)
for(int a = 1; a <= no_vertices; a++)
for(int b = 1; b <= no_vertices; b++)
if( distances[a][b] > distances[a][i]+distances[i][b] &&
distances[a][i] < MAX_DIST && distances[i][b] < MAX_DIST )
distances[a][b] = distances[a][i]+distances[i][b];
bool is_negative_cycle = false;
// Check for negative cycle
for (int i = 1; i <= no_vertices; ++i)
for (int a = 1; a <= no_vertices; ++a)
for (int b = 1; b <= no_vertices; ++b)
{
// If there is a negative cycle, then update the distance to -Infinity
if (distances[a][i] < MAX_DIST && distances[i][i] < 0
&& distances[i][b] < MAX_DIST)
{
distances[a][b] = -MAX_DIST;
is_negative_cycle = true;
}
}
if(is_negative_cycle)
{
cout << "The following pairs are affected by the negative cycle:" << endl;
for (int a = 1; a <= no_vertices; ++a)
for (int b = 1; b <= no_vertices; ++b)
// Compare with ==; a single = would assign and make the test always true
if (distances[a][b] == -MAX_DIST)
cout << a << "--" << b << endl;
}
}
int main()
{
int no_vertices=5;
// N*N array to store the distances between every pair
vector<vector<int> > distances(no_vertices+1, vector<int> (no_vertices+1, MAX_DIST));
// Adding the edges virtually by
// updating distance matrix
distances[1][2]=1;
distances[1][3]=-3;
distances[2][4]=2;
distances[4][1]=1;
distances[3][4]=-2;
for(int i = 1; i <= no_vertices; i++)
distances[i][i] = 0;
Floyd_Warshall_with_NC(no_vertices, distances);
return 0;
}
```
### Application of Floyd-Warshall Algorithm
1. The "Widest Path Finding Algorithm", which finds a path between two vertices of the graph **maximizing the weight of the minimum-weight edge in the path**, can be solved using this algorithm.
2. Optimal routing.
3. In the Gauss-Jordan algorithm, to find the reduced form of a matrix and its inverse.
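As a rough sketch of the first application, the Floyd-Warshall update can be adapted to widest paths by replacing "sum" with "minimum" and "minimum" with "maximum". The function name `widest_paths` and the matrix conventions are assumptions made for this illustration: `width[a][b]` holds the capacity of the edge from $a$ to $b$, $0$ means "no edge", and the diagonal is set to a very large value.

```c++
#include <algorithm>
#include <cassert>
#include <vector>
using namespace std;

// width[a][b] is the capacity of edge a->b (0 if there is no edge),
// and width[a][a] should be set to a very large value before the call.
// After the call, width[a][b] is the best bottleneck over all paths a->b.
void widest_paths(int no_vertices, vector<vector<int> >& width)
{
    // i shows the phase, exactly as in the shortest path version
    for (int i = 1; i <= no_vertices; i++)
        for (int a = 1; a <= no_vertices; a++)
            for (int b = 1; b <= no_vertices; b++)
                // A path through i is as wide as its narrower half
                width[a][b] = max(width[a][b], min(width[a][i], width[i][b]));
}
```

Because a missing edge is stored as $0$, no separate "infinity" check is needed: a path through a missing edge has bottleneck $0$ and never replaces a real path.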
## Other Shortest Path Finding Algorithms:
1. Using Dynamic Programming
2. Dijkstra's Algorithm
3. Bellman-Ford Algorithm

View File

@@ -0,0 +1,361 @@
## Kruskal's Algorithm
Suppose you are running a company with several offices in different cities. Now, you want to connect all the offices by phone lines. Different networking companies are asking for different amounts of money to connect different pairs of offices.
![enter image description here](https://lh3.googleusercontent.com/644Hn9oGhJPFAkZbyOyXtKx89cAlnim_O2dqWgc5W_YebIHwAHlWrsMpZPzgzbSwENFbsjuVYR0h)
\***Comp** is an abbreviation of Company.
Now, how will you figure out the way to connect all the offices with minimum cost?
Well, this problem can be solved by using classical algorithms to find the minimum spanning tree of the given graph.
What is a "**Minimum Spanning Tree**"? And even before that, what is a "**Spanning Tree**"?
### Spanning Tree (ST)
A Spanning Tree of a given undirected graph is a subgraph that is a tree and includes every vertex of the graph, so it connects all the vertices with the minimum possible number of edges.
### Quiz Time
$Q.1$ What is the minimum possible number of edges that can connect all the vertices of the undirected graph having $|V|$ vertices?
Answer: $|V|-1$
$Q.2$ Find one ST for the following graph.![enter image description here](https://lh3.googleusercontent.com/V4UYMyPf_paL45vwYaaZZ1EYzp2WwwqKmzS9NyqZT-WxtTvBrLzP4e7uI0iaarQOt-UkVJ19CHl5)
**Answer:** The dark lines represent a spanning tree. Note that the spanning tree is not unique.
![enter image description here](https://lh3.googleusercontent.com/9EVi1ivLU2S0G7P4h_bbanZbSqoTm4C2eK_18J2c1F7_PNz9JL_2nmVlyi9VHTyc84YrgjywpLjy)
### Minimum Spanning Tree (MST)
Minimum Spanning Tree is the Spanning Tree with minimum cost.
Here the cost has different meanings for different kinds of problems. For example, in the problem stated above, the cost is the money asked by the different companies.
### Quiz Time
Find the MST for the given graph.
![enter image description here](https://lh3.googleusercontent.com/M-17sSWWCuciZYbp7X3FNOuT8EYObz4Ao0m6_dvrCmUcR5nre5Kdzau5KCLlSA92OE0G3l6xYd6w)
**Answer:**
![enter image description here](https://lh3.googleusercontent.com/IjdTe4v1wNe1CPqweKQdktcnNI7ZT2XRaj01VmhC16orCqGPSJjPTEumQf78NNRanYhOaNDJR0I5)
**Note**: Here we are talking about an undirected graph, because a directed graph may or may not have an ST. See the image below:
![enter image description here](https://lh3.googleusercontent.com/RZCQAvafhR94_siwLkjLdhtbgmXA4YU3iAmIOBJhh8G3wB615ZIYU6a9xpkKZrqPgcjn0V8jh8-1)
For a directed graph to have an ST, there must be a vertex (say "$root$") from which we can reach every other vertex by directed paths.
How can you find an MST for an undirected graph?
## Brute Force
One basic idea is to find every spanning tree of the graph, i.e. every set of exactly $|V| - 1$ edges that connects all $|V|$ vertices.
Then take the spanning tree with the minimum cost, which is the MST of the given graph.
This process can take exponential time, because a dense graph can have exponentially many spanning trees.
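To make the brute force idea concrete, here is a minimal sketch that tries every subset of edges. The names `Edge` and `brute_force_mst_cost` are hypothetical, and the code assumes fewer than 31 edges so that a subset fits in an `int` bitmask. It is only meant to illustrate the exponential approach, not to be used on real inputs.

```c++
#include <cassert>
#include <climits>
#include <vector>
using namespace std;

struct Edge { int from, to, weight; };

// Tries every subset of exactly n-1 edges and keeps the cheapest subset
// that connects all n vertices (numbered 1..n).
// Returns -1 if the graph has no spanning tree.
int brute_force_mst_cost(int n, const vector<Edge>& edges)
{
    int m = edges.size();
    int best = INT_MAX;
    for (int mask = 0; mask < (1 << m); mask++) {
        int chosen = 0, cost = 0, components = n;
        // id[v] identifies the connected component of vertex v
        vector<int> id(n + 1);
        for (int v = 1; v <= n; v++) id[v] = v;
        for (int e = 0; e < m; e++) {
            if (!(mask & (1 << e))) continue; // edge e is not in this subset
            chosen++;
            cost += edges[e].weight;
            int a = id[edges[e].from], b = id[edges[e].to];
            if (a == b) continue;             // edge closes a cycle
            for (int v = 1; v <= n; v++)      // merge the two components
                if (id[v] == a) id[v] = b;
            components--;
        }
        // n-1 edges forming a single component are a spanning tree
        if (chosen == n - 1 && components == 1 && cost < best)
            best = cost;
    }
    return best == INT_MAX ? -1 : best;
}
```

The outer loop runs $2^{|E|}$ times, which is why this approach stops being practical beyond a few dozen edges.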
Fortunately, there is an elegant algorithm that solves the MST problem very efficiently.
**Terminologies and notes:**
1. A **connected component** is a subgraph of a graph which has a path between every pair of its vertices, where each of these paths uses no vertices outside of the subgraph.
2. At the start of the algorithm, each vertex forms a separate component on its own.
3. Unifying two connected components results in one connected component that contains all the vertices of both components.
## Kruskal's Algorithm
Kruskal's Algorithm is a **greedy algorithm**, which chooses the least weighted edge from the remaining edges at each step of the algorithm, to find the overall minimum weighted spanning tree.
### Algorithm
1. Sort the array of edges by weight.
2. Now, loop over this sorted array.
If the edge connects two vertices which are not already in the same connected component, then add that edge to the list of MST edges and unify the two components.
3. Stop once $|V|-1$ edges have been added, i.e. when all $|V|$ vertices are in the tree. This tree is an MST of the given graph.
At the end of the algorithm there will be only one connected component, which includes every vertex of the graph.
**Visualization**
![enter image description here](https://lh3.googleusercontent.com/jKx_yhQnsvEIpn4UIZxgaoM0gtjkFmMqSszqO0A0VDL-UminMn8TCP61_7sXaGafwcwtKgjcdji2)
![enter image description here](https://lh3.googleusercontent.com/xGlG4VqzhxnV-EochhwHC_wr5POGoK5z8BNNWhK_IA4vURTCIUAjaTctrBA4lsJFCakL5Kygcl6t)
![enter image description here](https://lh3.googleusercontent.com/Uds7vSzkS8yehyExYhPnYTFK8QnbfUMa0Kqh6TaCUTG2BoK7ONyulbuVFajKC7cq7AB2rWS_Jt3R)
![same Connected component thing](https://lh3.googleusercontent.com/c6O5HLwwkmAmaPAQBhZoKODGT2HrOwa1kfeXSPZUy7QxfmH8igeP-ZK9QI8v-DfhitJ6Rsoy9_2D)
![enter image description here](https://lh3.googleusercontent.com/LposVdQeQcUXvQak0_v5yl4ob7DwWEInwPgmdI2_QTgSgHwRx8ViOUhJlRBijSE87N0Fr3GnQS7P)
Are you wondering how to do step $2$ of the algorithm, which is to find whether two vertices are already connected?
There are two different ways to do it.
1. Use an array that stores a component ID for every vertex. If both vertices have the same ID, then they are already connected. Otherwise, update the IDs accordingly and add that edge to the ST.
2. Use the **Disjoint Set Union** data structure, which does this operation very efficiently.
### Approach without DSU
This is very simple. We check whether the two vertices joined by the edge have different IDs. Here we are using IDs to represent the connected components.
If they have different IDs, then they belong to different connected components, so we join the two components by changing the IDs of all the vertices of one component to the ID of the other.
```c++
#include <bits/stdc++.h>
using namespace std;
// Object - Edge
struct edge{
// Edge from -> to
// having some weight
int from, to, weight;
edge(int a, int b, int w)
{
from = a;
to = b;
weight = w;
}
};
// Takes const references because the comparator must not modify the edges
bool comparator(const edge& a, const edge& b)
{
return a.weight < b.weight;
}
signed main() {
int no_vertices = 4, no_edges = 5;
vector<edge> graph;
// Edges of graph
graph.push_back(edge(1,2,2));
graph.push_back(edge(1,4,5));
graph.push_back(edge(2,3,3));
graph.push_back(edge(1,3,4));
graph.push_back(edge(3,4,6));
// sorting the edges
sort(graph.begin(), graph.end(), comparator);
// To remember the edges in ST
vector<bool> is_in_ST(no_edges);
// Array to maintain the IDs of
// vertices
vector<int> ID(no_vertices+1);
int min_cost = 0;
for(int i = 0; i < no_vertices+1; i++)
ID[i] = i;
for(int i = 0; i < no_edges; i++)
{
int ida = ID[graph[i].from], idb = ID[graph[i].to];
// Connecting two set of vertices
if(ida != idb)
{
for(int j=1; j<=no_vertices; j++)
if(ID[j] == ida)
ID[j] = idb;
is_in_ST[i] = true;
min_cost += graph[i].weight;
}
}
// Cost to make MST
cout << min_cost << endl;
for(int i=0; i<no_edges; i++)
if(is_in_ST[i])
cout << graph[i].from << "---" << graph[i].to << endl;
return 0;
}
```
**Time Complexity**
1. Sorting the edges takes $\mathcal{O}(|E|\log |E|)$.
2. Updating the IDs after adding an edge takes $\mathcal{O}(|V|)$.
3. The outermost loop runs at most $|E|$ times.
Overall time complexity: $\mathcal{O}(|E|\log |E| + |V| \cdot |E|)$
### Approach with DSU
DSU brings down the time complexity of step 2 from $\mathcal{O}(|V|)$ to $\mathcal{O}(\log |V|)$.
Here we will use DSU to find whether two vertices are already connected. If they are not connected, then we will use the union operation to connect them.
It is simpler and much more efficient than the previous approach.
```c++
#include <bits/stdc++.h>
using namespace std;
// Disjoint Set Union Structure
class Dsu
{
int size;
int numberofcomponents;
public:
int *IDs;
vector<int> sizes;
// Constructor
Dsu(int size)
{
numberofcomponents = size;
IDs = new int[size+1];
sizes.push_back(0);
for(int i=1; i <= size; i++)
{
IDs[i] = i;
sizes.push_back(1);
}
}
// Find the ID of the component
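int find(int p)
{
// Walk up to the root of p's component
int root = p;
while(root != IDs[root])
root = IDs[root];
// Path compression: point every vertex on the path directly at the root
while(p != root)
{
int next = IDs[p];
IDs[p] = root;
p = next;
}
return root;
}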
// Join two components
void unify(int a, int b)
{
int ida = find(a);
int idb = find(b);
if(ida == idb) return;
//smaller will unify with bigger set
if(sizes[ida] > sizes[idb])
{
IDs[idb] = IDs[ida];
sizes[ida] += sizes[idb];
}
else
{
IDs[ida] = IDs[idb];
sizes[idb] += sizes[ida];
}
numberofcomponents--;
}
};
// Object - Edge
struct edge{
// Edge from -> to
// having some weight
int from, to, weight;
edge(int a, int b, int w)
{
from = a;
to = b;
weight = w;
}
};
// Takes const references because the comparator must not modify the edges
bool comparator(const edge& a, const edge& b)
{
return a.weight < b.weight;
}
signed main() {
int no_vertices = 4, no_edges = 5;
vector<edge> graph;
// Edges of graph
graph.push_back(edge(1,2,2));
graph.push_back(edge(1,4,5));
graph.push_back(edge(2,3,3));
graph.push_back(edge(1,3,4));
graph.push_back(edge(3,4,6));
// sorting the edges
sort(graph.begin(), graph.end(), comparator);
// To remember the edges in ST
vector<bool> is_in_ST(no_edges);
// Total cost of the MST and the DSU that
// tracks the connected components
int min_cost = 0;
Dsu unionfind(no_vertices);
for(int i = 0; i < no_edges; i++)
{
int ida = unionfind.find(graph[i].from);
int idb = unionfind.find(graph[i].to);
// Connecting two set of vertices
if(ida != idb)
{
unionfind.unify(ida,idb);
is_in_ST[i] = true;
min_cost += graph[i].weight;
}
}
// Cost to make MST
cout << min_cost << endl;
for(int i=0; i<no_edges; i++)
if(is_in_ST[i])
cout << graph[i].from << "---" << graph[i].to << endl;
return 0;
}
```
**Time Complexity**
1. Sorting the edges takes $\mathcal{O}(|E|\log |E|)$.
2. Each find or union operation takes $\mathcal{O}(\log |V|)$.
3. The main loop runs at most $|E|$ times.
Overall time complexity: $\mathcal{O}(|E|\log |E|+|E|\log |V|)$, which is $\mathcal{O}(|E|\log |E|)$ because $|V| - 1 \le |E|$ in a connected graph.
## Applications of MST
1. In Network Designs: Pipelining, Roads & Transportation, Telephone or Electric cable network, etc.
2. In image recognition of handwritten expressions
## Other Algorithms to find MST
1. Prim's Algorithm
2. Boruvka's Algorithm

View File

@@ -0,0 +1,263 @@
## Prim's Algorithm
Like Kruskal's algorithm, Prim's algorithm is another algorithm to find the minimum spanning tree for the given undirected weighted graph.
Prim's algorithm is also a greedy algorithm, and it is quite similar to Dijkstra's algorithm. If you are familiar with Dijkstra's algorithm, then you know that we need to find a minimum distance vertex at each step; in Prim's algorithm we need to find a minimum weight edge instead. Let's see the actual algorithm.
**Notes:**
- Here the term **tree** stands for an intermediate tree in the formation of the whole MST.
- **Explored edges** means the edges which are already found in the run of the algorithm.
- Here, we are assuming that the given undirected graph is connected.
## Algorithm
1. Select an arbitrary vertex say $V$ from the graph and start the algorithm from that vertex.
2. Add vertex $V$ in the tree.
3. Explore all the edges connected to vertex $V$.
4. Find the minimum weight edge from all the explored edges, which connects the tree to a vertex $U$ which is not yet added in the tree.
5. Set $V$ to $U$ and continue from step 2 until all $|V|$ vertices are in the tree.
**Visualization**
![enter image description here](https://lh3.googleusercontent.com/6K6LDcKyb37O1_bvWCTDqwArEKwUhWUIeMiHcYuwBBKiT96227ndkcrtVUXGIZh99b4797Ag8Tz8)
![enter image description here](https://lh3.googleusercontent.com/ZzoZu85ceeGJv_rITxLeNv3Aj_DM7ijXyY_aFPomRH5uuqCKAGGerfa4Y92AGQBn-ZowDlFe2jzx)
![enter image description here](https://lh3.googleusercontent.com/Xcynud0p9l6UuWZG4IYszTFg9oyxlL-SCikCcbVx5AQ1K07cFYo6OHVHUh6ugxLKU13W2RIBg-66)
![enter image description here](https://lh3.googleusercontent.com/h57vXQgMzcvQKdGDioi91GDnvl804VPWqs_QhY_tZVxhKvV9iLc183f5-5ynJO_vObCnd_8Yp-JB)
![enter image description here](https://lh3.googleusercontent.com/kSpm3PWqqX8d7ZDwEJhWYKooHiaYIP_BTYPDKOX1Em2n15XJKcv9dxfbHAiFFt3ER0DSEK4jklpc)
![enter image description here](https://lh3.googleusercontent.com/QtqN-LZRmrTNIrLEy0qUq0aT5tCa5jNv4RmOi5lnqQnQJ9f9Won5GUVk41gLz5b8ZeMtoQgV2Xh0)
![enter image description here](https://lh3.googleusercontent.com/k_OkqjacpH3p3NTfLVNs06D_RpPfui8PaPtGcHDTvUAoldoYntpDBaC4QovZu3S63U1pknEyFhoa)
![enter image description here](https://lh3.googleusercontent.com/0Npt2c9MTgDm4dkZTfO4lewZgZvfnbgixOK3laNX4JV0d-StPcpy-DXbmS3jSjDcmIWvFvvLJ0DQ)
![enter image description here](https://lh3.googleusercontent.com/5y89gCgIMo2d2BVlw73LsaW7tRVkJcdxKHGbCZEK2nNDBiefX1PFA0shkCsFSPvxVRmKh6Gc1oy5)
![enter image description here](https://lh3.googleusercontent.com/7-ap56y_klM0c-HUVSEhqIuV4g5lFYwQe1x9C1ucxHfsSB03BfyQbtb9AEjlT4QUQFoETKoEndY0)
![enter image description here](https://lh3.googleusercontent.com/zUfsEoA-rcEYQQCNGX6jdPOHYGFQ3za_qMHNJQmZEj6cMuINA66ApJ8UJ45giIRP2EdzB0F7NnF8)
![enter image description here](https://lh3.googleusercontent.com/-FhJklDlIaXRFsG3FCr0Pm8uvDQPsfH_IklonW6UpoIfpWwZ0LKFCW1J2JUZGagbNUFfd4DDDn9l)
![enter image description here](https://lh3.googleusercontent.com/b2AgnVPdxBK1xa5iMQPkbLcIbn_IBFPEHmCpjGAIIfQgjur5zYKHpLfGKwY_YC3R_NEcFfvAXV5F)
We will discuss two different approaches:
1. Adjacency List representation of graph
2. Adjacency Matrix representation of graph
Generally, we use the adjacency matrix representation for dense graphs, because when almost every pair of vertices is connected the matrix wastes little space and gives constant-time edge lookups, whereas we use the adjacency list representation for sparse graphs.
**Note:** **Minimum weight** represents the weight of a minimum weight edge observed so far in the algorithm, which is connected to a particular vertex.
## Sparse Graphs - Adjacency list representation
Here we are representing the graph using Adjacency list representation.
**Implementation Algorithm**
1. Initialize a boolean array which keeps track of, whether a vertex is added in the tree.
2. Initialize an array of integers say $\text{Minweight[]}$, where each entry of it shows the minimum weight, by $\infty$.
3. Start from any vertex say $A$. Mark the weight of the minimum weight edge to reach it as $0$.
4. Explore all of the edges connected with $A$ and update the minimum weight to reach an adjacent vertex if the condition below is satisfied.
For an edge $A\to B$,
$\text{Minweight}[B] > \text{EdgeWeight}(A,B)$
Note that we are only looking at those adjacent vertices which are not already in the tree.
5. Find the minimum weight edge among all the explored edges that leads to a vertex outside the tree. Say that edge is $a - b$; add $b$ to the tree, take $A$ as $b$ and repeat from step $4$.
6. If all the $|V|$ vertices are added in the tree, then stop the algorithm.
Can you tell how we will do step 5, which is to find the minimum weight edge among all the explored edges?
Here, we need to use some data structure which finds out the minimum weight edge from all the explored edges efficiently.
Do you know any of them?
We can use any one of a priority queue, Fibonacci heap, binomial heap, balanced binary search tree, etc.
**Note:** Below in the code, the parent array is used to retrieve the formed MST and **set** (STL container) is a kind of balanced binary search tree.
```c++
#include <bits/stdc++.h>
#define MAX_Weight 1000000000
using namespace std;
int main()
{
int no_vertices = 4;
vector<vector<pair<int,int>> > graph(no_vertices+1, vector<pair<int,int>>());
graph[1].push_back({2,6}); // A-B
graph[2].push_back({1,6});
graph[1].push_back({4,5}); // A-D
graph[4].push_back({1,5});
graph[2].push_back({3,3}); // B-C
graph[3].push_back({2,3});
graph[2].push_back({4,4}); // B-D
graph[4].push_back({2,4});
graph[3].push_back({4,2}); // C-D
graph[4].push_back({3,2});
// To track if the vertex is added in MST
vector<bool> inMST(no_vertices+1);
vector<int> minWeight(no_vertices+1, MAX_Weight),
parent(no_vertices+1);
// Minimum finding(logN) DS
set<pair<int,int>> Explored_edges;
Explored_edges.insert({0,1});
for(int i=2;i<no_vertices+1;i++)
Explored_edges.insert({MAX_Weight,i});
int VertinMST = 0, MSTcost = 0;
minWeight[1] = 0;
parent[1] = -1;
while(VertinMST < no_vertices)
{
// Vertex connected by Minimum weight edge
int vertex = Explored_edges.begin()->second;
MSTcost += minWeight[vertex];
inMST[vertex] = true;
VertinMST++;
Explored_edges.erase(Explored_edges.begin());
// Exploring the adjacent edges
for(auto i:graph[vertex])
{
// If we reach by lesser weighted edge then update
if(!inMST[i.first] && minWeight[i.first] > i.second)
{
// Previous larger weighted edge
Explored_edges.erase({minWeight[i.first],i.first});
minWeight[i.first] = i.second;
parent[i.first] = vertex;
// New smaller weighted edge
Explored_edges.insert({minWeight[i.first],i.first});
}
}
}
cout << MSTcost << endl;
cout << "Edges in MST:" << endl;
for(int i = 1; i <= no_vertices; i++)
{
if(parent[i] != -1)
cout << parent[i] << " " << i << endl;
}
return 0;
}
```
**Time Complexity**
1. We need $\mathcal{O}(\log |V|)$ time to find the minimum weight edge and to update an entry in the set.
2. We do this $\mathcal{O}(|E|+|V|)$ times in total, which is similar to BFS.
Overall time complexity: $\mathcal{O}((|E|+|V|)\log |V|)$
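The text above mentions a priority queue as one of the possible data structures. Since `std::priority_queue` cannot erase or update an arbitrary entry, a common workaround is to push a new entry for every candidate edge and skip outdated entries when they are popped. The sketch below shows this "lazy deletion" variant; the function name `prim_mst_cost` is hypothetical, the graph uses the same (neighbour, weight) layout as the code above, and the graph is assumed to be connected.

```c++
#include <cassert>
#include <functional>
#include <queue>
#include <utility>
#include <vector>
using namespace std;

// graph[u] holds (neighbour, weight) pairs, vertices are numbered 1..n.
// Returns the total weight of an MST of a connected undirected graph.
int prim_mst_cost(int no_vertices, const vector<vector<pair<int, int>>>& graph)
{
    vector<bool> inMST(no_vertices + 1, false);
    // Min-heap of (edge weight, vertex reached by that edge)
    priority_queue<pair<int, int>, vector<pair<int, int>>, greater<pair<int, int>>> pq;
    pq.push({0, 1}); // start from vertex 1 at no cost
    int cost = 0, added = 0;
    while (!pq.empty() && added < no_vertices) {
        pair<int, int> top = pq.top();
        pq.pop();
        int vertex = top.second;
        if (inMST[vertex]) continue; // outdated entry, vertex already added
        inMST[vertex] = true;
        cost += top.first;
        added++;
        // Every edge leaving the tree becomes a candidate
        for (const auto& edge : graph[vertex])
            if (!inMST[edge.first])
                pq.push({edge.second, edge.first});
    }
    return cost;
}
```

The queue can hold up to $\mathcal{O}(|E|)$ entries, so this variant runs in $\mathcal{O}(|E|\log |E|)$ time, which is the same order as $\mathcal{O}(|E|\log |V|)$ because $|E| \le |V|^2$.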
## Dense Graphs - Adjacency matrix representation
Here, to loop over the adjacent vertices, we have to loop over all $|V|$ entries of a row of the adjacency matrix.
So, to update the minimum weights we have to spend $\mathcal{O}(|V|)$ time, and to find the minimum weight edge we also spend $\mathcal{O}(|V|)$ time.
The implementation is much simpler than the previous one.
```c++
#include <bits/stdc++.h>
#define MAX_Weight 1000000000
using namespace std;
int main()
{
int no_vertices = 4;
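// Missing edges must hold MAX_Weight; an uninitialized array would contain garbage
vector<vector<int> > graph(no_vertices+1, vector<int>(no_vertices+1, MAX_Weight));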
graph[1][2]=6; // A-B
graph[2][1]=6;
graph[1][4]=5; // A-D
graph[4][1]=5;
graph[2][3]=3; // B-C
graph[3][2]=3;
graph[2][4]=4; // B-D
graph[4][2]=4;
graph[3][4]=2; // C-D
graph[4][3]=2;
// To track if the vertex is added in MST
vector<bool> inMST(no_vertices + 1);
int VertinMST = 0, MSTcost = 0;
vector<int> minWeight(no_vertices + 1, MAX_Weight),
parent(no_vertices + 1);
minWeight[1] = 0;
parent[1] = -1;
for(int i=1; i<=no_vertices; i++)
{
int minvertex = 0, weight = MAX_Weight;
// Find the minimum weighted edge
for(int j=1; j<=no_vertices; j++)
if(!inMST[j] && (minvertex==0 || minWeight[j] < minWeight[minvertex]))
{
minvertex = j;
weight = minWeight[j];
}
inMST[minvertex] = true;
MSTcost += weight;
// Update the min weights
for(int j=1;j<=no_vertices;j++)
if(!inMST[j] && graph[minvertex][j] < minWeight[j])
{
minWeight[j] = graph[minvertex][j];
parent[j] = minvertex;
}
}
cout << MSTcost << endl;
cout << "Edges in MST:" << endl;
for(int i = 1; i <= no_vertices; i++)
{
if(parent[i] != -1)
cout << parent[i] << " " << i << endl;
}
return 0;
}
```
### Time Complexity
1. It takes $\mathcal{O}(|V|)$ time to find the minimum weight edge and also to update the minimum weights.
2. The outer loop runs $|V|$ times.
Overall time complexity: $\mathcal{O}(|V|^2)$
### Other algorithms to find MST
1. Kruskal's Algorithm
2. Boruvka's Algorithm

640
Akash Articles/md/Regex.md Normal file
View File

@@ -0,0 +1,640 @@
## Regular Expression (RegEx)
While filling in online forms, haven't you come across errors like "Please enter valid email address" or "Please enter valid phone number"?
Annoying as they may be, there's a lot of black magic that the computer does before it determines that the details you've entered are incorrect.
Can you guess what that black magic is? If you are familiar with algorithms, then you will say we can write an algorithm for it.
Yes, we can write an algorithm to verify different things. But we have a standard tool which is designed for exactly this kind of purpose.
It is the **Regular Expression**. We call it **RegEx** for short. RegEx makes our work a lot easier. Let's see some basic examples where RegEx comes in handy.
Suppose you are looking for the average price of a particular product on Amazon. The following regular expression will find any price (e.g. `$12`, `$75.50`) on the webpage: `\$[0-9]+(\.[0-9]+)?`.
Quite interesting!
Let's look at another example. You have a long list of documents with different kinds of extensions. You are particularly looking for data files having **.dat** extension.
`^.*\.dat$` is a regular expression which represents the set of strings ending with **.dat**. A regular expression is a standardized way to encode such patterns.
Well, what does the name **Regular Expression (RegEx)** represent? A regular expression is a sequence of characters that defines a search pattern.
RegEx is a standardized tool to do the following works:
1. Find and verify patterns in a string.
2. Extract particular data present in the text.
3. Replace, split and rearrange particular parts of a string.
We are going to look at all the three things above.
Let's begin the journey with RegEx!
**Note:**
1. In all the images below, the first section is a RegEx and below it is a text in which the matches are shown: the shaded regions show the matches. All the images are taken using regexr.com. You can use it to experiment with regex.
2. In all the images, a small dot between words in the text shows a space.
3. An **alpha-numeric character** belongs to any one of the ranges $0-9$, $A-Z$, $a-z$.
4. String is a sequence of characters and substring is a contiguous part of a string.
## Simple Alpha-numeric character matching
Simple matching of a specific word can be done as the following:
![enter image description here](https://lh3.googleusercontent.com/YGfz9u58rRKD0ABrSKDv7ZJOEMaIMGdFWgJWGGNzCFNakCtfAZVk1UEm7mBS4lIX1LFXoV420cmY=s1600)
As you can see it matches "Reg" in the text. Similarly, what will be the match for "Ex" in the same text above?
![enter image description here](https://lh3.googleusercontent.com/LkJXO79wn08dvgX5Q2JXHtyN7MW38AeNdV7fjG6lk7MNsiamx9iOekEGQg-WS9OLQMWxBuspjSkh=s1600)
Do you notice anything? The matching is **case sensitive**.
**Note:** Most programming languages have libraries for RegEx, and they use a very similar syntax. Here, we will see how to use RegEx in **Javascript**.
Below is a basic piece of Javascript code for regex. The patterns are written as `/_____/g`, where `g` is a modifier which is used to find all matches rather than stopping at the first match.
**Note:** The function **exec** returns null if there is no match, and the match data otherwise.
```js
// Main text (string) in which we are searching for patterns
var str = "RegEx stands for Regular Expression!";
// Pattern (a regex literal with the g modifier)
var pattern = /Reg/g;
// This will print all the data of matches
// across the whole string
var result; // declared explicitly to avoid an implicit global
while ((result = pattern.exec(str)) !== null)
{
    console.log(result); // printing
}
// This will be the output
/*
[
'Reg',
index: 0,
input: 'RegEx stands for Regular Expression!',
groups: undefined
]
[
'Reg',
index: 17,
input: 'RegEx stands for Regular Expression!',
groups: undefined
]
*/
```
**Note:** **groups** in the above output is a RegEx concept. We will look at it later, so keep reading.
Now, you can change the expression and the text in the code above to observe other patterns.
## Character classes:
![IMG](https://lh3.googleusercontent.com/dlLzL3teyEoax1JsdF7JeGP6DZOJll-UgnZqFkJtBeAVJhM1xMnXBHVeJYf_cUFLmj-f1qPO8asf=s1600)
What if you want to match both "soon" and "moon" or basically words ending with "oon"?
![enter image description here](https://lh3.googleusercontent.com/bsRHqYuPZIQ7Yra4-zyF1BX2pIYDukCEtTfCK3rjaCTRmTAuo_fuHTVK5sJjbTdbXjGTVq1z5eYc=s1600)
What did you observe? You can see that adding `[sm]` matches both `soon` and `moon`. Here `[sm]` is called a character class, which is basically a list of characters, any one of which may match at that position.
More formally, `[abc]` means 'either a or b or c'.
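The character class above can be tried out in Javascript. This is only a minimal sketch: the sample text and the variable names are made up for illustration, and `match` with the `g` modifier is used to collect all matches into an array.

```js
// Sample text (an assumption for illustration)
const classText = "soon moon noon";
// [sm] matches exactly one character: either 's' or 'm'
const classMatches = classText.match(/[sm]oon/g);
console.log(classMatches); // [ 'soon', 'moon' ]
```

Note that `noon` is not matched, because `n` is not listed inside the brackets.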
Predict the output of the following:
1. **RegEx:** ``[ABC][12]``
**Text:** A1 grade is the best, but I scored A2.
Answer:
![enter image description here](https://lh3.googleusercontent.com/2JgAwwprVtZ8AfQMLoB9PEi7NoyJXxrEqv_46tLvdtCBKo4HBcPlbi3atyKsWJTwJLBTneyA_C0j=s1600)
2. **RegEx:** ```[0123456789][12345]:[abcdef][67890]:[0123456789][67890]:[1234589][abcdef]```
**Text:** Let's match 14:f6:89:3c mac address type of pattern. Other patterns are 51:a6:90:c5, 44:t6:u9:3d, 72:c8:39:8e.
Answer:
![enter image description here](https://lh3.googleusercontent.com/d2ynsBn5p8gIzvQeKewe8VrPiEu0EyOoNiEBkj_Co8fq_12FKhWK81V1Rcc2YCs3or9d4sCbuGtA=s1600)
Now, if we put `^` as the first character inside the brackets, then it will show a match for characters other than the ones in the bracket.
![soon moon noon woon](https://lh3.googleusercontent.com/rj-zgBEZ7Fdv6rckQgHC90L_j7y1X7jj8veTZQoOKGQ2RSiEHPxPeSZUZoJE9yLW-o2dvXj6OI1j=s1600)
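A small sketch of a negated character class in Javascript, assuming the same four words as the sample text (the variable names are hypothetical):

```js
const negText = "soon moon noon woon";
// [^sm] matches one character that is neither 's' nor 'm'
const negMatches = negText.match(/[^sm]oon/g);
console.log(negMatches); // [ 'noon', 'woon' ]

// A negated class still has to consume one character,
// so "oon" on its own does not match.
const bareMatch = "oon".match(/[^sm]oon/);
console.log(bareMatch); // null
```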
Predict the output for the following:
**RegEx:** ```[^13579]A[^abc]z3[590*-]```
**Text:** 1Abz33 will match or 2Atz30 and 8Adz3*.
Answer:
![enter image description here](https://lh3.googleusercontent.com/BXaE8cxW7PcMJcfoUTlY-xBm9qNuhB5isy-PDLS5hIqQGIdRWiUf4viVxHF5yn5DJ0wHtoqHYKmP=s1600)
Writing every character (like `[0123456789]` or `[abcd]`) is slow and error-prone. What is the shortcut?
## Ranges
Ranges make our work easier. Inside a character class, consecutive characters can simply be replaced by putting a dash between the smallest and largest character.
For example, `abcdef` --> `a-f`, `456` --> `4-6`, `abc3456` --> `a-c3-6`, `c367980` --> `c36-90`.
![regex](https://lh3.googleusercontent.com/PWRFyDwe-89sdNSbmGc528PZXWhoX_-GNq0gQ8X9fOA-NX1Q4hzQNq1-Ty1LYjjsL8L4nVbSgvaq=s1600)
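Ranges can be checked quickly in Javascript. The sketch below uses a made-up sample text; it also shows that a reversed range such as `9-0` is rejected when the pattern is compiled (here through the `RegExp` constructor, so that the error can be caught).

```js
const rangeText = "a3 b4 c7 d5 c6";
// One letter from a to c, followed by one digit from 3 to 6
const rangeMatches = rangeText.match(/[a-c][3-6]/g);
console.log(rangeMatches); // [ 'a3', 'b4', 'c6' ]

// A reversed range is a syntax error
let reversedRangeError = null;
try {
  new RegExp("[9-0]");
} catch (err) {
  reversedRangeError = err;
}
console.log(reversedRangeError instanceof SyntaxError); // true
```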
Predict the output of the following regex:
1. **RegEx:** ```[a-d][^l-o][12][^5-7][l-p]```
**Text:** co13i, ae14p, eo30p, ce33l, dd14l.
Answer:
![enter image description here](https://lh3.googleusercontent.com/pgDHTvxQZ35ybyF7ozdeSGEBchiK8huiMQ3PfQWIgPSrzWoca8BpEoQ1yht8qyA4VVOdP6dNa-sl=s1600)
**Note:** If you write the range in reverse order (e.g. `9-0`), then it is an error.
2. **RegEx:** ``[a-zB-D934][A-Zab0-9]``
**Text:** t9, da, A9, zZ, 99, 3D, aCvcC9.
Answer:
![enter image description here](https://lh3.googleusercontent.com/ftl29tcq2QeaOCMpRs6kwEwtNYaxvTzkkLB-2SGi2WjkSWvUfAMmTTrE7NXtyORo8gaptghcnJQJ=s1600)
## Predefined Character Classes
1. **`\w` & `\W`**: `\w` is just a short form of the character class `[A-Za-z0-9_]`. `\w` is called the word character class.
![enter image description here](https://lh3.googleusercontent.com/UzEtYLNxnrtpDgOIW1N9SeyJ5Nyeh51hHIb516CwPOJutVSkWQZpcDfo09lSXmGzDMxDgtoikJAU=s1600)
`\W` is equivalent to ``[^\w]``. `\W` matches everything other than word characters.
![enter image description here](https://lh3.googleusercontent.com/cKEXAPheBxESGkBoe8zOONJP3REvaTUhYs4FPkPizMU4t-v2_enG-9Jk8tgF-HX6Wxrn0jQATBes=s1600)
2. **`\d` & `\D`**: `\d` matches any digit character. It is equivalent to character class `[0-9]`.
![enter image description here](https://lh3.googleusercontent.com/Q1WTXPIBFR0fCJ7QT5jdU_XummS39Jqzi96l1g_ijg-LA4hoSLf05pscFT32lW-39yEPC5uDP-V_=s1600)
`\D` is equivalent to ``[^\d]``. `\D` matches everything other than digits.
![enter image description here](https://lh3.googleusercontent.com/JWIIzBQOIqi7lPIkrveW6h_gL1C5sWd_0cNGCswkBxRGoNKDB9ZKN4Zwd21BdEmfuluuzu-THYpc=s1600)
3. **`\s` & `\S`**: `\s` matches whitespace characters. Tab (`\t`), newline (`\n`) and space (` `) are the most common whitespace characters. These characters are often called non-printable characters.
![enter image description here](https://lh3.googleusercontent.com/LbokzFHfw58rfmDUlcVoktdYHZtbWi76ddM-6-qyTiNVnk4s0Ea9KfC1KHRJkjTvDYRnbKXprkPr=s1600)
Similarly, `\S` is equivalent to ``[^\s]``. `\S` matches everything other than whitespace characters.
![enter image description here](https://lh3.googleusercontent.com/Vp2QdnqK-WOhuZaCZW82IBVNCPmVC--O2te2XzXKqCKwZJe4FKoJVHlzevhBgNfUSzF-34FcZFof=s1600)
4. **dot(`.`)**: Dot matches any character except `\n`(line-break or new-line character) and `\r`(carriage-return character). Dot(`.`) is known as a **wildcard**.
![enter image description here](https://lh3.googleusercontent.com/jwBp2XH1lL9ZRu_wASyTYsD03p81_3DIRjfHWtH5cA3jpSDuGfmE3P5A0RIhSfbrusmoV8w1D9k1=s1600)
**Note:** `\r` is the carriage-return character. Windows-style line breaks are written as `\r\n`.
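The predefined classes can be compared side by side in Javascript. This is a minimal sketch on a made-up sample string; the variable names are hypothetical.

```js
const sample = "Room 42, floor_3!";

const digitsFound = sample.match(/\d/g);   // every digit
const spacesFound = sample.match(/\s/g);   // every whitespace character
const nonWordFound = sample.match(/\W/g);  // everything except [A-Za-z0-9_]
const dotFound = "a\nb".match(/./g);       // dot skips the line break

console.log(digitsFound);  // [ '4', '2', '3' ]
console.log(spacesFound);  // [ ' ', ' ' ]
console.log(nonWordFound); // [ ' ', ',', ' ', '!' ]
console.log(dotFound);     // [ 'a', 'b' ]
```

The underscore in `floor_3` is not reported by `\W`, since `_` counts as a word character.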
Predict the output of the following regex:
1. **RegEx:** ``[01][01][0-1]\W\s\d``
**Text:** Binary to decimal data: 001- 1, 010- 2, 011- 3, a01- 4, 100- 4.
Answer:
![enter image description here](https://lh3.googleusercontent.com/YzmmRMcSqjhJPHthh_MwLnGldVl4nYR86Bb83viXeT2SM0koPmFjFKOathYXxxLyLKSz96Gkigcl=s1600)
### Problems
1. Write a regex to match 28th February of any year. Date is in dd-mm-yyyy format.
Answer: `28-02-\d\d\d\d`
2. Write a regex to match dates that are not in March. Assume that the dates are valid but the separator is not fixed, i.e. a date can be in dd.mm.yyyy, dd\mm\yyyy or dd/mm/yyyy format.
Answer: `\d\d\W[10][^3]\W\d\d\d\d`
Note that the above regex will also match wrongly formatted dates such as dd-mm.yyyy or dd/mm\yyyy. This problem can be solved by using backreferencing.
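The second answer can be checked in Javascript, including the mixed-separator weakness. The stricter pattern below is only a sketch of the backreferencing idea: it assumes that capturing the first separator in a group and repeating it with `\1` is what is meant (backreferences are explained in the Backreferencing section). The sample dates are made up.

```js
const notMarch = /\d\d\W[10][^3]\W\d\d\d\d/;
const aprilOk = notMarch.test("15/04/2020");      // true
const marchRejected = notMarch.test("15.03.2020"); // false
const mixedSlips = notMarch.test("15-04.2020");    // true: mixed separators slip through

// Assumed fix: capture the first separator and require the same one again
const sameSeparator = /\d\d(\W)[10][^3]\1\d\d\d\d/;
const mixedRejected = sameSeparator.test("15-04.2020");  // false
const uniformAccepted = sameSeparator.test("15/04/2020"); // true

console.log(aprilOk, marchRejected, mixedSlips, mixedRejected, uniformAccepted);
```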
## Alternation (OR operator)
A **character class** can be used to match a single character out of several possible characters. Alternation is more general than a character class: it can match one expression out of several possible expressions.
![enter image description here](https://lh3.googleusercontent.com/JxvB4lPgFBxk0FA-d3XapsLPad_JnhegB7NJ7BpRqmZkNuLwivUTAl08ek14EwpIM42LoIeB2O_c=s1600)
In the above example, ``cat|dog|lion`` basically means 'either cat or dog or lion'. Here, we have used specific words (cat, dog & lion), but we can use any regular expression. For example,
![enter image description here](https://lh3.googleusercontent.com/syg6gq9LBaQfA43tyVVlIqHd3oD0jbhBnxrLkho21EsD4DoUy2gGfJZoLgWaz0PA5BxWHghdybRD=s1600)
### Problem
- Find a regex to match boot or bot.
Answer: There is more than one possible answer: `boot|bot`, `b(o|oo)t`. The last expression uses a group.
### Problem with OR operator:
Suppose you want to match the two words **Set** and **SetValue**. What will be the regular expression?
From what we have learned so far, you might say that ``Set|SetValue`` is the answer. But it is not correct.
![enter image description here](https://lh3.googleusercontent.com/fcX6JOLw9u7YK0Sbeec_0tol5IbcxFHR4zlb3TSgCoMb3tlkZlGWCNd-9KNc832YMcuG7TcceJyz=s1600)
If you try `SetValue|Set`, then it works.
![enter image description here](https://lh3.googleusercontent.com/H8y8thO6EVan-NfxRKoGSEZG-ObgbX4EdfvHMJznr37m1Q-DekLpHMce3OqdAlje9jNQFy2ZK4u2=s1600)
Can you observe anything from it?
The **OR operator** tries the alternatives from left to right, starting with the first word (or expression) in the regex. If that one matches, it will not try the next word (or expression) at the same place in the text.
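The effect of the order of the alternatives can be reproduced in Javascript. A minimal sketch (the variable names are hypothetical):

```js
const word = "SetValue";

// "Set" is tried first and succeeds, so "SetValue" is never tried here
const shortFirst = word.match(/Set|SetValue/g);
console.log(shortFirst); // [ 'Set' ]

// Putting the longer alternative first gives the intended match
const longFirst = word.match(/SetValue|Set/g);
console.log(longFirst); // [ 'SetValue' ]
```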
Find a regex which matches each and every word in the following set: `{bat, cat, hat, mat, nat, oat, pat, Pat, ot}`. The regex should be as small as possible.
**Hint:** Use character-class, ranges and or-operator together.
Answer: `[b-chm-pP]at|ot`
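The answer can be verified against the whole set in Javascript. In this sketch the pattern is wrapped in `^( ... )$`, which is an addition made here only to force a whole-word match; anchors and groups are covered later in this document.

```js
const smallRegex = /^([b-chm-pP]at|ot)$/;
const wordSet = ["bat", "cat", "hat", "mat", "nat", "oat", "pat", "Pat", "ot"];

// Every word of the set should match
const allMatch = wordSet.every((w) => smallRegex.test(w));
console.log(allMatch); // true

// A word outside the set, for comparison
const ratMatches = smallRegex.test("rat");
console.log(ratMatches); // false
```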
## Quantifiers (Repetition)
To match 3 digit patterns, we can use ``[0-9][0-9][0-9]``. What if we have n digit patterns? We have to write `[0-9]` n times, but that is a waste of time. This is where quantifiers come to help.
1. **Limiting repetitions(``{min,max}``):** To match n digit patterns, we can simply write ``[0-9]{n}``. Instead of n, by providing minimum and maximum values as ``[0-9]{min,max}``, we can match a pattern repeating min to max times. Do not put a space after the comma; in Javascript `{min, max}` is not treated as a quantifier.
Let's see an example to match all numbers from 1 to 999.
![enter image description here](https://lh3.googleusercontent.com/i-Xd_gn0AYks2HX3HL-8kbVQWHaUzuuO5VO2ZoV5sqxIfFRyniKMEWNvM758zIfFb1ArY3q08dp5=s1600)
**Note:** If you don't write the upper bound (``{min,}``), then there is no limit on the maximum number of repetitions.
2. **``+`` quantifier:** It is equivalent to ``{1,}``, i.e. at least one occurrence.
![enter image description here](https://lh3.googleusercontent.com/_f5hQYEghXft3ZttKB7r177rDXXT4m04TQlkjnsg-2E5fkUAOZNUxeLYvWIt6T7B2XLTeVUkoXu1=s1600)
3. **``*`` quantifier:** It is equivalent to ``{0,}``, i.e. zero or more occurrences. ![enter image description here](https://lh3.googleusercontent.com/vGqELFywEUZ5jilWFotcN_l4IC0MCpg45TMAB2k3x80nAm6gn_2R9NB3h8KMiXNB6aG3HlZ2C0Hs=s1600)
4. **``?`` quantifier:** It is equivalent to ``{0,1}``, either zero or one occurrence. ``?`` is very useful for optional occurrences in patterns.
Let's see an example to match negative and positive numbers.
![enter image description here](https://lh3.googleusercontent.com/YBbsvb14Aoje2CB32deP6kszaZ0OcUWThaK71y5RZ7q6eqQ8H4EkL8XzZOB9IoSKB_Tav37lE__W=s1600)
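The four quantifiers can be compared in Javascript. This is a minimal sketch on made-up sample strings; the last two lines illustrate that, in a Javascript regex literal, a space inside the braces turns the braces into ordinary literal characters.

```js
// {min,max}: between 2 and 3 digits (greedy, so "4444" yields "444")
const twoToThree = "1 22 333 4444".match(/\d{2,3}/g);
console.log(twoToThree); // [ '22', '333', '444' ]

// +: one or more 'a' before 'b'
const plusMatches = "b ab aab".match(/a+b/g);
console.log(plusMatches); // [ 'ab', 'aab' ]

// *: zero or more 'a' before 'b'
const starMatches = "b ab aab".match(/a*b/g);
console.log(starMatches); // [ 'b', 'ab', 'aab' ]

// ?: the 'u' is optional
const optionalMatches = "color colour".match(/colou?r/g);
console.log(optionalMatches); // [ 'color', 'colour' ]

// With a space, {2, 3} is matched literally instead of acting as a quantifier
const spacedQuantifier = /\d{2, 3}/.test("22");      // false
const spacedLiteral = /\d{2, 3}/.test("2{2, 3}");    // true
console.log(spacedQuantifier, spacedLiteral);
```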
### Problems
1. Find a regex to match positive integers or floating point numbers with exactly two digits after the decimal point.
Answer: `\d+(\.\d\d)?`
2. Predict the output of the following regex:
RegEx: `[abc]{2,}`
Text:
<code>aaa
abc
abbccc
avbcc
</code>
Answer:
![enter image description here](https://lh3.googleusercontent.com/Uo7nokSqOEM8zLuOuA2a56q74vvBUmvmNQiUjdPSfn2H9yzTVRAW2DZlfCEsOaZJxhWg7tIdBLcv=s1600)
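Both problems above can be checked in Javascript. In this sketch the first pattern is wrapped in `^` and `$` (an addition made here, so that the whole input has to be a number), and the second uses the text from the problem with `\n` as the line break.

```js
// Problem 1: integer, or exactly two digits after the decimal point
const twoDecimals = /^\d+(\.\d\d)?$/;
const integerOk = twoDecimals.test("12");       // true
const twoDigitsOk = twoDecimals.test("12.50");  // true
const oneDigitFails = twoDecimals.test("12.5"); // false

// Problem 2: runs of at least two characters out of a, b, c
const runs = "aaa\nabc\nabbccc\navbcc".match(/[abc]{2,}/g);
console.log(runs); // [ 'aaa', 'abc', 'abbccc', 'bcc' ]
```

In the last line of the text, the single `a` before `v` is too short to match, so only `bcc` is reported.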
**Nature of Quantifiers:**
An HTML tag is represented as `<tag_name>some text</tag_name>`. For example, `<title>Regular expression</title>`
So, can you figure out an expression that will match both `<tag_name>` & `</tag_name>`?
Most people would say it is `<.*>`. But it gives a different result.
![enter image description here](https://lh3.googleusercontent.com/dYECtAiwn0dWJwY0K8gzb6U_vrzoihid1bgJxEvHA3G64Wm49dM5BVl5V41AHb3D1MxQ_t1MNXhh=s1600)
So, rather than stopping at the first `>`, it matches everything up to the last `>`, i.e. the whole element. Quantifiers are greedy by default. This is called **Greediness!**
Now, if we put `?` after the quantifier, then the following happens.
![enter image description here](https://lh3.googleusercontent.com/7Em674dudMFsnG-T5PiM8wpXBH8FOp3AW3o992orHzA49lO9muYkmAi4-NDKP4FV-Ay926fKdtEt=s1600)
### Lazy matching:
As we have seen, quantifiers are greedy by default, so they match as many characters as possible.
![enter image description here](https://lh3.googleusercontent.com/ye_RJ7Gc34O3BL6zMFEFodtbHsrK9hnwfQ3sxYWkrDHHdUtR6JvOcPT77kKMaNtn3IJonpmWNz1E=s1600)
To make a quantifier lazy, we put `?` right after it (for example `*?` or `+?`), which makes the regex engine match as few characters as possible while still satisfying the regex.
![enter image description here](https://lh3.googleusercontent.com/tHhbib6sbPqxvj9Or7SB2Gk6ODGjzwuAS0NOSi0FUb60SRaoc8RuxyGcGJqqWbInHmP6Z0eiQRua=s1600)
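Greedy and lazy matching can be compared on the `<title>` example from above. A minimal sketch in Javascript (the variable names are hypothetical):

```js
const html = "<title>Regular expression</title>";

// Greedy: .* runs to the last '>' in the text
const greedy = html.match(/<.*>/g);
console.log(greedy); // [ '<title>Regular expression</title>' ]

// Lazy: .*? stops at the first '>' it can
const lazy = html.match(/<.*?>/g);
console.log(lazy); // [ '<title>', '</title>' ]
```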
**Note:** Now, you may be thinking, what if we want to match characters like `*, ?, +, {, }` in the text. We will look at it shortly. Keep reading!
Predict the output of the following regex:
RegEx: `(var|let)\s[a-zA-Z0-9_]\w* =\s"?\w+"?;`
Text:
<code>var carname = "volvo";
console.log(carname);
let age = 8;
var date = "23-03-2020";</code>
Answer:
![enter image description here](https://lh3.googleusercontent.com/svrWV_MmE9x6WNVelmFRArJ-cvsY6VgUIUhT74KsvG0wdyifVmgOoMitGZHJ-ZjM1zgCJH6ID-S3=s1600)
## Boundary Matchers
Now, we will learn how to match patterns at specific positions, like before, after or between some characters. For this purpose we use special characters like `^`, `$`, `\b` & `\B`, `\A`, `\z` & `\Z`, which are known as anchors.
**Notes:**
- A line is a string which ends at a line-break or new-line character `\n`.
- There is a slight change in the Javascript code we have been using up till now. Instead of `/____/g`, we will now use `/____/gm`. The modifier `m` is used to perform a multiline search. Notice it in the next images!
- A word character can be represented by `[A-Za-z0-9_]`.
- **Anchor `^`**: It is used to match patterns at the very start of a line.
For example,
![enter image description here](https://lh3.googleusercontent.com/AsEJmx-zVHGJAIOK7wzOpJLXC_EFtszcSvDE3ByluMDZkAh01Z6Z48n0LqZXLnjq0e7CiKqyXoWG=s1600)
It will show a match only if the pattern occurs at the start of a line.
- **Anchor `$`**: Similarly, ``$`` is used to match patterns at the very end of a line.
![enter image description here](https://lh3.googleusercontent.com/iwj0YbwW5I5ocLglFFypayBFmNpHClc3Bew-DXer_XYhvhJ2QMd0z4ZFEpgc_sZqMFA91HgbkxYz=s1600)
It will show a match only if the pattern occurs at the end of a line.
An example using both `^` and `$`:
![enter image description here](https://lh3.googleusercontent.com/udplrGsLdQkeEkwCZl06ZmqM7M-kSP18Lyk7w0qFqrB46EAZJhJalhQ3WbPPnzcNt4jeHJxlgAXi=s1600)
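- **Anchors `\b` & `\B`**: `\b` is called the **word boundary** anchor. It does not consume any character; it matches a position where a word character meets a non-word character (or the start or end of the text).
Below is a list of positions which qualify as a **boundary** for `\b`:
If the regex pattern ends (or starts) with,
- A word character, then the boundary is at that word character itself, and a non-word character (or the edge of the text) must lie on its other side. Let's call it a word boundary.
- A non-word character, then the boundary is at the neighbouring word character in the text, just outside the pattern. Let's call it a non-word boundary.
So, in short, `\b` always needs a word character on exactly one side of the position, which is why it is called the **word boundary** anchor.
Let's first observe some examples to understand its working: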
![enter image description here](https://lh3.googleusercontent.com/RJqHdBX--517Xuq4MWDVoGsBoOFGIbWhqs8YOwWFU-LKjwSQLfVuQPHTgWjxU3rR54D6bg2TNxJu=s1600)
What did you observe? Our regex-pattern is starting and ending with a word character. So, the match occurs only if there is a substring starting and ending at word characters, which are required in our regex `[a-z]` and `\d` respectively.
Now, let's look at one more example.
![enter image description here](https://lh3.googleusercontent.com/NYHZmEuDDxajsib_cG248-x-L1YgxAR-Tn3KqYgnOJSM1uk1EQgbjTHTL_E9C-tNVjQ1qIxZyU_e=s1600)
Here `\+` will show a match for `+`.
What did you observe?
**First observation:** Our pattern is starting with a non-word character and ending with a word character. So, the match occurs only if there is a substring having a non-word boundary at starting and word boundary at the ending.
**Second observation:** Non-word character after a word-boundary does not affect the result.
`\b` need not be used in pairs. You can use a single `\b`.
![enter image description here](https://lh3.googleusercontent.com/ZQWwiNznl_P_foHiLgoxmwN7b0xRN54qMHUk9B4wSuiDDKSWdkOY9rjSwjTGvbEWKqptvLG2i_Rk=s1600)
`\B` is just the complement of `\b`. `\B` matches at all the positions that are not a word boundary. Observe the two examples below:
![enter image description here](https://lh3.googleusercontent.com/gDCZqZbdNHcP9XetuE8hqx_7BLE2rh27Z1Bnz-SPUKJmj-_qEdFrXfP4xCzpPCxA7RYN1ZP6L0Zk=s1600)
![enter image description here](https://lh3.googleusercontent.com/ucV3anQWFfYhO7qaXcU7-25bqR3W7KO87aSCVcd_GDMFtfEvWmqK-JvC0jJ62KQevzfhPdn4DMTC=s1600)
**Note:** `\A` and `\z` & `\Z` are other anchors, which are used to match at the very start and at the very end of the input text respectively. But they are not supported in Javascript.
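The anchors `^`, `$`, `\b` and `\B` can be tried out in Javascript. The sketch below uses made-up sample texts; `search` is used in the last line because it returns the index of the first match.

```js
const lines = "cat sat\nsat cat\ncat";

// With the m modifier, ^ and $ work on every line
const startOfLine = lines.match(/^cat/gm);  // lines 1 and 3
const endOfLine = lines.match(/cat$/gm);    // lines 2 and 3
const wholeLine = lines.match(/^cat$/gm);   // line 3 only
// Without m, ^ only matches at the start of the whole text
const noMultiline = lines.match(/^cat/g);
console.log(startOfLine.length, endOfLine.length, wholeLine.length, noMultiline.length); // 2 2 1 1

const words = "cat concat cats";
// \b on both sides: only the standalone word matches
const wholeWord = words.match(/\bcat\b/g);
console.log(wholeWord); // [ 'cat' ]
// \B in front: "cat" must be preceded by a word character, as in "concat"
const insideWordIndex = words.search(/\Bcat/);
console.log(insideWordIndex); // 7
```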
Predict the output of the following regex:
1. **RegEx:** ```^[\w$#%@!&^*]{6,18}$```
**Text:**
<code>This is matching passwords of length between 6 to 18:
Abfah45$
gadfaJ%33
Abjapda454&1 spc
bjaphgu12$
Note that no whitespace characters are allowed.</code>
Answer:
![enter image description here](https://lh3.googleusercontent.com/bsQ8pb1a2tBRUHfE8ul2wiJ_7rRQjilptThKAy2WY2P0ndudVI8UEsiyn8eXLGbRvY6hj7Fdg6GM=s1600)
2. RegEx: `\b\w+:\B`
Text: <code>1232: , +1232:, abc:, abc:a, abc89, (+abc::)</code>
Answer: ![enter image description here](https://lh3.googleusercontent.com/TgLgYDWKd9wbBmzzq1_m-CRh4t5ndAoLnB-6nH0kWTpJSkZk9ePhH_5j2uFByAzxLUSlGeeLf5HR=s1600)
## Groups & Capturing
Grouping is one of the most useful features of regex. Grouping is done by placing a regular expression inside parentheses `( )`.
It turns the regular expressions inside it into a single unit. Let's look at its usages one by one:
1. It makes the regular expression more readable, and sometimes it is unavoidable.
![enter image description here](https://lh3.googleusercontent.com/JVFtW5n8xGothhQa-MCzSV-oIEFM7zpjOJWDAOto_JrjQalqcEt29LtCT4m62FZHuVLgntRJVJYN=s1600)
Suppose we want to match both the sentences in the above text; then grouping is unavoidable.
![enter image description here](https://lh3.googleusercontent.com/_LLeROC6R9eAel9CF-FpwRvm4T2styQe60Qv4Zokhvky_6pbGrLJSWYd5TLaz5NwB6zaOwd3fKmX=s1600)
2. To apply quantifiers to one or more expressions.
![enter image description here](https://lh3.googleusercontent.com/cSn7JesNbcMaaXb_tFi1ymMlKtZxe7G09jROJtWuu7kPvUmAGOU_CDiVp9k0NQ8FuCistLgW4vUg=s1600)
Similarly, you can use other quantifiers.
3. To extract and replace substrings using groups. So, we call groups **Capturing groups**, because we are capturing data (substrings) using groups.
In this part, we will see how to extract and replace data using groups in Javascript.
**Data Extraction:**
Observe the code below.
```js
var str = "2020-01-20";
// Pattern string
var pattern = /(\d{4})-(\d{2})-(\d{2})/g;
//             ^       ^       ^
//group-no:    1       2       3
var result = pattern.exec(str);
// printing
console.log(result);
/* Output will be:
[
'2020-01-20', //-------pattern
'2020', //-----First group
'01', //-------Second group
'20', //-------Third group
index: 0,
input: '2020-01-20',
groups: undefined
]
*/
// Data extraction
console.log(result[1]); // First group
console.log(result[2]); // Second group
console.log(result[3]); // Third group
```
In the output array, the first element is the whole matched string, followed by the captured groups in order.
**Data Replacement:**
`replace` is another function, which can be used to replace and rearrange the data using regex. Observe the code below.
```js
var str = "2020-01-20";
// Pattern string
var pattern = /(\d{4})-(\d{2})-(\d{2})/g;
//             ^       ^       ^
//group-no:    1       2       3
// Data replacement using $group_no
var ans=str.replace(pattern, '$3-$2-$1');
console.log(ans);
// Output will be: 20-01-2020
```
As you can see, we have used `$group_no` to indicate the capturing group.
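Splitting was listed at the start as one of the uses of RegEx. A minimal sketch in Javascript, assuming a made-up input whose fields are separated by commas, semicolons or whitespace: `split` accepts a regex as the separator.

```js
const row = "a, b;c  d";
// One or more separators in a row count as a single split point
const fields = row.split(/[,;\s]+/);
console.log(fields); // [ 'a', 'b', 'c', 'd' ]
```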
Predict the output of the following regex:
1. RegEx: `([abc]){2,}(one|two)`
Text:
<code>aone
cqtwo
abone
actwo
abcbtwoone
abbcccone
</code>
Answer: ![enter image description here](https://lh3.googleusercontent.com/oPAS6ExWeAQoTEc7GRnrd4iuCmeSZt1-6Jv58t1_clZgHkpT9U1qPJFEXE780Bdz9oTeypFEgpRy=s1600)
2. RegEx: `([\dab]+(r|c)){2}`
Text:
<code>1r2c
ar4ccc
12abr12abc
acac, accaca, acaaca
aaar1234234c, aaa1234234c
194brar, 134bcbb-c </code>
Answer: ![enter image description here](https://lh3.googleusercontent.com/7t717TZSxuWDD92l58z8v0zInRNqiGvdP1q_AGPrN419PfxrMsHj27SaliwSC6EK2KZHWWvsr___=s1600)
## Characters with special meaning
We have seen that we use `*`, `+`, `.`, `$`, etc. for different purposes. Now, if we want to match these characters themselves, we have to escape them using the escape character (backslash, `\`).
Below is the table of these kinds of characters and their escaped versions, along with their usages.
| Character | Usage | Escaped version |
|:---------:|:---------------------------:|:---------------:|
| \ | escape character | \\\ |
| . | predefined character class | \\. |
| \| | OR operator | \\\| |
| * | as quantifier | \\* |
| + | as quantifier | \\+ |
| ? | as quantifier | \\? |
| ^ | boundary matcher | \\^ |
| $ | boundary matcher | \\$ |
| { | in quantifier notation | \\{ |
| } | in quantifier notation | \\} |
| [ | in character class notation | \\[ |
| ] | in character class notation | \\] |
| ( | in group notation | \\( |
| ) | in group notation | \\) |
| - | range operator | \\- (only needed inside a character class) |
In Javascript, the forward slash (`/`) must also be escaped as `\/` inside a regex literal, because it marks the start and end of the pattern.
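The difference between an escaped and an unescaped special character shows up clearly in Javascript. A minimal sketch on made-up sample strings:

```js
const numbersText = "1.5 and 105";

// Unescaped dot: matches any character between 1 and 5
const unescapedDot = numbersText.match(/1.5/g);
console.log(unescapedDot); // [ '1.5', '105' ]

// Escaped dot: matches only a literal '.'
const escapedDot = numbersText.match(/1\.5/g);
console.log(escapedDot); // [ '1.5' ]

// Escaped plus and escaped forward slash
const plusLiteral = "2+2=4".match(/2\+2/g);
console.log(plusLiteral); // [ '2+2' ]
const slashLiteral = "usr/bin".match(/\//g);
console.log(slashLiteral); // [ '/' ]
```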
## Backreferencing
Backreferencing is used to match the same text again. A backreference matches the same text as previously matched by a capturing group. Let's look at an example:
![enter image description here](https://lh3.googleusercontent.com/VrwREOtqL_b2IPbzM2qJQVAiP9Q8XWoAny41UodrLlEzWBxUbOJZ3WTvR7T0b-9zHn7iOqN8op3l=s1600)
The first captured group is (`\w+`), now we can use this group again by using a backreference (`\1`) at the closing tag, which matches the same text as in captured group `\w+`.
You can backreference any captured group by using `\group_no`.
Let's have two more examples:
![enter image description here](https://lh3.googleusercontent.com/Wx30vdBz2zif4zqMt1P6rJIh9b3NBOWz0XMzGZR50gU5n8p4sxhtCRWYl1j5hWYfJpI6jC5VEDEX=s1600)
![enter image description here](https://lh3.googleusercontent.com/Ji2buSF895THR4SIILfEou2SJMmuFExrpJsG8xWoxPSl6O-hDv7wPOHP1b_145NYh-M0yQYe0BxE=s1600)
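A backreference can be tried out in Javascript as follows. This is only a sketch: the tag pattern is an assumed reconstruction of the idea described above (an opening tag whose name is repeated in the closing tag), not a copy of the patterns in the images, and the sample text is made up.

```js
// \1 must repeat exactly what (\w+) captured in the opening tag
const tagPair = /<(\w+)>.*?<\/\1>/g;
const pairedTags = "<b>bold</b> <i>mixed</b>".match(tagPair);
console.log(pairedTags); // [ '<b>bold</b>' ]

// Several groups can be referenced, in any order
const mirrored = /^([a-z])([a-z])\2\1$/;
const abbaMatches = mirrored.test("abba"); // true
const ababMatches = mirrored.test("abab"); // false
console.log(abbaMatches, ababMatches);
```

`<i>mixed</b>` is skipped because the closing tag name does not repeat the captured `i`.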
**Problems:**
1. Match any palindrome string of length 6, having only lowercase letters.
Answer: `([a-z])([a-z])([a-z])\3\2\1`
2. RegEx: `(\w+)oo\1le`
Text: `google, doodle jump, ggooggle, ssoosle`
Answer:
![enter image description here](https://lh3.googleusercontent.com/y_4mk8QPlH0dWWqzXVhm5_V9wZlieX36x_sPTfX_Tr86l5SFD0so0ejYXD2dy2BiadXWHGVzxfkW=s1600)
**Note:** For group numbers more than 9, there is a syntax difference.
## Named Groups
Regular expressions with lots of groups and backreferences can be difficult to maintain, as adding or removing a capturing group in the middle of the regex changes the numbers of all the groups that follow it.
In regex, we have the facility of named groups, which solves the above issue. Let's look at it.
We can name a group by putting `?<name>` just after the opening parenthesis of the group. For example, `(?<year>\d{4})` is a named group.
Below is the code we have already seen in the **capturing groups** part. You can see that the code is more readable now.
```js
var str = "2020-01-20";
// Pattern string
var pattern = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/g;
//             ^              ^               ^
//group-no:    1              2               3
// Data replacement using $<group_name>
var ans=str.replace(pattern, '$<day>-$<month>-$<year>');
console.log(ans);
// Output will be: 20-01-2020
```
The numbered backreference syntax works for named capture groups as well. In addition, `\k<name>` matches the string that was previously matched by the named capture group `name`; this is the standard way to backreference a named group.
![enter image description here](https://lh3.googleusercontent.com/GbalU1cVIs8G5kP_pTXBj8eC1MF1M3vNVFaeAvnFcVyOUPg7mhBDoVu8nye9oC4inIg_4Vnzst1y=s1600)
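Named groups can also be read directly from the result of `exec`, through its `groups` property. A minimal sketch in Javascript; the second pattern and its group name `letter` are made up here to illustrate `\k<name>`.

```js
const datePattern = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/;
const dateMatch = datePattern.exec("2020-01-20");
// Each named group is available by its name
console.log(dateMatch.groups.year);  // 2020
console.log(dateMatch.groups.month); // 01
console.log(dateMatch.groups.day);   // 20

// \k<letter> repeats whatever the group "letter" matched
const doubledLetter = /(?<letter>[a-z])\k<letter>/.exec("hello");
console.log(doubledLetter[0]); // ll
```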
## Practical Applications of RegEx
1. Syntax highlighting systems
2. Data scraping and wrangling
3. In find and replace facility of text editors
Now that you have learned RegEx, let's look at some classical examples.
## Classical examples
1. **Number Ranges:**
Can you find a regex matching all integers from 0 to 255?
First, let's look at how we can match all integers from 0 to 59:
![enter image description here](https://lh3.googleusercontent.com/my4yBp9jXoS3wHCjkL_OVD3EjmsUtj61hIWKCssvS5PH0et26vbhhSIoM5Jphx-FVwK6yvc0qT7u=s1600)
As you can see, we have used the `?` quantifier to make the first digit (0-5) optional. Now, can you solve it for 0-255?
Hint : Use OR operator.
We can divide the range 0-255 into three ranges: 0-199, 200-249 & 250-255. Now, creating an expression, for each of them independently, is easy.
| Range| RegEx |
| :--: | :--: |
| 0-199 | `[01][0-9][0-9]` |
| 200-249| `2[0-4][0-9]`|
| 250-255| `25[0-5]`|
Now, by using OR operator, we can match the whole 0-255 range.
![enter image description here](https://lh3.googleusercontent.com/ao-aGXzSqOsICjuE8DlAmTtNCvMWVHKeStYMlccX3ounTHaTlI4qLNhAcm7jM_3P0FPAjXi0k2zB=s1600)
As you can see, the above regex is not going to match 0, only 000. So, how can you modify the regex so that it matches 0 and 1 as well, rather than only 000 and 001?
![enter image description here](https://lh3.googleusercontent.com/qxfoJm05roxaS3FwOoTy0Nr88Ku2KvZPHRUl7_HB5ybcpUhLspgQAbitJjvV-J5Gi4OG2upbTl9K=s1600)
We have just used `?` quantifier.
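The 0-255 range can be written down and tested in Javascript. The pattern below is an assumed reconstruction from the description (the three sub-ranges joined with the OR operator, with the leading digits of the first sub-range made optional by `?`); the exact regex in the images may differ. The surrounding `^( ... )$` is added here so that the whole input has to be a number.

```js
const byteRange = /^([01]?[0-9]?[0-9]|2[0-4][0-9]|25[0-5])$/;

const inRange = ["0", "7", "42", "199", "200", "255", "007"];
const outOfRange = ["256", "300", "1000", ""];

const allAccepted = inRange.every((s) => byteRange.test(s));
const allRejected = outOfRange.every((s) => !byteRange.test(s));
console.log(allAccepted, allRejected); // true true
```

This version still accepts leading zeroes such as `007`.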
2. **Validate an IP address:**
An IP address (IPv4) consists of four numbers from 0-255 separated by 3 points (`.`). The valid IP address format is (0-255).(0-255).(0-255).(0-255).
For example, 10.10.11.4, 255.255.255.255, 234.9.64.43, 1.2.3.4 are valid IP addresses.
Can you find a regex to match an IP address?
We have already seen how to match number ranges, and to match a point we use an escaped dot (`\.`). But in an IP address, we don't allow leading zeroes in numbers, like 001.
So, we have to divide the range into four sub-ranges: 0-99, 100-199, 200-249, 250-255. And finally we use the OR operator.
![enter image description here](https://lh3.googleusercontent.com/UXWulQq4P0STobfIjrW4Um5jaqwSqTPx3KSSYYvEg_HZJ22a5wNQYio3qQUtzgidIOxLO_M48M9p=s1600)
So, Regex to match IP Address is as below:
![enter image description here](https://lh3.googleusercontent.com/n2CaC-8Q8NH-H5RkDrCM4AQQkV2PamIAlA3dwljdRsW33WWoj18qJEIN5iyzjLzfifHj-dh-IW-u=s1600)
**Note:** The whole expression is contiguous; it is split across lines only for the sake of easy understanding.
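A possible Javascript version of the IP address check is sketched below. The pattern is an assumption built from the four sub-ranges described above, not a copy of the regex in the image; it is assembled with the `RegExp` constructor, so the dot is written as `\\.` inside the string. The names `octet` and `ipv4` are hypothetical.

```js
// One number from 0 to 255 without leading zeroes:
// 250-255 | 200-249 | 100-199 | 0-99
const octet = "(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])";
// Four octets separated by three literal dots
const ipv4 = new RegExp("^" + octet + "(\\." + octet + "){3}$");

const validAddresses = ["10.10.11.4", "255.255.255.255", "234.9.64.43", "1.2.3.4"];
const invalidAddresses = ["256.1.1.1", "01.2.3.4", "1.2.3", "1.2.3.4.5"];

const validOk = validAddresses.every((ip) => ipv4.test(ip));
const invalidOk = invalidAddresses.every((ip) => !ipv4.test(ip));
console.log(validOk, invalidOk); // true true
```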
### Bonus Problem:
Predict the output of the following regex:
**RegEx:** ``\b(0|(1(01*0)*1))*\b``
**Text:** This RegEx denotes the set of binary numbers divisible by 3:
0,11,1010, 1100, 1111, 1001
Answer:
![enter image description here](https://lh3.googleusercontent.com/d4c8LeDk2JahKEQXXkMfPwDoDEmO6ijmqd4u3aHDUJ9_At1DZY95HSGZMPbJPn4Mptlfp3p1XZ4x=s1600)
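The claim that this regex describes binary numbers divisible by 3 can be checked by brute force in Javascript. In this sketch the `\b` anchors are replaced by `^` and `$` (an assumption made here so that each number is tested as a whole string), and every number from 0 to 63 is compared against `n % 3`.

```js
const divisibleBy3 = /^(0|(1(01*0)*1))*$/;

let mismatches = 0;
for (let n = 0; n < 64; n++) {
  const binary = n.toString(2); // e.g. 9 -> "1001"
  const regexSaysYes = divisibleBy3.test(binary);
  const reallyDivisible = n % 3 === 0;
  if (regexSaysYes !== reallyDivisible) {
    mismatches++;
  }
}
console.log(mismatches); // 0
```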

View File

@@ -0,0 +1,324 @@
## Single source shortest path using Dynamic Programming for Directed Acyclic Graph(DAG)
We can solve the single source shortest path problem using dynamic programming. How?
First of all, we will see how to solve this problem using recursion with memoization(top-down approach) followed by bottom-up dynamic programming approach.
## Quiz Time
You are given that the shortest path between two vertices $A$ and $B$ passes through $C$. What can you conclude from it?
Answer: $SD(A, B) = SD(A,C) + SD(C,B)$
$SD$ stands for Shortest Distance.
-- --
The above answer can be proved by contradiction. Suppose that the shortest path between $A$ and $B$ passes through $C$, but the part of it from $A$ to $C$ is not a shortest path from $A$ to $C$. Then we can replace that part with a shorter path from $A$ to $C$ and obtain a path from $A$ to $B$ that is shorter than the original one. This is a contradiction, because we have assumed that the original path is the shortest path between $A$ and $B$. The same argument applies to the part from $C$ to $B$.
We will use the above property together with memoization to solve the problem. How?
Suppose that we are searching for a shortest path between $u$ (source) and $v$. Then we know that the shortest path between $u$ and $v$ must pass through one of the vertices that have a directed edge into $v$ (the incoming edges of $v$).
![enter image description here](https://lh3.googleusercontent.com/zcgHZ2l-KORFTbFr5yjnu1BxscxqrsWqBF1tfpTDnE4aLPvWdrkQSiQXtuizV8y3SQHRH35XOgvW)
So, first of all we will find a shortest path between the source and $x$, $y$, $z$ one by one. Then the shortest path from $u$ to $v$ will be either $\text{SP}(x) \to v$ or $\text{SP}(y) \to v$ or $\text{SP}(z) \to v$, depending on which one is the minimum of $\text{SD}(x)+w_1$, $\text{SD}(y)+w_2$, $\text{SD}(z)+w_3$, respectively.
**Note:** $\text{SP}$ stands for the shortest path up to the given vertex and $\text{SD}$ stands for the shortest distance.
Done? Let's see the proper algorithm.
## Recursive Memoization Approach
Now, if we want to find the shortest path from source $u$ to vertex $v$, then we start the recursion from the vertex $v$.
Move in the reverse direction of the directed edges entering $v$ and recurse on each of the (reverse) adjacent vertices (i.e. $x,y,z$) until you reach the source $u$.
Meanwhile, update the shortest distances accordingly.
**Note:** **Memoization** means storing the results already obtained, so that they can be reused again and again to obtain new results.
One thing to notice is that once the shortest distance for a vertex is found, we can memoize it and use it again to avoid unnecessary recursive calls.
### Algorithm
1. Assign all the distances to $\infty$.
2. Now, assign source to source distance to $0$ and start the algorithm.
3. Loop through all the vertices, if the distance to a vertex is not found yet then start the recursion over that vertex.
4. In the recursive function, suppose you are starting from a vertex $v$, then move backward over the incoming edges to the vertex $v$.
![enter image description here](https://lh3.googleusercontent.com/onE03fJDK7zzaXlGFcqQcx240bvuxKcIEoA3RbIJZ690tShQfJW0CGbxnCamgmUeVKHoIvNuouQi)
In the image above, $x$, $y$, $z$ are the vertices we can reach by moving backward over the incoming edges of the vertex $v$.
5. Now, we will make a recursive call on each of these vertices, find the shortest distance to all of them first, and update $distance[v]$ as below:
Say $u_1, u_2, \ldots , u_n$ are the vertices we reached by moving backward over the edges.
$distance[v] = \min(distance[v], ShortestDistance(u_i) + EdgeWeight(u_i,v))$ $\forall i \le n$.
**Note:** Stop the recursion at the source vertex, which is a base case.
![enter image description here](https://lh3.googleusercontent.com/3GbFgUwMFikEvKCqNL1n87hSozUolombL-yfTYogzAp0WSS4ng42m7v4YyfJbRf-POqtccxMVvPa)
![enter image description here](https://lh3.googleusercontent.com/AKO9pxqivto0TY4UrbxRTDurlwPGXpdY7jnRaUbFc5wVyqn3r3aR1m2I_s2Jk-HK4OM8MaVpet_p)
![enter image description here](https://lh3.googleusercontent.com/bEIMeeErMvscD0b88nqLmGkF9QKr33GtW4GAloUspaQcy35sM8-AI4YbLrQ3jy-G43IMe1jlO7it)
![enter image description here](https://lh3.googleusercontent.com/lHwWgRt36q9jtYek91GFBoG5JUcke9KbPFABAb5p-ttVDv6ZlaUVu7pQOAXorB5YVseccpm_X5qv)
![enter image description here](https://lh3.googleusercontent.com/rD1tS9C80Lef2s1VNikFWxtB0Z4zyluiCCPEbT22z4HEC70wlKK95ukeWtfVUgm_HfhuxQ9C2Q8Y)
![enter image description here](https://lh3.googleusercontent.com/Hm8iD_nyjbnVTz8ZrxHfmddk44AbOYlXgyeEAW6puNaNZdek0N_yecGJWN6n2KUC2kz0sltctko1)
![enter image description here](https://lh3.googleusercontent.com/knyYkHrc_zidlqaa-MuhlodxuDADgs02ID-mg3lczTuqVU8V0bb_SI2yEhCxT2IxWXL_AfmQBcoL)
![enter image description here](https://lh3.googleusercontent.com/E0lKJfEkQ2Tj_cnxQiYiznbUcYcS43DLYsNIME3Mg0So6S9wusy50tRXJrtnEvY_31f3W4jgrqEJ)
![enter image description here](https://lh3.googleusercontent.com/y-h0rDBFvMtb-ednsSpScqlYlzUDEhctpcTaDnLRJHgI2r4sfJI7fM8o7kUZg7EQsBrOfQRj3-aD)
![enter image description here](https://lh3.googleusercontent.com/7c7GLD_-XW3lmhAvVUQLRjt6KiCv43rhKDxMncYkp3eGfY5xFNp9JhzJT6-FxdX7S79XPr2ZWhDr)
![enter image description here](https://lh3.googleusercontent.com/CARyqEf3tKiaKGefYVjJ4LFbdfjM1QlufJszMU1OgE-MGfsRTdh-7VIQg8TDWvDZ5OwY9jZzdzVB)
![enter image description here](https://lh3.googleusercontent.com/B8HNXOYfkfwKTXxRDBJ9swnWR4uqcHsB1BJbHsBKzMOcEr4I50wwlzvnzJlxYrR_nEs45GZesGX0)
![enter image description here](https://lh3.googleusercontent.com/9AZFtU7GqxTguAxR1FQFm9pvV0q-biopPHWsYoYA03xci8e_9cZJmye1Pmr7Rpkr-6f0cqJLOnP5)
![enter image description here](https://lh3.googleusercontent.com/hk5oggibZqf43non8o6i3eQqSxv7scCg3YnFAnwKBV7iFoo_MKXWFVODEbVOn5JP5J9m28ZHpAcS)
```c++
#include <iostream>
#include <vector>
using namespace std;

#define MAX_DIST 100000000

struct edge {
    int from, to, weight;
    edge(int a, int b, int w)
    {
        from = a;
        to = b;
        weight = w;
    }
};

// Returns the shortest distance from source to vertex.
// graph[v] holds the incoming (reverse) edges of v.
int Shortest_Path(vector<vector<edge> > &graph, int source, int vertex,
                  vector<int> &distances, vector<int> &parent)
{
    // Base case: the source itself
    if(vertex == source)
        return 0;
    // Memoization: the distance is already known
    if(distances[vertex] != MAX_DIST)
        return distances[vertex];
    for(auto vedge: graph[vertex])
    {
        int sub_distance = Shortest_Path(graph, source, vedge.from,
                                         distances, parent);
        // Skip predecessors that are not reachable from the source
        if(sub_distance == MAX_DIST)
            continue;
        int new_distance = sub_distance + vedge.weight;
        if(new_distance < distances[vertex])
        {
            distances[vertex] = new_distance;
            parent[vertex] = vedge.from;
        }
    }
    return distances[vertex];
}

// Prints the path from source to destination using the parent array
void printpath(vector<int>& parent, int vertex, int source, int destination)
{
    if(vertex == source)
    {
        cout << source << "-->";
        return;
    }
    printpath(parent, parent[vertex], source, destination);
    cout << vertex << (vertex==destination ? "\n" : "-->");
}

int main()
{
    int no_vertices = 6;
    vector<vector<edge> > graph(no_vertices+1, vector<edge>());
    // Making the graph using
    // Reverse edges
    graph[2].push_back(edge(1,2,1));
    graph[2].push_back(edge(1,2,6));
    graph[4].push_back(edge(2,4,1));
    graph[3].push_back(edge(4,3,1));
    graph[5].push_back(edge(3,5,1));
    graph[5].push_back(edge(2,5,7));
    graph[6].push_back(edge(4,6,3));
    vector<int> distances(no_vertices + 1, MAX_DIST), parent(no_vertices +1, -1);
    int source = 1, destination = 5;
    distances[source] = 0;
    for(int i = 1; i <= no_vertices; i++)
        if(distances[i] == MAX_DIST)
            Shortest_Path(graph, source, i, distances, parent);
    for(int i = 1; i <= no_vertices; i++)
        cout << distances[i] << " ";
    cout << "\n";
    // Print the shortest path if the destination is reachable
    if(distances[destination] != MAX_DIST)
        printpath(parent, destination, source, destination);
    return 0;
}
```
## Bottom-Up Dynamic Programming Approach
How can we do bottom-up dynamic programming to solve the SSSP problem?
Where should the bottom-up DP start? Can we start at any random vertex? No, we cannot. We have to process the vertices in an order such that, before reaching a particular vertex (say $v$), we have already found the shortest distances to all the vertices that have an edge into $v$.
Are you familiar with this kind of ordering of vertices? It is the **topological ordering**. Processing the vertices in topological order ensures exactly this.
Let's see the algorithm.
### Algorithm
1. First, find a topological ordering of the vertices.
2. Set all the distances to $\infty$.
3. Set the distance from the source to itself to zero.
4. Loop over the vertices in topological order. For each vertex, relax all of its outgoing edges, updating the shortest distances of its adjacent vertices.
**Visualization**
![enter image description here](https://lh3.googleusercontent.com/8AWaULupuk47vWbCt03JqmMztnzlTQX6hI5trXsEto5WvTxet-g__JHbf-oapvnopySkMFXMlzzP)
![enter image description here](https://lh3.googleusercontent.com/R1Lx2VpAo1HZhJGXndacjD58_JKm3Sf2Mzp8XiWxwUVfSoxfHdsGY_za-MSbOX1edhbO8b7qUU45)
![enter image description here](https://lh3.googleusercontent.com/x2k7_th3-rd49fuYlQ21SYPlatfR2yJdtVAQjCS3Kci0BvPZ9QPGKHvSaVR90A2hHD-cZrdxw46c)
![enter image description here](https://lh3.googleusercontent.com/rkzvSCcNWPL_YvH2Me8EQbK2UYof9SKm-xAcmBHkFe1PKuWJ93w_nMQMKZ91nWggccz9oDLoDMGD)
![enter image description here](https://lh3.googleusercontent.com/BaIRyy92osZI9hhptyFCLpQ2BkjWtJ0GBy7c_Z0gjtXjlUv7fnumHiRmslwqODKBirFlml1xLwKo)
![enter image description here](https://lh3.googleusercontent.com/6cOdgwGMmBqfdyqS1F46n6OP3bs7db9jZhE91dOukxYqNmdM_-mCC53qZ_cH8Evs5viatICjiTtD)
![enter image description here](https://lh3.googleusercontent.com/374THC8sJO1ESUkpjh5KUzAD-e2CcmkrfCD9-lVdDO4xC8M1LOvOLpLr-oMQuHC1NEeFRB3fW7VB)
![enter image description here](https://lh3.googleusercontent.com/v5HpkcVrmwDetuMXaoOqAr09Z8iCtrG28xOfqEbb6EL40CjZVANN38HQAi21XLJ6yCk2nwkV-YtP)
![enter image description here](https://lh3.googleusercontent.com/WGQH8BF5a1X1FTIyV9Ik8TAhkGcUrsirM6MzZbrcRoBcH7rzPE0ictdnfxYTLnh_C8ZKHibwVfJ7)
![enter image description here](https://lh3.googleusercontent.com/oXacIqRQl0TZ8hKUtXiT8N1TqkpWff67K0Vw_huIQseGavjVM8cKUp4n1vFlmhP_QWolar-yCPmu)
![enter image description here](https://lh3.googleusercontent.com/PwWja4G0ZAncUnGJcCin2EyajAzJR4wNOC-5WhdLvN0ejwDB3Pq8XTencVr6HlU2hFfTQuZTPmCv)
![enter image description here](https://lh3.googleusercontent.com/Drs6P4aLXaimWKNMi_0-rsoy_sN7QGLV7q-XxfPeCtHXS1XmcVCCTYgbgj37VAFU0edvF9PLUGP0)
```c++
#include <bits/stdc++.h>
using namespace std;

#define MAX_DIST 100000000

struct edge {
    int from, to, weight;
    edge(int a, int b, int w)
    {
        from = a;
        to = b;
        weight = w;
    }
};

// Kahn's Algorithm for Topological Sort
list<int> Topological_Sort(vector<vector<edge> > &graph, int no_vertices)
{
    list<int> topological_order;
    vector<int> indegrees(no_vertices + 1);
    // Count the incoming edges of every vertex
    for(int i = 1; i <= no_vertices; i++)
        for(auto edgev: graph[i])
            indegrees[edgev.to]++;
    queue<int> que;
    for(int i = 1; i <= no_vertices; i++)
        if(indegrees[i] == 0)
            que.push(i);
    while(!que.empty())
    {
        int V = que.front();
        que.pop();
        topological_order.push_back(V);
        for(auto i: graph[V])
        {
            --indegrees[i.to];
            if(indegrees[i.to] == 0)
                que.push(i.to);
        }
    }
    return topological_order;
}

void shortest_path_dp(vector<vector<edge> > &graph, vector<int> &distances, int source)
{
    // Vertices are numbered 1..no_vertices; index 0 is unused
    int no_vertices = graph.size() - 1;
    distances[source] = 0;
    list<int> topological_order = Topological_Sort(graph, no_vertices);
    for(auto i: topological_order)
    {
        // Vertices that are unreachable from the source have nothing to relax
        if(distances[i] == MAX_DIST)
            continue;
        for(auto edgev: graph[i])
            if(distances[edgev.to] > distances[edgev.from] + edgev.weight)
                distances[edgev.to] = distances[edgev.from] + edgev.weight;
    }
}

int main()
{
    int no_vertices = 6;
    vector<vector<edge> > graph(no_vertices+1, vector<edge>());
    // Forward edges: edge(a,b,w) is stored in graph[a]
    graph[1].push_back(edge(1,2,1));
    graph[1].push_back(edge(1,3,6));
    graph[2].push_back(edge(2,4,1));
    graph[4].push_back(edge(4,3,1));
    graph[3].push_back(edge(3,5,1));
    graph[2].push_back(edge(2,5,7));
    graph[4].push_back(edge(4,6,3));
    vector<int> distances(no_vertices + 1, MAX_DIST);
    int source = 1;
    shortest_path_dp(graph, distances, source);
    for(int i = 1; i <= no_vertices; i++)
        cout << distances[i] << " ";
    return 0;
}
```
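The top-down version records a parent for every vertex, while the bottom-up version above does not. Below is a minimal sketch of how the same relaxation loop could also store parents, so that the path itself can be rebuilt. The names `Arc`, `INF`, `dag_shortest_paths` and `build_path` are illustrative and are not part of the code above. It assumes the graph is a DAG stored as a forward adjacency list.

```cpp
#include <algorithm>
#include <queue>
#include <utility>
#include <vector>

const int INF = 100000000;  // stands in for "infinity" (illustrative name)

struct Arc { int to, weight; };

// Hypothetical helper: returns {distances, parents} for a DAG whose
// vertices are the indices of `graph`. parent[v] is the previous vertex
// on the best path found to v, or -1 if there is none.
std::pair<std::vector<int>, std::vector<int>>
dag_shortest_paths(const std::vector<std::vector<Arc>> &graph, int source)
{
    int n = (int)graph.size();
    // Kahn's algorithm to get a topological order
    std::vector<int> indegree(n, 0), order;
    for (int u = 0; u < n; u++)
        for (const Arc &a : graph[u]) indegree[a.to]++;
    std::queue<int> que;
    for (int u = 0; u < n; u++)
        if (indegree[u] == 0) que.push(u);
    while (!que.empty()) {
        int u = que.front(); que.pop();
        order.push_back(u);
        for (const Arc &a : graph[u])
            if (--indegree[a.to] == 0) que.push(a.to);
    }
    // Relax the outgoing edges of every vertex in topological order
    std::vector<int> dist(n, INF), parent(n, -1);
    dist[source] = 0;
    for (int u : order) {
        if (dist[u] == INF) continue;  // not reachable from the source
        for (const Arc &a : graph[u])
            if (dist[u] + a.weight < dist[a.to]) {
                dist[a.to] = dist[u] + a.weight;
                parent[a.to] = u;
            }
    }
    return {dist, parent};
}

// Walks the parent links back from `target`, then reverses the result.
// Returns an empty vector if `target` is not reachable from `source`.
std::vector<int> build_path(const std::vector<int> &parent, int source, int target)
{
    std::vector<int> path;
    for (int v = target; v != -1; v = parent[v]) path.push_back(v);
    std::reverse(path.begin(), path.end());
    if (path.front() != source) path.clear();
    return path;
}
```

Calling `dag_shortest_paths(graph, 1)` on the example graph and then `build_path(parents, 1, 5)` would give the vertices on the shortest path from $1$ to $5$.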
### Time Complexity
There are $|V|$ subproblems in total, and each one takes $\Theta(\text{Indegree}(v) + 1)$ time. So the total time complexity is $\sum_{v \in V} (\text{Indegree}(v) + 1)$, which is equal to $\mathcal{O}(|V|+|E|)$.
Here, $V$ is the set of vertices and $E$ is the set of edges.
**Note**: Handshaking lemma for directed graphs: $\sum_{v \in V}\text{Indegree}(v) = |E| = \sum_{v \in V} \text{Outdegree}(v)$
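Written out step by step, the sum splits into an indegree part, which the lemma turns into the number of edges, and one unit of work per vertex:

$$\sum_{v \in V} \left(\text{Indegree}(v) + 1\right) = \sum_{v \in V} \text{Indegree}(v) + \sum_{v \in V} 1 = |E| + |V|$$

For the example graph used in the code, with $|V| = 6$ and $|E| = 7$, this gives $7 + 6 = 13$ units of work.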

@@ -0,0 +1,228 @@
Suppose you want to get into a well-known computer science university, but they always test students before granting admission. Now it is your turn: they give you a long list of pairs of courses. Each pair $(\text{courseA},\text{courseB})$ means that completing $\text{courseA}$ is a prerequisite for $\text{courseB}$.
The challenge is that they will then give you two courses, $\text{courseX}$ and $\text{courseY}$, and you have to find out whether $\text{courseX}$ can be done before $\text{courseY}$.
For example, you are given the list $(A,B), (B,C), (D,A), (C,E), (F,A)$. What will be the answer for the two courses $D$ and $C$?
Yes, you can do $\text{courseD}$ before $\text{courseC}$. But...
The twist is that the list is very large, so you cannot remember everything. They are smart, too: the course names they give you are not computer science courses.
How will you tackle this test? What if you are allowed to use resources like a computer?
Well, this problem can be solved by converting it to a graph where the vertices are courses and the directed edges represent prerequisites. Then find the topological sort of the graph, which gives a valid order in which to take the courses, and you are done!!
But what is Topological Sort? Let's see.
## Topological Sort
Topological sort is an ordering of the vertices such that, for any two vertices $a$ and $b$, if there is a directed edge $a \to b$, then $a$ appears before $b$ in the ordering.
$a,b,c,d,e$ is a topological ordering for the graph below.
![enter image description here](https://lh3.googleusercontent.com/zOBG-tWmznt9QeFbW-SopxIpvyDSH7RgpmYKL9fIgth_TkRQ3sfuxXsDw7iWgmmyGdqip4cay_WS)
## Quiz Time
$Q.1$ Is it always possible to find a topological sort?
No. Whenever there is a cycle in the graph, no topological ordering exists. See the image below:
![enter image description here](https://lh3.googleusercontent.com/u-MLT9wrOoXBNQDqBqEi06jrnqudSGpBCi_oYG27-WI-zd8yEOs2PiBWyLVTimeYP42c5pk-N4AC)
We cannot find a proper dependency order between the vertices, because each of them depends on the others.
So the condition for a topological sort is that the graph must be a DAG (Directed Acyclic Graph).
$Q.2$ Does topological sort give a unique ordering of the vertices?
No, it is not unique in every case, because there can be many orderings that satisfy the condition of the topological sort.
![enter image description here](https://lh3.googleusercontent.com/A-uLwWY3HLE8aKWi0AAWxCDgkEidHTxi5u_CVYhASvoFZgaXr9nzXVNObaIP1BEyAUEE5yLyY6BM)
In the graph shown in the image above, $b,a,c,d,e$ and $a,b,c,d,e$ are both valid topological sorts.
But how do we find a topological ordering for a given DAG?
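The definition can be checked mechanically: every vertex must appear exactly once, and every edge must point from an earlier position to a later one. Here is a small sketch of such a checker. The name `is_topological_order` is hypothetical, and it assumes the vertices are numbered $1..n$ with index $0$ unused.

```cpp
#include <vector>

// Hypothetical checker: returns true if `order` lists every vertex 1..n
// exactly once and every edge u -> v has u placed before v.
bool is_topological_order(const std::vector<std::vector<int>> &graph,
                          const std::vector<int> &order)
{
    int n = (int)graph.size() - 1;       // vertices are 1..n, index 0 unused
    if ((int)order.size() != n) return false;
    std::vector<int> position(n + 1, -1);
    for (int i = 0; i < n; i++) {
        int v = order[i];
        // Reject out-of-range or repeated vertices
        if (v < 1 || v > n || position[v] != -1) return false;
        position[v] = i;
    }
    for (int u = 1; u <= n; u++)
        for (int v : graph[u])
            if (position[u] >= position[v]) return false;  // edge points backwards
    return true;
}
```

Assuming the pictured graph has the edges $a \to c$, $b \to c$, $c \to d$ and $c \to e$ (an assumption, with $a..e$ numbered $1..5$), both `{1,2,3,4,5}` and `{2,1,3,4,5}` pass this check, while `{3,1,2,4,5}` does not.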
**Notes:**
- We assume that the graph is a **DAG**, because a topological sort does not exist if there is a cycle (as shown above).
- The **indegree** of a vertex is the number of its incoming directed edges.
We have two standard techniques for graph traversal: DFS and BFS. Can we use either of them to find a topological ordering?
Let's see how a modified BFS, known as **Kahn's Algorithm**, can be used to find it.
## Kahn's Algorithm for Topological Sort
We will use the indegrees of the vertices to find the topological ordering. How? Let's observe:
In the image below, the number in square brackets next to each vertex is its indegree.
![enter image description here](https://lh3.googleusercontent.com/YbZlobxN3dHtY7MkIXkZRI7u3Xsv6QTsngzoqn-hiNnMOuUs8tLweaxaR8snMkCTdgOjk7rh7W68)
As we have seen, the topological ordering for the above graph is $a,b,c,d,e$. Now, what is your observation?
See that the vertices having indegree $0$ appear first in the ordering. But what next? What if we remove both of these vertices and all the edges coming out of them?
![enter image description here](https://lh3.googleusercontent.com/czUu-KCQJ7Y78v0z4-rI4uKiXwsHPT2cmYEISfEQU3n89sEK8uZleMB5Rx6VYnxDjREvO2A94g9b)
Now $c$ is the vertex with indegree $0$, and it appears next in the ordering.
And what if we remove $c$ and all the edges coming out of it? The next vertices with indegree $0$ are $d$ and $e$, which are the next vertices in the ordering.
Done, right? Let's see the final algorithm.
## Algorithm
1. First, count the indegrees of all the vertices.
2. Add all the vertices whose indegree is zero to the queue.
3. Start the loop, as in BFS.
4. Dequeue a vertex from the queue, say $V$, and append it to the ordering.
5. Visit all the adjacent vertices of $V$ and decrease the indegree of each of them by $1$.
6. Enqueue all the adjacent vertices whose indegree becomes zero.
7. If the queue is empty, stop the algorithm; otherwise continue from step $4$.
**Visualization**
```c++
#include <bits/stdc++.h>
using namespace std;

void Topological_sort(int no_vertices, vector<vector<int> > &graph)
{
    list<int> topological_order;
    vector<int> indegrees(no_vertices + 1);
    // Step 1: count the indegree of every vertex
    for(int i = 1; i <= no_vertices; i++)
        for(auto j: graph[i])
            indegrees[j]++;
    // Step 2: start with all the vertices of indegree zero
    queue<int> que;
    for(int i = 1; i <= no_vertices; i++)
        if(indegrees[i] == 0)
            que.push(i);
    // Steps 3-7: BFS-like loop
    while(!que.empty())
    {
        int V = que.front();
        que.pop();
        topological_order.push_back(V);
        for(auto i: graph[V])
        {
            --indegrees[i];
            if(indegrees[i] == 0)
                que.push(i);
        }
    }
    for(auto i: topological_order)
        cout << i << " ";
}

int main()
{
    int no_vertices = 5;
    vector<vector<int> > graph(no_vertices+1, vector<int>());
    graph[1].push_back(3);
    graph[2].push_back(3);
    graph[3].push_back(4);
    graph[3].push_back(5);
    Topological_sort(no_vertices, graph);
    return 0;
}
}
```
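The same loop can also tell us whether the graph is a DAG at all. Vertices that lie on a cycle never reach indegree zero, so they are never dequeued, and the resulting order comes out shorter than $|V|$. A minimal sketch of this check follows. The function name `topological_order_or_empty` is hypothetical, and it assumes vertices numbered $1..n$ with at least one vertex.

```cpp
#include <queue>
#include <vector>

// Hypothetical variant of Kahn's algorithm: returns a topological order of
// the vertices 1..n, or an empty vector if the graph contains a cycle.
std::vector<int> topological_order_or_empty(const std::vector<std::vector<int>> &graph)
{
    int n = (int)graph.size() - 1;       // index 0 is unused
    std::vector<int> indegree(n + 1, 0), order;
    for (int u = 1; u <= n; u++)
        for (int v : graph[u]) indegree[v]++;
    std::queue<int> que;
    for (int u = 1; u <= n; u++)
        if (indegree[u] == 0) que.push(u);
    while (!que.empty()) {
        int u = que.front(); que.pop();
        order.push_back(u);
        for (int v : graph[u])
            if (--indegree[v] == 0) que.push(v);
    }
    // Some vertex was never output: it sits on, or behind, a cycle
    if ((int)order.size() != n) order.clear();
    return order;
}
```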
Now we have seen the approach using a modified BFS. Can we use DFS as well?
Yes, we can use DFS with some modification.
Here, we start DFS from an arbitrary vertex $U$, first visit all of its adjacent vertices by recursively calling DFS on them, and then, at the end, add $U$ at the front of the topological order.
What does it do?
This makes sure that all the vertices which depend on $U$ appear after $U$: they are already in the ordering when $U$ is added at the front.
## Algorithm using DFS
1. Mark all the vertices as unvisited.
2. Choose any unvisited vertex (say $V$) and start a DFS from it.
3. Inside the DFS, mark $V$ as visited and loop over the adjacent vertices of $V$.
4. Recursively call DFS on all the unvisited adjacent vertices.
5. Once all these DFS calls complete, add $V$ at the front (head) of the topological order.
6. If there is still an unvisited vertex, go to step 2; else stop.
**Note:** You can use a linked list to store the topological order; it is empty at the start of the algorithm.
**Visualization:**
```c++
#include <bits/stdc++.h>
using namespace std;

void dfs(vector<vector<int> > &graph, int start, list<int> &linked_list, vector<bool>& visited)
{
    visited[start] = true;
    for(auto i: graph[start])
        if(!visited[i])
            dfs(graph, i, linked_list, visited);
    // All the vertices reachable from start are already in the list
    linked_list.push_front(start);
}

void Topological_sort(int no_vertices, vector<vector<int> > &graph)
{
    vector<bool> visited(no_vertices + 1);
    list<int> linked_list;
    for(int i = 1; i <= no_vertices; i++)
        if(!visited[i])
            dfs(graph, i, linked_list, visited);
    for(auto vertex: linked_list)
        cout << vertex << " ";
}

int main()
{
    int no_vertices = 5;
    vector<vector<int> > graph(no_vertices+1, vector<int>());
    graph[1].push_back(3);
    graph[2].push_back(3);
    graph[3].push_back(4);
    graph[3].push_back(5);
    Topological_sort(no_vertices, graph);
    return 0;
}
```
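The DFS version above assumes the graph is a DAG. One common way to drop that assumption is to track three states per vertex instead of two: if the DFS meets a vertex that is still in progress, it has found a back edge, which means there is a cycle. The sketch below is one possible variant and not the code above. The names `State`, `dfs_visit` and `dfs_topological_sort` are illustrative, and it appends to a vector and reverses it at the end instead of pushing to the front of a linked list.

```cpp
#include <algorithm>
#include <vector>

enum State { UNVISITED, IN_PROGRESS, DONE };  // illustrative names

// Returns false if a cycle is reachable from `u`; otherwise appends the
// vertices to `order` as their DFS calls finish (post-order).
bool dfs_visit(const std::vector<std::vector<int>> &graph, int u,
               std::vector<State> &state, std::vector<int> &order)
{
    state[u] = IN_PROGRESS;
    for (int v : graph[u]) {
        if (state[v] == IN_PROGRESS) return false;   // back edge: cycle
        if (state[v] == UNVISITED && !dfs_visit(graph, v, state, order))
            return false;
    }
    state[u] = DONE;
    order.push_back(u);
    return true;
}

// Hypothetical wrapper for vertices 1..n: fills `order` and returns true
// if the graph is a DAG; returns false if a cycle was found, in which
// case the contents of `order` should be ignored.
bool dfs_topological_sort(const std::vector<std::vector<int>> &graph,
                          std::vector<int> &order)
{
    int n = (int)graph.size() - 1;       // index 0 is unused
    std::vector<State> state(n + 1, UNVISITED);
    order.clear();
    for (int u = 1; u <= n; u++)
        if (state[u] == UNVISITED && !dfs_visit(graph, u, state, order))
            return false;
    // Reversing the finishing order has the same effect as push_front
    std::reverse(order.begin(), order.end());
    return true;
}
```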
### Time Complexity:
Both approaches have the same time complexity as DFS and BFS: $\mathcal{O}(|V|+|E|)$
## Applications of Topological Ordering
1. It is used to find shortest paths in a DAG efficiently.
2. Scheduling tasks according to their dependencies on each other, where the dependencies are represented by directed edges.
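To tie the scheduling application back to the course example, here is a rough sketch that takes prerequisite pairs with course names and produces one valid order using Kahn's algorithm. The function name `course_order` and the use of `std::map` to handle names are choices made for this illustration only.

```cpp
#include <map>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: given (prerequisite, course) pairs, returns one valid
// order in which to take the courses, or an empty vector if the
// prerequisites are circular.
std::vector<std::string> course_order(
    const std::vector<std::pair<std::string, std::string>> &pairs)
{
    std::map<std::string, std::vector<std::string>> unlocks;  // course -> courses that need it
    std::map<std::string, int> indegree;
    for (const auto &p : pairs) {
        unlocks[p.first].push_back(p.second);
        indegree.emplace(p.first, 0);   // no-op if the course is already known
        indegree[p.second]++;
    }
    std::queue<std::string> que;
    for (const auto &entry : indegree)
        if (entry.second == 0) que.push(entry.first);
    std::vector<std::string> order;
    while (!que.empty()) {
        std::string course = que.front(); que.pop();
        order.push_back(course);
        for (const std::string &next : unlocks[course])
            if (--indegree[next] == 0) que.push(next);
    }
    // Courses on a circular dependency are never output
    if (order.size() != indegree.size()) order.clear();
    return order;
}
```

For the list $(A,B), (B,C), (D,A), (C,E), (F,A)$, the returned order places $D$ before $C$, which shows that taking $D$ first is possible.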