mirror of
https://github.com/dholerobin/Lecture_Notes.git
synced 2025-03-15 21:59:56 +00:00
Update Regex_pending.md
parent
2276748ea5
commit
8fafd4f9de
@ -1,3 +1,4 @@
|
||||
|
||||
## Regular Expression (RegEx)
|
||||
|
||||
While filling online forms, haven't you come across errors like "Please enter valid email address" or "Please enter valid phone number"?
|
||||
@ -5,13 +6,12 @@ While filling online forms, haven't you come across errors like "Please enter va
|
||||
Annoying as they may be, there's a lot of black magic that the computer does before it determines that the details you've entered are incorrect.
|
||||
|
||||
Can you guess what that black magic is? If you are familiar with algorithms, you might say that we can write an algorithm for it.
|
||||
Yes, we can write an algorithm to verify different things.
|
||||
|
||||
But we have a standard tool designed specifically for this kind of task.
|
||||
Yes, we can write an algorithm to verify different things. But we have a standard tool designed specifically for this kind of task.
|
||||
|
||||
It is **Regular Expression**. We call it **RegEx** for short. RegEx makes our work a lot easier. Let's see some basic examples where RegEx comes in handy.
|
||||
|
||||
Suppose you are looking for the average price of a particular product on Amazon. The following regular expression will find any price ($12, $75.50) on the webpage: `\$([0-9]+)(\.[0-9]+)?`.
|
||||
Suppose you are looking for the average price of a particular product on Amazon. The following regular expression will find any price (\$12, \$75.50) on the webpage: `\$([0-9]+)(\.[0-9]+)?`.
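
As a minimal sketch of how such a pattern could be used in JavaScript: the page text below is made up for illustration, and the fractional part of the pattern is written as optional so that whole-dollar prices such as $12 are found too.

```js
// Hypothetical page text; a real page would have to be fetched first.
const pageText = "Basic: $12, Premium: $75.50, Shipping: free";

// \$ matches a literal dollar sign. The group (\.[0-9]+)? makes the
// decimal part optional.
const pricePattern = /\$([0-9]+)(\.[0-9]+)?/g;

// With the g flag, match() returns every full match as a string.
const prices = pageText.match(pricePattern) || [];
console.log(prices); // [ '$12', '$75.50' ]

// Average of the prices found (drop the leading "$" before converting).
const total = prices.reduce((sum, p) => sum + Number(p.slice(1)), 0);
const average = prices.length ? total / prices.length : 0;
console.log(average); // 43.75
```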
|
||||
|
||||
Quite interesting!
|
||||
|
||||
@ -42,7 +42,7 @@ Simple matching of a specific word can be done as the following:
|
||||
|
||||

|
||||
|
||||
As you can see, it matches "Reg" in the text. Similarly, what will be the match for pattern **"Ex"** in the same text above?
|
||||
As you can see, it matches "Reg" in the text. Similarly, what will be the match for "Ex" in the same text above?
|
||||
|
||||

|
||||
|
||||
@ -97,9 +97,9 @@ What if you want to match both "soon" and "moon" or basically words ending with
|
||||
|
||||

|
||||
|
||||
What did you observe? You can see that adding $[sm]$ matches both $soon$ and $moon$. Here $[sm]$ is called a character class, which is basically a list of characters we want to match.
|
||||
What did you observe? You can see that adding `[sm]` matches both $soon$ and $moon$. Here `[sm]` is called a character class, which is basically a list of characters we want to match.
|
||||
|
||||
More formally, $[abc]$ is basically either $a$ or $b$ or $c$.
|
||||
More formally, `[abc]` is basically either `a` or `b` or `c`.
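
The same idea can be tried out in JavaScript. This is only a small sketch, and the sample sentence is an assumption made for illustration:

```js
// Hypothetical sample text.
const sentence = "The moon will rise soon, around noon.";

// [sm] matches exactly one character: either "s" or "m".
const classMatches = sentence.match(/[sm]oon/g);
console.log(classMatches); // [ 'moon', 'soon' ]
```

Note that "noon" is not matched, because `n` is not listed in the character class.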
|
||||
|
||||
Predict the output of the following:
|
||||
|
||||
@ -118,7 +118,7 @@ Answer:
|
||||
|
||||

|
||||
|
||||
Now, if we put **^** as the first character inside the brackets, then it will match any character other than the ones in the brackets.
|
||||
Now, if we put `^` as the first character inside the brackets, then it will match any character other than the ones in the brackets.
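
A short JavaScript sketch of a negated character class, using a made-up list of words:

```js
// Hypothetical sample text.
const rhymes = "soon moon noon boon";

// [^sm] matches any single character except "s" and "m".
const negatedMatches = rhymes.match(/[^sm]oon/g);
console.log(negatedMatches); // [ 'noon', 'boon' ]
```

Keep in mind that a negated class still has to match one character, and that character can be anything not listed, including a space or a digit.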
|
||||
|
||||

|
||||
|
||||
@ -132,7 +132,7 @@ Answer:
|
||||

|
||||
|
||||
|
||||
Writing out every character (like $[0123456789]$ or [abcd]) is somewhat slow and error-prone. What is the shortcut?
|
||||
Writing out every character (like `[0123456789]` or `[abcd]`) is somewhat slow and error-prone. What is the shortcut?
|
||||
|
||||
## Ranges
|
||||
Ranges make our work easier. Consecutive characters can simply be replaced by putting a dash between the first and last character.
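
To see that a range is just shorthand for the full list, here is a small JavaScript sketch (the sample string is an assumption for illustration):

```js
// Hypothetical sample text.
const code = "abc-0123-xyz";

// The range [0-9] and the full list [0123456789] find the same characters.
const byRange = code.match(/[0-9]/g);
const byList = code.match(/[0123456789]/g);
console.log(byRange); // [ '0', '1', '2', '3' ]
console.log(byList);  // [ '0', '1', '2', '3' ]

// Several ranges can be combined inside one character class.
const letters = code.match(/[a-cx-z]/g);
console.log(letters); // [ 'a', 'b', 'c', 'x', 'y', 'z' ]
```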
|
||||
@ -160,25 +160,25 @@ Answer:
|
||||
|
||||
## Predefined Character Classes
|
||||
|
||||
1. **\w & \W**: '**\w**' is just a short form of the character class [A-Za-z0-9_].
|
||||
1. **`\w` & `\W`**: `\w` is just a short form of the character class `[A-Za-z0-9_]`.
|
||||
|
||||

|
||||
\W is equivalent to ``[^\w]``.
|
||||
`\W` is equivalent to ``[^\w]``.
|
||||

|
||||
|
||||
|
||||
2. **\d & \D**: '**\d**' matches any digit character. It is equivalent to character class [0-9].
|
||||
2. **`\d` & `\D`**: `\d` matches any digit character. It is equivalent to character class `[0-9]`.
|
||||
|
||||

|
||||
\D is equivalent to ``[^\d]``.
|
||||
`\D` is equivalent to ``[^\d]``.
|
||||

|
||||
3. **\s & \S**: '**\s**' matches whitespace characters. Tab ('**\t**'), newline ('**\n**') and space (' ') are whitespace characters.
|
||||
3. **`\s` & `\S`**: `\s` matches whitespace characters. Tab (`\t`), newline (`\n`) and space (` `) are whitespace characters.
|
||||

|
||||
|
||||
Similarly, \S is equivalent to ``[^\s]``.
|
||||
Similarly, `\S` is equivalent to ``[^\s]``.
|
||||

|
||||
|
||||
4. **dot(.)**: Dot matches any character except **\n**(line break or new line character) and **\r**(carriage-return character). It is known as **wildcard matching**.
|
||||
4. **dot(`.`)**: Dot matches any character except `\n`(line break or new line character) and `\r`(carriage-return character). It is known as **wildcard matching**.
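
The predefined classes above can be compared side by side in JavaScript. This is a minimal sketch, and the sample string is made up for illustration:

```js
// Hypothetical sample text containing a tab character.
const sample = "Order #42:\tshipped on day_3";

// \w+ : runs of letters, digits and underscores.
const words = sample.match(/\w+/g);
console.log(words); // [ 'Order', '42', 'shipped', 'on', 'day_3' ]

// \d : single digits.
const digitChars = sample.match(/\d/g);
console.log(digitChars); // [ '4', '2', '3' ]

// \s : whitespace (three spaces and one tab here).
const spaces = sample.match(/\s/g);
console.log(spaces.length); // 4

// \W : everything that \w does not match (spaces, tab, "#" and ":").
const nonWord = sample.match(/\W/g);
console.log(nonWord.length); // 6

// The dot skips the newline between "a" and "b".
const dotMatches = "a\nb".match(/./g);
console.log(dotMatches); // [ 'a', 'b' ]
```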
|
||||
|
||||

|
||||
|
||||
@ -190,8 +190,9 @@ Predict the output of the following regex:
|
||||
**Text:** Binary to decimal data: 001- 1, 010- 2, 011- 3, a01- 4, 100- 4.
|
||||
Answer:
|
||||

|
||||
2. **RegEx code:** ``[01][01][0-1]\W\s\d``
|
||||
**Text:** Binary to decimal data:
|
||||
2. **RegEx code:**
|
||||
**Text:**
|
||||
|
||||
|
||||
## Alternation (OR operator)
|
||||
|
||||
@ -219,12 +220,16 @@ Can you observe anything from it?
|
||||
The **OR operator** tries the alternatives from left to right, starting with the first word in the expression. If that word matches, it will not try to match the next word in the expression at the same place in the text.
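
The effect of the order of alternatives can be sketched in JavaScript as follows; the word "category" is just an illustrative choice:

```js
// Hypothetical sample text.
const word = "category";

// "cat" is listed first and matches, so "category" is never tried here.
const shortFirst = word.match(/cat|category/)[0];
console.log(shortFirst); // cat

// With the longer word listed first, the whole word is matched.
const longFirst = word.match(/category|cat/)[0];
console.log(longFirst); // category
```

When one alternative is a prefix of another, putting the longer one first is a common way to get the longer match.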
|
||||
|
||||
Predict the output of the following regex:
|
||||
1.
|
||||
2.
|
||||
1. **RegEx code:**
|
||||
**Text:**
|
||||
|
||||
2. **RegEx code:**
|
||||
**Text:**
|
||||
|
||||
|
||||
## Quantifiers (Repetition)
|
||||
|
||||
We have seen that to match 3 digit patterns we can use ``[0-9][0-9][0-9]``. What if we have n digit patterns? We would have to write [0-9] n times, which is really a waste of time. This is where quantifiers come to help.
|
||||
We have seen that to match 3 digit patterns we can use ``[0-9][0-9][0-9]``. What if we have n digit patterns? We would have to write `[0-9]` n times, which is really a waste of time. This is where quantifiers come to help.
|
||||
|
||||
1. **Limiting repetitions (``{min,max}``):** To match an n digit pattern we can simply write ``[0-9]{n}``. By providing minimum and maximum values instead of ``{n}``, as in ``[0-9]{min,max}``, we can match a pattern repeated min to max times.
|
||||
|
||||
@ -246,7 +251,9 @@ Let's
|
||||

|
||||
|
||||
**Nature of Quantifiers:**
|
||||
An HTML tag is represented as `<tag_name>some text</tag_name>`. So can you figure out a pattern that will match both `<tag_name>` and `</tag_name>`?
|
||||
An HTML tag is represented as `<tag_name>some text</tag_name>`. For example, `<title>Regular expression</title>`.
|
||||
|
||||
So can you figure out an expression that will match both `<tag_name>` & `</tag_name>`?
|
||||
|
||||
Most people will say it is `<.*>`. But it gives a different result.
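
A minimal JavaScript sketch of what happens, using the `<title>` example from above. It also shows the lazy form `<.*?>`, in which `?` is placed after the quantifier:

```js
const html = "<title>Regular expression</title>";

// Greedy: .* grabs as much as possible, so the match runs from the
// first "<" to the last ">".
const greedy = html.match(/<.*>/g);
console.log(greedy); // [ '<title>Regular expression</title>' ]

// Lazy: .*? stops at the first ">" it can, so each tag is matched
// on its own.
const lazy = html.match(/<.*?>/g);
console.log(lazy); // [ '<title>', '</title>' ]
```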
|
||||

|
||||
@ -257,8 +264,12 @@ To make it lazy, we use `?` quantifier. That stops the regex engine going furthe
|
||||
|
||||
|
||||
Predict the output of the following regex:
|
||||
1.
|
||||
2.
|
||||
1. **RegEx code:**
|
||||
**Text:**
|
||||
|
||||
2. **RegEx code:**
|
||||
**Text:**
|
||||
|
||||
|
||||
**Note:** Now you may be thinking, what if we want to match characters like `*`, `?`, `+`, `{`, `}`, etc. in the text? We will look at it shortly. Keep reading!
|
||||
|
||||
@ -312,8 +323,12 @@ Example using both `^` and `$`:
|
||||
|
||||
Predict the output of the following regex:
|
||||
|
||||
1.
|
||||
2.
|
||||
1. **RegEx code:**
|
||||
**Text:**
|
||||
|
||||
2. **RegEx code:**
|
||||
**Text:**
|
||||
|
||||
|
||||
## Groups & Capturing
|
||||
|
||||
@ -330,7 +345,7 @@ Suppose we want to match both the sentences, then grouping is the inevitable thi
|
||||

|
||||
Similarly, you can use other quantifiers.
|
||||
|
||||
3. To extract and replace substrings using groups. So we call groups **Capturing groups** because we are capturing data (substrings) using groups.
|
||||
3. To extract and replace substrings using groups. So we call groups **Capturing groups**, because we are capturing data (substrings) using groups.
|
||||
|
||||
In this part we will see how to extract and replace data using groups in JavaScript.
|
||||
|
||||
@ -365,6 +380,7 @@ Similarly, you can use other quantifiers.
|
||||
console.log(result[3]); // Third group
|
||||
```
|
||||
`replace` is another function, which is used to replace and rearrange the data using groups.
|
||||
|
||||
```js
|
||||
var str = "2020-01-20";
|
||||
|
||||
@ -382,8 +398,12 @@ Similarly, you can use other quantifiers.
|
||||
```
|
||||
|
||||
Predict the output of the following regex:
|
||||
1.
|
||||
2.
|
||||
1. **RegEx code:**
|
||||
**Text:**
|
||||
|
||||
2. **RegEx code:**
|
||||
**Text:**
|
||||
|
||||
|
||||
## Characters with special meaning
|
||||
|
||||
@ -395,7 +415,7 @@ Below is the table for these kind of characters and their escaped version, along
|
||||
|:---------:|:---------------------------:|:---------------:|
|
||||
| \ | escape character | \\\ |
|
||||
| . | predefined character class | \\. |
|
||||
| \| | OR operator | \\\| |
|
||||
| \| | OR operator | \\\ |
|
||||
| * | as quantifier | \\* |
|
||||
| + | as quantifier | \\+ |
|
||||
| ? | as quantifier | \\? |
|
||||
@ -408,7 +428,7 @@ Below is the table for these kind of characters and their escaped version, along
|
||||
| ( | in group notation | \\( |
|
||||
| ) | in group notation | \\) |
|
||||
|
||||
Sometimes, it is also preferred to escape the forward slash (/), for example inside a JavaScript regex literal, where / marks the start and end of the pattern.
|
||||
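Sometimes, it is also preferred to escape the forward slash (`/`) as `\/`, for example inside a JavaScript regex literal, where `/` marks the start and end of the pattern.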
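
The effect of escaping can be checked with a few quick JavaScript tests. This is a sketch only, and the sample strings are assumptions made for illustration:

```js
// Hypothetical sample text full of special characters.
const formula = "Total: (a+b)*c = $5.00";

// Escaped, the characters ( + ) * are matched literally.
const escapedMatch = formula.match(/\(a\+b\)\*c/)[0];
console.log(escapedMatch); // (a+b)*c

// An unescaped dot matches any character, an escaped dot only ".".
const unescapedDot = /5.00/.test("5x00");
const escapedDot = /5\.00/.test("5x00");
console.log(unescapedDot, escapedDot); // true false

// Inside a regex literal the forward slash has to be escaped.
const path = "docs/regex".match(/docs\/regex/)[0];
console.log(path); // docs/regex
```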
|
||||
|
||||
|
||||
## Backreferencing
|
||||
@ -417,7 +437,7 @@ Backreferencing is used to match same text again. Backreferences match the same
|
||||
|
||||

|
||||
|
||||
The first captured group is (\w+). We can use this group again with a backreference (\1) at the closing tag, which matches the same text that the group \w+ captured.
|
||||
The first captured group is (`\w+`). We can use this group again with a backreference (`\1`) at the closing tag, which matches the same text that the group `\w+` captured.
|
||||
|
||||
You can use backreferencing for any captured group as \group_no.
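
As a minimal JavaScript sketch of a backreference matching an opening tag with its closing tag: the pattern and the sample markup below are assumptions for illustration, not necessarily the ones shown in the images.

```js
// (\w+) captures the tag name, and \1 requires the same name again
// in the closing tag. \/ is an escaped forward slash.
const tagPattern = /<(\w+)>.*?<\/\1>/g;

// Hypothetical markup: the middle element is closed with the wrong tag.
const markup = "<b>bold</b> <i>mismatch</b> <em>ok</em>";

const pairs = markup.match(tagPattern);
console.log(pairs); // [ '<b>bold</b>', '<em>ok</em>' ]
```

The `<i>` element is skipped because no `</i>` follows it, so `\1` cannot match there.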
|
||||
|
||||
@ -426,8 +446,12 @@ Let's have one more example:
|
||||

|
||||
|
||||
Predict the output of the following regex:
|
||||
1.
|
||||
2.
|
||||
1. **RegEx code:**
|
||||
**Text:**
|
||||
|
||||
2. **RegEx code:**
|
||||
**Text:**
|
||||
|
||||
|
||||
|
||||
|
||||
|