diff --git a/Akash Articles/RegEx/character_classes.html b/Akash Articles/RegEx/character_classes.html new file mode 100644 index 0000000..7d0ddfa --- /dev/null +++ b/Akash Articles/RegEx/character_classes.html @@ -0,0 +1,238 @@ + +
+ + + + + + + + + +What if you want to match both "soon" and "moon" or basically words ending with "oon"?
+ +What did you observe? Adding [sm]
matches both "soon" and "moon". Here [sm]
is called a character class: a list of characters, any one of which may match at that position.
More formally, [abc]
means 'either a or b or c'.
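As a quick check of the idea above, here is a minimal sketch using Python's built-in `re` module. The sample sentence is made up for illustration and is not part of the original exercise.

```python
import re

# Hypothetical sample text containing three words that end in "oon".
text = "The moon will rise soon, around noon."

# [sm] matches exactly one character: either "s" or "m".
matches = re.findall(r"[sm]oon", text)
print(matches)  # ['moon', 'soon']
```

"noon" is not matched because "n" is not listed in the class.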
Predict the output of the following:
+ +RegEx: [ABC][12]
+ Text: A1 grade is the best, but I scored A2.
Answer:
+ +RegEx: [0123456789][12345]:[abcdef][67890]:[0123456789][67890]:[1234589][abcdef]
+ Text: Let's match 14:f6:89:3c mac address type of pattern. Other patterns are 51:a6:90:c5, 44:t6:u9:3d, 72:c8:39:8e.
Answer:
+ +Now, if we put ^
as the first character inside the brackets, the class matches any single character other than the ones listed.
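A small sketch of a negated class in Python's `re` module. The word list is an assumption chosen for illustration. It also shows that a caret placed anywhere other than first inside the brackets is treated as a literal character.

```python
import re

# Hypothetical sample text.
text = "cat bat rat hat"

# [^cb] matches any single character that is NOT "c" or "b".
negated = re.findall(r"[^cb]at", text)
print(negated)  # ['rat', 'hat']

# A caret that is not the first character in the class is just a literal "^".
literal_caret = re.findall(r"[a^]", "a^b")
print(literal_caret)  # ['a', '^']
```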
Predict the output for the following:
+ +RegEx: [^13579]A[^abc]z3[590*-]
+
Text: 1Abz33 will match or 2Atz30 and 8Adz3*.
Answer:
+ +Writing every character (like [0123456789]
or [abcd]
) is slow and error-prone. What is the shortcut?
Ranges make our work easier. Consecutive characters can be included in a character class using the dash operator. For example, the digits from 0 to 9 can be written simply as 0-9. Similarly, abcdef
can be replaced by a-f
.
Examples: 456
--> 4-6
, abc3456
--> a-c3-6
, c367980
--> c36-90
.
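To confirm that a range is only a shorter spelling of the same set, the sketch below compares the long and short forms of the last example using Python's `re` module. The test string is invented for illustration.

```python
import re

# Hypothetical test string: every character of "c367980" plus a few outsiders.
sample = "c367980x145"

long_form = re.findall(r"[c367980]", sample)   # every character listed
short_form = re.findall(r"[c36-90]", sample)   # same set, written with a range

print(long_form)   # ['c', '3', '6', '7', '9', '8', '0']
print(long_form == short_form)  # True
```

Order inside the brackets does not matter; only membership in the set does.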
Predict the output of the following regex:
+ +RegEx: [a-d][^l-o][12][^5-7][l-p]
+
Text: co13i, ae14p, eo30p, ce33l, dd14l.
Answer: +
Note: If you write a range in reverse order (e.g. 9-0), the regex is invalid and the engine reports an error.
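How the error surfaces depends on the engine. As one example, Python's `re` module rejects a reversed range at compile time by raising `re.error`. The flag name `reversed_range_ok` below is only an illustrative variable.

```python
import re

try:
    re.compile(r"[9-0]")          # range written in reverse order
    reversed_range_ok = True
except re.error as exc:
    reversed_range_ok = False
    print("pattern rejected:", exc)

print(reversed_range_ok)  # False
```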
+ +[a-zB-D934][A-Zab0-9]
\w
& \W
: \w
is shorthand for the character class [A-Za-z0-9_]
. \w
is called the word character class.
\W
is equivalent to [^\w]
. \W
matches everything other than word characters.
\d
& \D
: \d
matches any digit character. It is equivalent to character class [0-9]
.
\D
is equivalent to [^\d]
. \D
matches everything other than digits.
\s
& \S
: \s
matches whitespace characters. Tab (\t
), newline (\n
) and space (
) are whitespace characters, and so is the carriage return (\r). Tab and newline are also called non-printable characters. Similarly, \S
is equivalent to [^\s]
. \S
matches everything other than whitespace characters.
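The shorthand classes can be tried out with Python's `re` module, as sketched below. One assumption to note: in Python 3, `\w` and `\d` are Unicode-aware by default, so the `re.ASCII` flag is passed here to get the `[A-Za-z0-9_]` behaviour described above. The sample string is made up.

```python
import re

# Hypothetical sample: letters, underscore, digits, a space, a hyphen and a tab.
sample = "a_1 b-2\tc"

word_chars = re.findall(r"\w", sample, flags=re.ASCII)
explicit   = re.findall(r"[A-Za-z0-9_]", sample)          # the long form of \w
non_word   = re.findall(r"\W", sample, flags=re.ASCII)
digits     = re.findall(r"\d", sample, flags=re.ASCII)
spaces     = re.findall(r"\s", sample)
non_spaces = re.findall(r"\S", sample)

print(word_chars)              # ['a', '_', '1', 'b', '2', 'c']
print(word_chars == explicit)  # True
print(non_word)                # [' ', '-', '\t']
print(digits)                  # ['1', '2']
print(spaces)                  # [' ', '\t']
print(non_spaces)              # ['a', '_', '1', 'b', '-', '2', 'c']
```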
dot(.
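): By default, the dot matches any single character except the new-line character \n
. Some engines, JavaScript among them, also exclude the carriage-return character \r
; others, such as Python's, exclude only \n. The dot (.
) is known as a wildcard.
Note: \r
is the first half of the Windows-style line ending \r\n.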
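Exactly which characters the dot skips varies between engines, so it is worth testing in the one you use. The sketch below shows the behaviour of Python's `re` module, where only `\n` is excluded by default and the `re.DOTALL` flag removes even that exception. The sample string is an assumption for illustration.

```python
import re

# Hypothetical sample containing a carriage return and a newline.
sample = "a\rb\nc"

dot_default = re.findall(r".", sample)
dot_all = re.findall(r".", sample, flags=re.DOTALL)

print(dot_default)  # ['a', '\r', 'b', 'c']   (Python skips only "\n")
print(dot_all)      # ['a', '\r', 'b', '\n', 'c']
```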
Predict the output of the following regex:
+ +RegEx: [01][01][0-1]\W\s\d
+
Text: Binary to decimal data: 001- 1, 010- 2, 011- 3, a01- 4, 100- 4.
Answer:
+ +Write a regex to match 28th February of any year. Date is in dd-mm-yyyy format.
+ +Answer: 28-02-\d\d\d\d
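The answer above can be checked with a short Python sketch. The sentence containing the dates is invented for illustration.

```python
import re

# Hypothetical text with dates in dd-mm-yyyy format.
text = "Born 28-02-1996, moved 28-03-2001, graduated 28-02-2018."

feb_28 = re.findall(r"28-02-\d\d\d\d", text)
print(feb_28)  # ['28-02-1996', '28-02-2018']
```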
Write a regex to match dates that are not in March. Assume the dates are valid but the separator is not fixed, i.e. a date may be written as dd.mm.yyyy, dd\mm\yyyy or dd/mm/yyyy.
+ +Answer: \d\d\W[10][^3]\W\d\d\d\d
Note that the above regex will also match inconsistent formats such as dd-mm.yyyy or dd/mm\yyyy. This can be solved with backreferences, a regex feature that lets a later part of the pattern require the same text that an earlier group matched.
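As a rough sketch of the backreference idea, the Python snippet below captures the first separator in a group and reuses it with `\1`. The `strict` pattern is one possible adaptation of the answer above, not the article's own solution, and the date list is made up.

```python
import re

# The answer from above: the two separators are matched independently.
loose = re.compile(r"\d\d\W[10][^3]\W\d\d\d\d")

# Assumed variant: (\W) captures the first separator, \1 demands the same one again.
strict = re.compile(r"\d\d(\W)[10][^3]\1\d\d\d\d")

# Hypothetical dates: two consistent, one with mixed separators, one in March.
dates = ["12.04.2021", "12/04/2021", "12-04.2021", "12.03.2021"]

loose_hits = [d for d in dates if loose.fullmatch(d)]
strict_hits = [d for d in dates if strict.fullmatch(d)]

print(loose_hits)   # ['12.04.2021', '12/04/2021', '12-04.2021']
print(strict_hits)  # ['12.04.2021', '12/04/2021']
```

Both patterns reject the March date; only the backreference version also rejects the mixed-separator one.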