From 0200038448f81813fbcfebe01b668db5af4f7d62 Mon Sep 17 00:00:00 2001
From: Aakash Panchal <51417248+Aakash-Panchal27@users.noreply.github.com>
Date: Sat, 7 Mar 2020 23:36:24 +0530
Subject: [PATCH] Create character_classes.html

---
 Akash Articles/RegEx/character_classes.html | 238 ++++++++++++++++++++
 1 file changed, 238 insertions(+)
 create mode 100644 Akash Articles/RegEx/character_classes.html

diff --git a/Akash Articles/RegEx/character_classes.html b/Akash Articles/RegEx/character_classes.html
new file mode 100644
index 0000000..7d0ddfa
--- /dev/null
+++ b/Akash Articles/RegEx/character_classes.html
@@ -0,0 +1,238 @@

Character classes:

+ +
+ +
+ +

What if you want to match both "soon" and "moon", or more generally several words ending in "oon"?

+ +
+ +
+ +

What did you observe? Adding [sm] matches both "soon" and "moon". Here [sm] is called a character class, which is a list of the characters we are willing to match at that position.

+ +

More formally, [abc] means 'either a or b or c': it matches exactly one character, which must be one of those listed.
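One way to try the [sm]oon pattern is Python's built-in re module. This is a minimal sketch: the choice of Python and the sample sentence are assumptions made for illustration, not something the article prescribes.

```python
import re

# Sample sentence made up for this illustration.
text = "See you soon, under the full moon at noon."

# [sm] matches exactly one character: either 's' or 'm'.
matches = re.findall(r"[sm]oon", text)
print(matches)  # ['soon', 'moon']
```

"noon" is not reported because 'n' is not listed in the class.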

+ +

Predict the output of the following:

+ +
    +
  1. RegEx: [ABC][12]
    + Text: A1 grade is the best, but I scored A2.

    + +

    Answer:

    + +
    + +
  2. + +
  3. RegEx: [0123456789][12345]:[abcdef][67890]:[0123456789][67890]:[1234589][abcdef]
    + Text: Let's match 14:f6:89:3c mac address type of pattern. Other patterns are 51:a6:90:c5, 44:t6:u9:3d, 72:c8:39:8e.

    + +

    Answer:

    + +
    + +
  4. +
+ +
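To check your predictions for the two exercises above, the patterns can be run as they are. The sketch below assumes Python's re module; other regex flavors should give the same result for these simple classes.

```python
import re

# (pattern, text) pairs copied from the two exercises.
exercises = [
    (r"[ABC][12]", "A1 grade is the best, but I scored A2."),
    (
        r"[0123456789][12345]:[abcdef][67890]:[0123456789][67890]:[1234589][abcdef]",
        "Let's match 14:f6:89:3c mac address type of pattern. "
        "Other patterns are 51:a6:90:c5, 44:t6:u9:3d, 72:c8:39:8e.",
    ),
]

# findall returns every non-overlapping match, scanning left to right.
results = [re.findall(pattern, text) for pattern, text in exercises]
for found in results:
    print(found)
# ['A1', 'A2']
# ['14:f6:89:3c', '72:c8:39:8e']
```

51:a6:90:c5 fails at its last pair because 'c' is not in [1234589], and 44:t6:u9:3d fails because 't' is not in [abcdef].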

Negation

+ +

Now, if we put ^ as the first character inside the brackets, the class is negated: it matches any single character other than the ones listed.
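A small sketch of negation, again assuming Python's re module and a made-up sample string:

```python
import re

text = "soon moon noon boon"  # made-up sample

# [^sm] matches one character that is neither 's' nor 'm'.
negated = re.findall(r"[^sm]oon", text)
print(negated)  # ['noon', 'boon']

# A negated class must still consume one character,
# so "oon" on its own does not match.
bare = re.findall(r"[^sm]oon", "oon")
print(bare)  # []
```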

+ +
+ +
+ +

Predict the output for the following:

+ +

RegEx: [^13579]A[^abc]z3[590*-] +
Text: 1Abz33 will match or 2Atz30 and 8Adz3*.

+ +

Answer:

+ +
+ +
+ +
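The negation exercise can be checked the same way (a sketch assuming Python's re module). Note that inside the last class, [590*-], the * and the trailing - are ordinary characters, not operators.

```python
import re

pattern = r"[^13579]A[^abc]z3[590*-]"
text = "1Abz33 will match or 2Atz30 and 8Adz3*."

negation_result = re.findall(pattern, text)
print(negation_result)  # ['2Atz30', '8Adz3*']
```

1Abz33 is rejected for three separate reasons: it starts with the odd digit 1, its third character 'b' is excluded by [^abc], and its last character 3 is not in [590*-].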

Writing out every character (like [0123456789] or [abcd]) is slow and error-prone. What is the shortcut?

+ +

Ranges

+ +

Ranges make our work easier. Inside a character class, consecutive characters can be written as a range using a hyphen (-). For example, the digits from 0 to 9 can be written simply as 0-9. Similarly, abcdef can be replaced by a-f.

+ +

Examples: 456 --> 4-6, abc3456 --> a-c3-6, c367980 --> c36-90.

+ +
+ +
+ +
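The three shortenings listed in the examples can be checked mechanically. The sketch below assumes Python's re module; the helper name accepted is hypothetical. It compares, for each pair, the set of single characters that the explicit class and the ranged class accept.

```python
import re
import string

# (explicit class, ranged class) pairs taken from the examples.
pairs = [
    ("[456]", "[4-6]"),
    ("[abc3456]", "[a-c3-6]"),
    ("[c367980]", "[c36-90]"),
]

candidates = string.ascii_letters + string.digits


def accepted(pattern):
    """Return the set of candidate characters that the class matches."""
    return {ch for ch in candidates if re.fullmatch(pattern, ch)}


all_equal = all(accepted(explicit) == accepted(ranged) for explicit, ranged in pairs)
print(all_equal)  # True
```

In [c36-90] only 6-9 is a range; the c, 3 and 0 around it are single characters.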

Predict the output of the following regex:

+ +
    +
  1. RegEx: [a-d][^l-o][12][^5-7][l-p] +
    Text: co13i, ae14p, eo30p, ce33l, dd14l.

    + +

    Answer: +

    + + + +

  2. +
+ +
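A sketch for checking the range exercise, assuming Python's re module:

```python
import re

pattern = r"[a-d][^l-o][12][^5-7][l-p]"
text = "co13i, ae14p, eo30p, ce33l, dd14l."

range_result = re.findall(pattern, text)
print(range_result)  # ['ae14p', 'dd14l']
```

co13i fails because 'o' lies inside the negated range l-o, eo30p fails because 'e' is outside a-d, and ce33l fails because its third character is not 1 or 2.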

Note: A range written in reverse order (e.g. 9-0) is an error: the start of a range must not come after its end.
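How the error surfaces depends on the regex engine. As one illustration, assuming Python's re module, compiling a reversed range raises re.error:

```python
import re

try:
    re.compile(r"[9-0]")
    reversed_ok = True
except re.error as exc:
    # Python refuses to compile the pattern at all.
    reversed_ok = False
    print("rejected:", exc)
```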

+ +
    +
  1. RegEx: [a-zB-D934][A-Zab0-9]
    + Text: t9, da, A9, zZ, 99, 3D, aCvcC9.

    Answer:
    + +
  2. +
+ +
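This exercise mixes ranges with single characters inside one class. A sketch for checking it, assuming Python's re module:

```python
import re

pattern = r"[a-zB-D934][A-Zab0-9]"
text = "t9, da, A9, zZ, 99, 3D, aCvcC9."

mixed_result = re.findall(pattern, text)
print(mixed_result)  # ['t9', 'da', 'zZ', '99', '3D', 'aC', 'cC']
```

A9 is skipped because 'A' is not in the first class (only the uppercase letters B to D are). In aCvcC9, the pair "vc" fails because a lowercase 'c' is not in the second class, so the next match is "cC".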

Predefined Character Classes

+ +
    +
  1. \w & \W: \w is shorthand for the character class [A-Za-z0-9_]. \w is called the word character class.

    + +
    + +
    + +

    \W is equivalent to [^\w]. \W matches everything other than word characters.

    + +
    + +
  2. + +
  3. \d & \D: \d matches any digit character. It is equivalent to character class [0-9].

    + +
    + +
    + +

    \D is equivalent to [^\d]. \D matches everything other than digits.

    + +
    + +
    + +
      +
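    1. \s & \S: \s matches whitespace characters such as tab (\t), newline (\n) and space ( ). Most flavors also include carriage return (\r), form feed (\f) and vertical tab (\v). These characters produce no visible glyph, so they are often called non-printable characters.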
    + +
    + +
    + +

    Similarly, \S is equivalent to [^\s]. \S matches everything other than whitespace characters.

    + +
    + +
  4. + +
  5. dot(.): Dot matches any single character except \n (the line-break or new-line character). Some flavors, such as JavaScript, also exclude \r (the carriage-return character), while others, such as Python, do not. Dot(.) is known as a wildcard.

    + +
    + +
  6. +
+ +

Note: \r is the carriage-return character. A Windows-style line ending is the two-character sequence \r\n.
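The predefined classes can be explored with a short sketch. It assumes Python's re module and a made-up sample string. Two Python-specific details are worth flagging: without the re.ASCII flag, \w and \d also match non-ASCII letters and digits, and Python's dot skips only \n.

```python
import re

text = "Tom_7 ran\t2 km!\n"  # made-up sample

# re.ASCII restricts \w, \d and \s to their ASCII definitions,
# so \w is exactly [A-Za-z0-9_] and \d is exactly [0-9].
word_chars = "".join(re.findall(r"\w", text, re.ASCII))
non_word = re.findall(r"\W", text, re.ASCII)
digits = re.findall(r"\d", text, re.ASCII)
non_digit_count = len(re.findall(r"\D", text, re.ASCII))
spaces = re.findall(r"\s", text, re.ASCII)
non_space = "".join(re.findall(r"\S", text, re.ASCII))

print(word_chars)       # Tom_7ran2km
print(non_word)         # [' ', '\t', ' ', '!', '\n']
print(digits)           # ['7', '2']
print(non_digit_count)  # 14
print(spaces)           # [' ', '\t', ' ', '\n']
print(non_space)        # Tom_7ran2km!

# In Python, "." skips only \n by default; \r is still matched.
dot_hits = re.findall(r".", "a\nb\rc")
print(dot_hits)         # ['a', 'b', '\r', 'c']
```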

+ +

Predict the output of the following regex:

+ +
    +
  1. RegEx: [01][01][0-1]\W\s\d +
    Text: Binary to decimal data: 001- 1, 010- 2, 011- 3, a01- 4, 100- 4.

    + +

    Answer:

    + +
    + +
  2. +
+ +
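A sketch for checking the exercise above, assuming Python's re module:

```python
import re

pattern = r"[01][01][0-1]\W\s\d"
text = "Binary to decimal data: 001- 1, 010- 2, 011- 3, a01- 4, 100- 4."

binary_result = re.findall(pattern, text)
print(binary_result)  # ['001- 1', '010- 2', '011- 3', '100- 4']
```

The hyphen is matched by \W and the space after it by \s. a01- 4 is skipped because 'a' is not a binary digit, and starting one character later does not help, since "01-" has a hyphen where the third binary digit should be.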

Problems

+ +
    +
  1. Write a regex to match 28th February of any year. Date is in dd-mm-yyyy format.

    + +

    Answer: 28-02-\d\d\d\d

    + +
    + +
  2. + +
  3. Write a regex to match dates that are not in March. Assume the dates are valid but the separator is not fixed, i.e. a date can be in dd.mm.yyyy, dd\mm\yyyy or dd/mm/yyyy format.

    + +

    Answer: \d\d\W[10][^3]\W\d\d\d\d

    + +
    + +
    + +

    Note that the above regex also matches inconsistent formats such as dd-mm.yyyy or dd/mm\yyyy. This problem can be solved with backreferences, a regex feature that lets a later part of the pattern repeat what an earlier group matched.

  4. +
+ + + + + + +