Regular Expression (RegEx)

While filling out online forms, haven't you come across errors like "Please enter valid email address" or "Please enter valid phone number"?

Annoying as they may be, the computer does a lot of black magic before it determines that the details you've entered are incorrect.

Can you guess what that black magic is? If you are familiar with algorithms, you might say that we can write an algorithm for it.

Yes, we can write an algorithm to verify such things, but there is a standard tool designed for exactly this kind of purpose.

It is the Regular Expression, or RegEx for short. RegEx makes our work a lot easier. Let's see some basic examples where RegEx comes in handy.

Suppose you are looking for the average price of a particular product on Amazon. The following regular expression will find any price (e.g. $12, $75.50) on the webpage: \$([0-9]+)(\.[0-9]+)?. The decimal part is optional, so that whole-dollar prices such as $12 also match. In the image below, the yellowish part shows the matched prices.

[Image: prices matched by the regular expression, highlighted in yellow]

Quite interesting!
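To make the price example concrete, here is a minimal JavaScript sketch. The page text and its prices are made up for illustration, and the pattern is a variant in which the decimal part is optional, so that whole-dollar prices like $12 match as well.

```javascript
// Hypothetical page text; the products and prices are made up for illustration.
const pageText = "Mouse: $12, Keyboard: $75.50, Cable: $3.99";

// \$ matches a literal dollar sign, [0-9]+ matches one or more digits,
// and (\.[0-9]+)? matches an optional decimal part.
const pricePattern = /\$([0-9]+)(\.[0-9]+)?/g;

// With the g flag, match() returns an array of all matched substrings.
const prices = pageText.match(pricePattern);
console.log(prices); // [ '$12', '$75.50', '$3.99' ]

// Strip the leading "$" and average the numeric values.
const values = prices.map((p) => Number(p.slice(1)));
const average = values.reduce((sum, v) => sum + v, 0) / values.length;
console.log(average.toFixed(2)); // 30.50
```

Note that match() returns null when nothing matches, so real code should check for that before calling map().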

Let's look at another example. You have a long list of files with different kinds of extensions, and you are looking specifically for data files with the .dat extension.

^.*\.dat$ is a regular expression that represents the set of strings ending with .dat. Regular expressions are a standardized way to encode such patterns.

In the image below, you can see that all three files with the .dat extension are extracted from the list of five files.

[Image: the three .dat files extracted from the list of five files]
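The same filtering can be sketched in JavaScript. The file names below are hypothetical stand-ins for the five files in the example; only the pattern comes from the text.

```javascript
// Hypothetical file names, made up for illustration.
const fileNames = [
  "file_a.dat",
  "file_b.txt",
  "file_c.dat",
  "notes.pdf",
  "file_d.dat",
];

// ^ anchors the start of the string, .* matches any run of characters,
// \. matches a literal dot, and $ anchors the end of the string.
const datPattern = /^.*\.dat$/;

// test() returns true when the pattern matches the given string.
const datFiles = fileNames.filter((name) => datPattern.test(name));
console.log(datFiles); // [ 'file_a.dat', 'file_c.dat', 'file_d.dat' ]
```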

So, what does the name Regular Expression (RegEx) mean? A regular expression is a sequence of characters that defines a search pattern.

RegEx is a standardized tool for the following tasks:

  1. Find and verify patterns in a string.
  2. Extract particular data present in the text.
  3. Replace, split and rearrange particular parts of a string.

We are going to look at all three of these tasks.
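As a quick preview, the three tasks might look like this in JavaScript. The sample sentence and the date pattern are assumptions chosen only for illustration; the syntax they use is ordinary JavaScript regex syntax.

```javascript
// Hypothetical sample text, made up for illustration.
const text = "Order 66 shipped on 2024-05-17";

// 1. Find and verify: does the text contain something shaped like a date?
const datePattern = /[0-9]{4}-[0-9]{2}-[0-9]{2}/;
console.log(datePattern.test(text)); // true

// 2. Extract: parentheses capture the year, month and day separately.
const parts = text.match(/([0-9]{4})-([0-9]{2})-([0-9]{2})/);
console.log(parts[1], parts[2], parts[3]); // 2024 05 17

// 3. Replace and rearrange: $1, $2, $3 refer to the captured parts.
const rearranged = text.replace(/([0-9]{4})-([0-9]{2})-([0-9]{2})/, "$3/$2/$1");
console.log(rearranged); // Order 66 shipped on 17/05/2024

// 3. Split: break the text apart on runs of whitespace.
const words = text.split(/\s+/);
console.log(words); // [ 'Order', '66', 'shipped', 'on', '2024-05-17' ]
```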

Let's begin the journey with RegEx!

Note:

  1. An alphanumeric character belongs to any one of the ranges 0-9, A-Z, a-z.
  2. A string is a sequence of characters, and a substring is a contiguous part of a string.

In this article series, we are going to show all examples using a live interactive playground, so that you can play with regex. You can activate a playground by just clicking on it.

Simple Alpha-numeric character matching

Simple matching of a specific word can be done as follows:

As you can see, it matches "Reg" in the text. Similarly, what will be the match for "Ex" in the same text above?

Do you notice anything? Matching is case-sensitive.
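Case sensitivity can be checked with a short JavaScript sketch. It assumes the sample text is the sentence used in the implementation example later in this article. The i flag shown at the end is JavaScript's standard way to ask for case-insensitive matching.

```javascript
// Assumed sample text (the sentence from the implementation example).
const sampleText = "RegEx stands for Regular Expression!";

// "Ex" with a capital E occurs in "RegEx" and in "Expression".
const upper = sampleText.match(/Ex/g);
console.log(upper); // [ 'Ex', 'Ex' ]

// Lowercase "ex" does not occur anywhere, so match() returns null.
const lower = sampleText.match(/ex/g);
console.log(lower); // null

// Adding the i flag makes the match ignore case.
const insensitive = sampleText.match(/ex/gi);
console.log(insensitive); // [ 'Ex', 'Ex' ]
```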

Implementation

Most programming languages have libraries for RegEx, and their syntax is largely similar. Here, we will see how to use it in JavaScript.

Below is some basic JavaScript code showing how to use regex. Patterns are written as /_____/g, where g is a modifier (flag) used to find all matches rather than stopping at the first match.

The exec function returns null if there is no match, and the match data otherwise.

var text_to_search_in = "RegEx stands for Regular Expression!";

// With the g flag, each exec call resumes searching after the previous match
var pattern = /Reg/g;

// This will print all the data of matches across the whole string
var result;
while ((result = pattern.exec(text_to_search_in)) !== null) {
    console.log(result);
}

The output will be:
[
  'Reg',
  index: 0,
  input: 'RegEx stands for Regular Expression!',
  groups: undefined
]
[
  'Reg',
  index: 17,
  input: 'RegEx stands for Regular Expression!',
  groups: undefined
]

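Note: The groups property in the above output relates to a RegEx concept, named capturing groups. It is undefined here because the pattern does not define any.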
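The no-match case can be sketched as follows, reusing the sentence from the example above. The word "Python" is an arbitrary search term that does not occur in the text. The sketch also shows String.prototype.matchAll, a standard alternative to the exec loop that requires the g flag.

```javascript
const text = "RegEx stands for Regular Expression!";

// "Python" does not occur in the text, so exec returns null.
const noMatch = /Python/g.exec(text);
console.log(noMatch); // null

// matchAll returns an iterator over every match of a pattern with the g flag.
const indices = [];
for (const match of text.matchAll(/Reg/g)) {
  indices.push(match.index);
}
console.log(indices); // [ 0, 17 ]
```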