Lazy matching:

As we have seen, quantifiers are greedy by default: they match as many characters as possible.

To make a quantifier lazy, we add ? right after it. The regex engine then matches as few characters as possible while still satisfying the whole expression.

Below is a table showing the lazy version of each quantifier:

Quantifier    Lazy version
{n,m}         {n,m}?
{n,}          {n,}?
+             +?
*             *?
?             ??
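
To make the table concrete, here is a minimal sketch in Python using the standard re module. The sample string "aaaa" and the values n=2, m=3 are assumptions chosen only for illustration.

```python
import re

sample = "aaaa"

# (greedy, lazy) pairs taken from the table above, with n=2 and m=3.
pairs = [
    (r"a{2,3}", r"a{2,3}?"),
    (r"a{2,}", r"a{2,}?"),
    (r"a+", r"a+?"),
    (r"a*", r"a*?"),
    (r"a?", r"a??"),
]

results = {}
for greedy, lazy in pairs:
    # re.match anchors at the start of the string, so both forms start at index 0.
    results[greedy] = re.match(greedy, sample).group()
    results[lazy] = re.match(lazy, sample).group()
    print(f"{greedy:8} -> {results[greedy]!r:8} {lazy:8} -> {results[lazy]!r}")
```

Because nothing follows the quantifier in these patterns, each lazy form stops at the smallest count it allows (2, 2, 1, 0 and 0 characters respectively), while each greedy form takes the largest.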

So, with a lazy quantifier, we can now match HTML tags as below:
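
A minimal sketch in Python, assuming the tag pattern is <.*?> (the text does not spell the pattern out, so treat it as an assumption) and using a made-up sample string:

```python
import re

# Hypothetical sample text, not taken from the original lesson.
text = "<b>bold</b> and <i>italic</i>"

# Greedy: .* runs to the last '>' in the string, so everything is one match.
greedy = re.findall(r"<.*>", text)

# Lazy: .*? stops at the first '>' it can, so each tag is matched separately.
lazy = re.findall(r"<.*?>", text)

print(greedy)  # ['<b>bold</b> and <i>italic</i>']
print(lazy)    # ['<b>', '</b>', '<i>', '</i>']
```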

Problems

  1. Find an expression to match href="url" in an HTML file. Note that url can be anything, such as https://xyz.com, http://abc.io/app, or https://cde.org.

    Answer: href=".*?"

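  2. What will the expression \w+? \w+? match in abc cde, 123 456?
    Answer: abc c and 123 4. The first \w+? must grow until it reaches the space, but the second \w+? has nothing after it, so it stops after one character.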
  3. We will see how to extract things (like URLs) from text using regex in the "group and capturing" concept.
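
Problems 1 and 2 can be checked with a short Python sketch. The HTML snippet below is a hypothetical sample built from the URLs mentioned in problem 1. The greedy variant href=".*" is included only for comparison.

```python
import re

# Hypothetical one-line HTML sample with two links.
html = '<a href="https://xyz.com">x</a> <a href="http://abc.io/app">y</a>'

# Problem 1: the lazy .*? stops at the first closing quote after href=".
lazy_hrefs = re.findall(r'href=".*?"', html)
print(lazy_hrefs)
# ['href="https://xyz.com"', 'href="http://abc.io/app"']

# For comparison, the greedy .* runs to the last quote on the line.
greedy_hrefs = re.findall(r'href=".*"', html)
print(greedy_hrefs)
# ['href="https://xyz.com">x</a> <a href="http://abc.io/app"']

# Problem 2: two lazy \w+? separated by a single space.
words = re.findall(r"\w+? \w+?", "abc cde, 123 456")
print(words)
# ['abc c', '123 4']
```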