As we have seen, quantifiers are greedy by default, so they match as many characters as possible.
To make a quantifier lazy, append ? to it. The regex engine then matches as few characters as possible while still satisfying the expression.
The table below shows the lazy version of each quantifier:
Quantifier | Lazy version |
{n,m} | {n,m}? |
{n,} | {n,}? |
+ | +? |
* | *? |
? | ?? |
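To make the table concrete, here is a minimal Python sketch that compares the first match of each greedy quantifier with its lazy counterpart. The sample strings and the helper name `first_match` are made up for illustration; only the standard `re` module is used.

```python
import re

# (greedy pattern, lazy pattern, sample text); the samples are illustrative only.
samples = [
    (r"\d{2,4}", r"\d{2,4}?", "12345"),
    (r"\d{2,}", r"\d{2,}?", "12345"),
    (r"a+", r"a+?", "aaa"),
    (r"<.*>", r"<.*?>", "<b>x</b>"),
    (r"colou?", r"colou??", "colour"),
]


def first_match(pattern, text):
    """Return the text of the first match of pattern in text (hypothetical helper)."""
    return re.search(pattern, text).group(0)


for greedy, lazy, text in samples:
    print(f"{text!r}: {greedy} -> {first_match(greedy, text)!r}, "
          f"{lazy} -> {first_match(lazy, text)!r}")
```

In every row the lazy form stops at the shortest text that still satisfies the pattern, for example `12` instead of `1234` for `\d{2,4}?`.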
So now we can match HTML tags as shown below:
Problem
Find an expression to match href="url" in an HTML file. Note that url can be anything, like https://xyz.com, http://abc.io/app, https://cde.org.
Answer: href=".*?"
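As a quick check of this answer, the following Python sketch runs the lazy pattern and its greedy counterpart over a made-up HTML line. The variable names and the sample markup are assumptions for illustration, not part of the original problem.

```python
import re

# Hypothetical one-line HTML sample containing the three example URLs.
html = (
    '<a href="https://xyz.com">x</a> '
    '<a href="http://abc.io/app">y</a> '
    '<a href="https://cde.org">z</a>'
)

# Lazy: each match ends at the first closing quote after href=".
lazy_matches = re.findall(r'href=".*?"', html)

# Greedy: a single match runs from the first href=" to the last quote on the line.
greedy_matches = re.findall(r'href=".*"', html)

for m in lazy_matches:
    print(m)
print(len(greedy_matches))
```

The lazy pattern yields one match per attribute, while the greedy one swallows everything between the first and last quote, which is why the lazy form is the one to use here.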
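Similarly, \w+? \w+? in abc cde, 123 456 matches abc c and 123 4. The first \w+? has to extend up to the space for the match to succeed, while the second stops after a single character.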
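The behaviour of this pattern can be checked with a short Python sketch using the same input text (the variable names are illustrative):

```python
import re

text = "abc cde, 123 456"

# The first \w+? must grow until it reaches the space, otherwise no match is possible;
# the second \w+? is free to stop after one word character.
matches = re.findall(r"\w+? \w+?", text)
print(matches)
```

A lazy quantifier still matches as much as it must for the overall expression to succeed; it only avoids taking more than that.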
We will see how to extract things (like URLs) from text using regex in the "group and capturing" section.