Five Regular Expressions That Make Your Day!

Here I introduce five of the most common and useful regular expression patterns. This article assumes some basic knowledge of regex metacharacters.

What are Regular Expressions?

Regular expressions (also known as ‘regex’ or ‘regexp’) are not a programming language of their own, but they can be considered a sub-language available in almost any programming language. More broadly, a regular expression is a sequence of characters (including metacharacters) that forms a search pattern, used mainly for pattern matching in a file or text stream.

Why Regex?

Regular expressions can speed up your searches by replacing multiple searches for the same class of strings with a single one. For programmers, they also help reduce the length of their code (think of how the size of JavaScript files affects website performance).

Who uses it, and where?

Virtually every programming language supports regular expressions. Application developers, web developers and system administrators all rely on regexes in their day-to-day work. You can use regexps for simple text searches, bulk file renaming, Apache redirection rules, large database queries and so on.

How does it work?

It is impossible to cover regexes from A to Z here, but let us take a tour of some of the most important, and most interesting, patterns.

Note: Although every interpreter parses regexes, each does it differently, so a regex that works in Perl will not necessarily work in C. In other words, regexes are not fully portable. In this article I demonstrate them in a UNIX-like way so that you can test them with standard UNIX text-processing tools such as awk, sed or grep. (With grep, use the -E option for the extended syntax, such as +, ?, | and {2,3}, used in the first four patterns.)

 1)    Match E-mail addresses

[a-z0-9\.-]+@[a-z0-9\.-]+\.[a-z]{2,3}

How it works:

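[a-z0-9\.-] Matches any one of the characters listed in the square brackets: lowercase letters a to z, digits 0 to 9, dots (.) and hyphens (-). Underscores and capital letters are not included, so add _ inside the brackets or use a case-insensitive option such as grep -i if you need them. Combined with the + that follows, it matches “test.mail” in test.mail@domain.co.in.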
+ Matches one or more of the preceding characters, up to the next explicit character in the regex (@).
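[a-z0-9\.-] Again matches any one of the characters listed in the square brackets.
+ Matches one or more of those characters. Because the brackets include the dot, it first takes in all of “domain.co.in”, but the engine then gives characters back until the rest of the pattern (\.[a-z]{2,3}) can match, so this part ends up matching “domain.co” in test@domain.co.in.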
\. Matches the “.” that comes just before the TLD. The backslash escapes the special meaning of “.” in a regex and tells the parser to treat it as a literal character.
[a-z] Matches any one lowercase letter from a to z.
{2,3} Sets how many times the previous character or character set must match. Here [a-z]{2,3} matches at least 2 letters (“in”, “uk”, …) and at most 3 (“com”, “org”, …); the quantifier never accepts a single letter or more than three.
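To try the pattern outside grep, here is a minimal sketch in Python using the standard re module, whose Perl-style syntax accepts the pattern as written. The sample strings are invented for illustration, and the re.IGNORECASE flag is an added assumption, since the brackets list only lowercase letters.

```python
import re

# The e-mail pattern from above. Inside [...] Python treats \. as a plain dot.
# re.IGNORECASE is an added assumption so that capital letters match too.
EMAIL = re.compile(r"[a-z0-9\.-]+@[a-z0-9\.-]+\.[a-z]{2,3}", re.IGNORECASE)

samples = [
    "test.mail@domain.co.in",
    "Contact: John.Doe@Example.com today",
    "no-at-sign.example.com",
    "user@domain.info",
]

for text in samples:
    # search() scans for the first position where the pattern matches.
    match = EMAIL.search(text)
    print(text, "->", match.group(0) if match else None)
```

The last sample shows a limitation worth knowing: because the pattern is not anchored, [a-z]{2,3} simply matches the first three letters of a longer TLD, so “user@domain.info” yields the truncated match “user@domain.inf”.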

 2)    Match IP addresses

 \b(([01]?[0-9][0-9]?|2[0-4][0-9]|25[0-5])\.){3}([01]?[0-9][0-9]?|2[0-4][0-9]|25[0-5])\b

How it works:

\b For matching word boundaries.
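( Starts group 1. An IP address is made of four blocks of numbers, so this group is used to match the first three blocks together with their trailing dots. Parentheses create sub-patterns inside a long regex that can be referenced as a single item. The second parenthesis right after it opens an inner group holding the alternatives for a single number block.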
[01]?[0-9][0-9]? Matches numbers from 0 to 199. The “?” makes the preceding element optional, matching it zero or one time.
| The pipe symbol is the OR operator between regexes: it matches either one of the patterns on its sides.
2[0-4][0-9] Matches numbers between 200 and 249
| OR
25[0-5] Matches numbers between 250 and 255
\. Matches the dot between number blocks
) Closes group 1. (The inner group holding the number alternatives is closed just before the \.) Group 1, one number block plus its dot, matches “192.” in 192.186.122.10.
{3} Repeats the previous group exactly 3 times. It matches “192.186.122.” in 192.186.122.10.
([01]?[0-9][0-9]?|2[0-4][0-9]|25[0-5]) The same number-block pattern as before, without the trailing dot (\.). It matches “10” in 192.186.122.10, and the final \b makes sure the address ends at a word boundary.
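A quick way to check that the three alternatives cover exactly 0 to 255 is to test one number block against every integer up to 999. The Python sketch below (assuming the standard re module; the sample strings are made up) does that and then runs the full pattern on a few lines of text.

```python
import re

# One number block: 0-199 | 200-249 | 250-255, as explained above.
OCTET = r"([01]?[0-9][0-9]?|2[0-4][0-9]|25[0-5])"
# Three blocks with trailing dots, then a fourth block, between word boundaries.
IPV4 = re.compile(r"\b(" + OCTET + r"\.){3}" + OCTET + r"\b")

# fullmatch() requires the whole string to match a single block.
octet_only = re.compile(OCTET)
accepted = [n for n in range(1000) if octet_only.fullmatch(str(n))]
print(accepted[0], accepted[-1], len(accepted))  # prints: 0 255 256

for text in ["Server at 192.186.122.10 is up", "Bad address 256.1.1.1", "10.0.0.1"]:
    m = IPV4.search(text)
    print(text, "->", m.group(0) if m else None)
```

Note that “256.1.1.1” finds no match at all: the leading \b stops the engine from skipping the 2 and matching “56.1.1.1”, because there is no word boundary between two digits.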

 3)    Match web URLs

 https?://[0-9a-z\.-]+\.[a-z]{2,3}([/0-9a-z.-]*)*

 How it works:

https? Web URLs start with the protocol, either http or https. We can match both possibilities with the quantifier ?, which matches exactly zero or one occurrence of the character just before it. This matches http or https but not httpss, httpsss or anything else.
:// This is part of the generic URI syntax and is matched character for character.
[0-9a-z\.-]+\.[a-z]{2,3} Next comes the server or domain name. This works just like the domain part of the e-mail pattern.
([/0-9a-z.-]*) This group matches the rest of the URL, the directory path. The * matches any number of occurrences of the characters in the square brackets, including none at all. Because the slash (/) is one of those characters, the group can run across several path segments in a single pass.

Like “/blog/post1” in http://test.example.com/blog/post1

* This additional * repeats the previous group as many times as possible. Since the group already matches the whole path “/blog/post1” on its own, the outer * is redundant here: ([/0-9a-z.-]*) alone produces the same match.
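The following Python sketch (standard re module assumed; the sample text is invented) runs the URL pattern and also checks that the bracketed path class on its own already covers a multi-segment path.

```python
import re

# The URL pattern from above, unchanged.
URL = re.compile(r"https?://[0-9a-z\.-]+\.[a-z]{2,3}([/0-9a-z.-]*)*")
# Just the bracketed path class, without the outer group and extra *.
PATH_ONLY = re.compile(r"[/0-9a-z.-]*")

text = "Read http://test.example.com/blog/post1 or https://example.org today"

# finditer() yields match objects; group(0) is the whole match.
matches = [m.group(0) for m in URL.finditer(text)]
print(matches)

# The slash is inside the brackets, so one pass covers both path segments.
print(PATH_ONLY.match("/blog/post1").group(0))

# "httpss" is rejected: s? allows at most one "s" before "://".
print(URL.search("httpss://example.com"))
```

finditer() is used instead of findall() because findall() returns only the captured group when the pattern contains one, which here would be the path (possibly empty) rather than the full URL.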

 4)    Match time

 ([01]?[0-9]|2[0-3]):[0-5]?[0-9](:[0-5]?[0-9])?\s([A-Za-z]{2})?

How it works:

([01]?[0-9]|2[0-3]) This is the first group. It matches either a number from 0 to 19 or one from 20 to 23, that is, the hours.

? – Matches exactly 0 or 1 occurrence of the previous character.

| – OR operator.

: The common field separator in time strings; this is a literal character match.

[0-5]?[0-9] Matches numbers from 0 to 59, that is, the minutes.

(:[0-5]?[0-9])? The same as above, but made optional by wrapping it in ‘()’ and appending a ?. This matches the seconds (if any).

\s([A-Za-z]{2})? The letter group is again optional. As written, {2} matches a two-letter suffix such as AM or PM (if any); change it to {3} to match a three-letter time-zone abbreviation such as GMT or UTC instead. Note that \s itself sits outside the optional group, so a whitespace character must follow the time.

\s – Stands for any single whitespace character (a space, a tab and so on).

The rest of the pattern is self-explanatory.
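Here is a Python sketch (standard re module assumed; the sample sentences are invented) that runs the time pattern exactly as given, including its two-letter suffix.

```python
import re

# The time pattern from above, unchanged: hours, minutes, optional seconds,
# one required whitespace character, then an optional two-letter suffix.
TIME = re.compile(r"([01]?[0-9]|2[0-3]):[0-5]?[0-9](:[0-5]?[0-9])?\s([A-Za-z]{2})?")

samples = [
    "Meeting at 10:30 AM",
    "Backup at 07:15:42 PM",
    "Ends at 9:05",
    "Invalid 24:00 hours",
]

for text in samples:
    m = TIME.search(text)
    print(text, "->", repr(m.group(0)) if m else None)
```

The last two samples expose the pattern's rough edges: “Ends at 9:05” fails because the whitespace after the time is required, and since nothing anchors the hour, “24:00” yields the partial match “4:00 ho”, with the optional letter group picking up the first two letters of the next word. Adding \b at the start of the pattern is one way to stop the hour from starting in the middle of a number.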

 5)    Check password strength

This example matches passwords that meet the following complexity rules. (It uses Perl-style lookaheads, so when you test it with grep, use the -P option.)

1)    8 or more characters

2)    Contains at least one number or a special character

3)    Contains at least one small letter

4)    Contains at least one capital letter

(?=^.{8,}$)((?=.*\d)|(?=.*\W+))(?![.\n])(?=.*[A-Z])(?=.*[a-z]).*$

 How it works:

(?=^.{8,}$) This block does not consume any characters; it only checks whether the whole string is 8 or more characters long.

?= marks a positive lookahead. It is an assertion rather than a quantifier: it checks whether the expression that follows can match from the current position, without consuming any characters. Only if it succeeds does the engine go on to evaluate the rest of the regex.

“.” stands for any character except a newline.

{8,} stands for 8 or more.

$ denotes the end of the line.

((?=.*\d)|(?=.*\W+)) Here “.*\d” looks ahead across any characters for a digit (\d), OR (|) “.*\W+” looks ahead for a non-word character (\W), that is, anything other than a letter, digit or underscore.

(?![.\n]) ?! is a negative lookahead: it fails if the expression that follows matches at the current position. Because there is no .* inside it, it only inspects the first character, so the string is rejected if it starts with a dot (.) or a newline (\n). A dot later in the string is still allowed; to reject a dot anywhere, write (?!.*[.\n]) instead.

(?=.*[A-Z])(?=.*[a-z]) These two positive lookaheads ensure that the string contains at least one capital letter [A-Z] and at least one small letter [a-z].

.*$ This is the actual matching pattern. If every lookahead before it succeeds, it matches every character (.*) in the string up to the end of the line ($).
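Because the (?![.\n]) part is easy to misread, a small Python check helps. This sketch assumes the standard re module, which supports the same lookahead syntax; the sample passwords are invented, and NO_DOT is a hypothetical variant using (?!.*[.\n]), not part of the original pattern.

```python
import re

# The password pattern from above, unchanged.
STRONG = re.compile(r"(?=^.{8,}$)((?=.*\d)|(?=.*\W+))(?![.\n])(?=.*[A-Z])(?=.*[a-z]).*$")
# Hypothetical variant: the negative lookahead scans the whole string for a dot.
NO_DOT = re.compile(r"(?=^.{8,}$)((?=.*\d)|(?=.*\W+))(?!.*[.\n])(?=.*[A-Z])(?=.*[a-z]).*$")

candidates = ["Passw0rd", "password1", "Short1A", "Pass.word", ".Password1"]

for pwd in candidates:
    strong = STRONG.search(pwd) is not None
    no_dot = NO_DOT.search(pwd) is not None
    print(f"{pwd:12} original={strong} no_dot_variant={no_dot}")
```

With the original pattern, “Pass.word” passes while “.Password1” is rejected, purely because of where the dot sits; the hypothetical NO_DOT variant rejects both.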