Python - RegEx
A RegEx, or Regular Expression, is a sequence of characters that forms a search pattern.
RegEx can be used to check if a string contains the specified search pattern.
Python has a built-in package called re, which can be used to work with Regular Expressions.
Import the re module:
import re
When you have imported the re module, you can start using regular expressions:
Example
Search the string to see if it starts with "The" and ends with "freedom":
import re
txt = "The class in freedom"
x = re.search("^The.*freedom$", txt)
if x:
    print("YES! We have a match!")
else:
    print("No match")
==========o/p===========
YES! We have a match!
The re module offers a set of functions that allow us to search a string for a match:
Function | Description |
---|---|
findall | Returns a list containing all matches |
search | Returns a Match object if there is a match anywhere in the string |
split | Returns a list where the string has been split at each match |
sub | Replaces one or many matches with a string |
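As a rough sketch of how the four functions differ, the snippet below runs each of them on the sample string from the first example. The variable names are illustrative only and are not part of the re module.

```python
import re

txt = "The class in freedom"

# findall: every non-overlapping match, returned as a list of strings
vowel_pairs = re.findall(r"[aeiou]{2}", txt)

# search: a Match object for the first match (or None if nothing matches)
first_word = re.search(r"\w+", txt)

# split: the pieces of the string that lie between the matches
words = re.split(r"\s", txt)

# sub: a new string in which every match has been replaced
dashed = re.sub(r"\s", "-", txt)

print(vowel_pairs)         # ['ee']
print(first_word.group())  # The
print(words)               # ['The', 'class', 'in', 'freedom']
print(dashed)              # The-class-in-freedom
```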
Metacharacters are characters with a special meaning:
Character | Description |
---|---|
[] | A set of characters |
import re
txt = "Class in Freedom"
# Find all lower case characters alphabetically between "a" and "z":
x = re.findall("[a-z]", txt)
print(x)
===========o/p============
['l', 'a', 's', 's', 'i', 'n', 'r', 'e', 'e', 'd', 'o', 'm']
Character | Description |
---|---|
\ | Signals a special sequence (can also be used to escape special characters) |
import re
txt = "tutorial was started on 12-Nov-2018"
# Find all digit characters (a raw string keeps the backslash intact):
x = re.findall(r"\d", txt)
print(x)
=========o/p==========
['1', '2', '2', '0', '1', '8']
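The table row above also mentions escaping. A minimal sketch of that use, assuming a made-up version string: an unescaped "." is a metacharacter that matches any character except a newline, while "\." matches only a literal dot.

```python
import re

version = "v1.2.3"  # hypothetical sample string

# Unescaped: "." matches every character in this string
any_char = re.findall(r".", version)

# Escaped: "\." matches only the literal dots
literal_dots = re.findall(r"\.", version)

print(any_char)      # ['v', '1', '.', '2', '.', '3']
print(literal_dots)  # ['.', '.']
```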
Character | Description |
---|---|
. | Any character (except newline character) |
import re
txt = "Freedom"
# Search for a sequence that starts with "Fr", followed by any two characters, and then "dom":
x = re.findall("Fr..dom", txt)
print(x)
==========o/p==========
['Freedom']
Character | Description |
---|---|
^ | Starts with |
import re
txt = "hello Freedom"
# Check if the string starts with "hello":
x = re.findall("^hello", txt)
if x:
    print("Yes, the string starts with 'hello'")
else:
    print("No match")
=========o/p=========
Yes, the string starts with 'hello'
Character | Description |
---|---|
$ | Ends with |
import re
txt = "hello Freedom"
# Check if the string ends with "Freedom":
x = re.findall("Freedom$", txt)
if x:
    print("Yes, the string ends with 'Freedom'")
else:
    print("No match")
==========o/p===========
Yes, the string ends with 'Freedom'
Character | Description |
---|---|
* | Zero or more occurrences |
import re
txt = "she sells sea shells on the sea shore"
# Check if the string contains "sh" followed by zero or more "x" characters:
x = re.findall("shx*", txt)
if x:
    print("Yes, there is at least one match")
else:
    print("No match")
==========o/p============
Yes, there is at least one match
Character | Description |
---|---|
+ | One or more occurrences |
import re
txt = "she sells sea shells on the sea shore"
# Check if the string contains "sh" followed by one or more "x" characters:
x = re.findall("shx+", txt)
if x:
    print("Yes, there is at least one match")
else:
    print("No match")
=========o/p========
No match
Character | Description |
---|---|
{} | Exactly the specified number of occurrences |
import re
txt = "she sells sea shells on the sea shore"
# Check if the string contains "e" followed by exactly two "l" characters:
x = re.findall("el{2}", txt)
if x:
    print("Yes, there is at least one match")
else:
    print("No match")
=========o/p==========
Yes, there is at least one match
Character | Description |
---|---|
\| (vertical bar) | Either or |
import re
txt = "she sells sea shells on the sea shore"
# Check if the string contains either "sells" or "solds":
x = re.findall("sells|solds", txt)
print(x)
if x:
    print("Yes, there is at least one match")
else:
    print("No match")
==========o/p===========
['sells']
Yes, there is at least one match
Character | Description |
---|---|
() | Capture and group |
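The notes give no example for (), so here is a minimal sketch, assuming the date string from the digit example above. Each pair of parentheses captures the part of the match it encloses, and the Match object exposes those parts through group() and groups().

```python
import re

txt = "tutorial was started on 12-Nov-2018"

# Three groups: day, month abbreviation, year
m = re.search(r"(\d{2})-([A-Za-z]{3})-(\d{4})", txt)

if m:
    print(m.group(0))  # 12-Nov-2018 (the whole match)
    print(m.group(1))  # 12 (the first group)
    print(m.groups())  # ('12', 'Nov', '2018')
```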
A special sequence is a \ followed by one of the characters in the list below, and has a special meaning:
Character | Description |
---|---|
\A | Returns a match if the specified characters are at the beginning of the string |
import re
txt = "she sells sea shells on the sea shore"
# Check if the string starts with "she":
x = re.findall(r"\Ashe", txt)
print(x)
if x:
    print("Yes, there is at least one match")
else:
    print("No match")
=========o/p==========
['she']
Yes, there is at least one match
Character | Description |
---|---|
\b | Returns a match where the specified characters are at the beginning or at the end of a word |
Checking for the specified characters at the beginning of a word:
import re
txt = "she sells sea shells on the sea shore"
# Check if "ore" is present at the beginning of a word:
x = re.findall(r"\bore", txt)
print(x)
if x:
    print("Yes, there is at least one match")
else:
    print("No match")
==========o/p============
[]
No match
____________________________________________________________________________
Checking for the specified characters at the end of a word:
import re
txt = "she sells sea shells on the sea shore"
# Check if "ore" is present at the end of a word:
x = re.findall(r"ore\b", txt)
print(x)
if x:
    print("Yes, there is at least one match")
else:
    print("No match")
=========o/p============
['ore']
Yes, there is at least one match
Character | Description |
---|---|
\B | Returns a match where the specified characters are present, but NOT at the beginning (or at the end) of a word |
import re
txt = "she sells sea shells on the sea shore"
# Check if "s" is present, but not at the beginning of a word
# (only the final "s" of "sells" and "shells" qualifies):
x = re.findall(r"\Bs", txt)
print(x)
if x:
    print("Yes, there is at least one match")
else:
    print("No match")
============o/p==============
['s', 's']
Yes, there is at least one match
__________________________________________________________________
import re
txt = "she sells sea shells on the sea shore"
# Check if "sea" is present, but not at the end of a word:
x = re.findall(r"sea\B", txt)
print(x)
if x:
    print("Yes, there is at least one match")
else:
    print("No match")
===========o/p============
[]
No match
Character | Description |
---|---|
\d | Returns a match where the string contains digits (numbers from 0-9) |
import re
txt = "she sells sea shells on the sea shore"
# Check if the string contains any digits (numbers from 0-9):
x = re.findall(r"\d", txt)
print(x)
if x:
    print("Yes, there is at least one match")
else:
    print("No match")
==========o/p=============
[]
No match
Character | Description |
---|---|
\D | Returns a match at every character that is NOT a digit |
import re
txt = "she sells sea shells on the sea shore"
# Return a match at every non-digit character:
x = re.findall(r"\D", txt)
print(x)
if x:
    print("Yes, there is at least one match")
else:
    print("No match")
==========O/P============
['s', 'h', 'e', ' ', 's', 'e', 'l', 'l', 's', ' ', 's', 'e', 'a', ' ', 's', 'h', 'e', 'l', 'l', 's', ' ', 'o', 'n', ' ', 't', 'h', 'e', ' ', 's', 'e', 'a', ' ', 's', 'h', 'o', 'r', 'e']
Yes, there is at least one match
Character | Description |
---|---|
\s | Returns a match where the string contains a white space character |
import re
txt = "she sells sea shells on the sea shore"
# Return a match at every white-space character:
x = re.findall(r"\s", txt)
print(x)
if x:
    print("Yes, there is at least one match")
else:
    print("No match")
===========o/p===========
[' ', ' ', ' ', ' ', ' ', ' ', ' ']
Yes, there is at least one match
Character | Description |
---|---|
\S | Returns a match at every character that is NOT a white space character |
import re
txt = "she sells sea shells on the sea shore"
# Return a match at every non-white-space character:
x = re.findall(r"\S", txt)
print(x)
if x:
    print("Yes, there is at least one match")
else:
    print("No match")
===========O/P=============
['s', 'h', 'e', 's', 'e', 'l', 'l', 's', 's', 'e', 'a', 's', 'h', 'e', 'l', 'l', 's', 'o', 'n', 't', 'h', 'e', 's', 'e', 'a', 's', 'h', 'o', 'r', 'e']
Yes, there is at least one match
Character | Description |
---|---|
\w | Returns a match where the string contains any word characters (letters a to z and A to Z, digits from 0-9, and the underscore _ character) |
import re
txt = "she sells sea shells on the sea shore"
# Return a match at every word character (letters, digits 0-9, and the underscore _):
x = re.findall(r"\w", txt)
print(x)
if x:
    print("Yes, there is at least one match")
else:
    print("No match")
===========o/p============
['s', 'h', 'e', 's', 'e', 'l', 'l', 's', 's', 'e', 'a', 's', 'h', 'e', 'l', 'l', 's', 'o', 'n', 't', 'h', 'e', 's', 'e', 'a', 's', 'h', 'o', 'r', 'e']
Yes, there is at least one match
Character | Description |
---|---|
\W | Returns a match at every character that is NOT a word character |
import re
txt = "she sells sea shells on the sea shore"
# Return a match at every non-word character:
x = re.findall(r"\W", txt)
print(x)
if x:
    print("Yes, there is at least one match")
else:
    print("No match")
==========O/P============
[' ', ' ', ' ', ' ', ' ', ' ', ' ']
Yes, there is at least one match
Character | Description |
---|---|
\Z | Returns a match if the specified characters are at the end of the string |
import re
txt = "she sells sea shells on the sea shore"
# Check if the string ends with "shore":
x = re.findall(r"shore\Z", txt)
print(x)
if x:
    print("Yes, there is at least one match")
else:
    print("No match")
==============o/p==================
['shore']
Yes, there is at least one match
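Both $ and \Z anchor a pattern to the end of the string, but they are not identical in Python. The sketch below, which assumes a sample string with a trailing newline, shows the difference: $ also matches just before a newline at the end of the string, whereas \Z matches only at the very end.

```python
import re

# Note the trailing newline, which is an assumption made for this example
txt = "she sells sea shells on the sea shore\n"

dollar = re.findall(r"shore$", txt)     # "$" tolerates one final newline
end_only = re.findall(r"shore\Z", txt)  # "\Z" does not

print(dollar)    # ['shore']
print(end_only)  # []
```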
The findall() function returns a list containing all matches.
import re
txt = "she sells sea shells on the sea shore"
# Find every occurrence of "ea":
x = re.findall("ea", txt)
print(x)
========o/p=========
['ea', 'ea']
The list contains the matches in the order they are found.
If no matches are found, an empty list is returned:
import re
txt = "she sells sea shells on the sea shore"
x = re.findall("portugal", txt)
print(x)
=========o/p==========
[]
The search() function searches the string for a match, and returns a Match object if there is a match.
If there is more than one match, only the first occurrence of the match will be returned:
import re
txt = "she sells sea shells on the sea shore"
x = re.search("sea", txt)
print(x)
=========o/p==========
<re.Match object; span=(10, 13), match='sea'>
If no matches are found, the value None is returned.
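A short sketch of the no-match case, reusing the "portugal" pattern from the findall() example:

```python
import re

txt = "she sells sea shells on the sea shore"
x = re.search("portugal", txt)

print(x)          # None
print(x is None)  # True
```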
The split() function returns a list where the string has been split at each match.
Split at each white-space character:
import re
txt = "she sells sea shells on the sea shore"
x = re.split(r"\s", txt)
print(x)
=========o/p=========
['she', 'sells', 'sea', 'shells', 'on', 'the', 'sea', 'shore']
You can limit the number of splits by specifying the maxsplit parameter:
import re
txt = "she sells sea shells on the sea shore"
x = re.split(r"\s", txt, maxsplit=1)
print(x)
==========o/p===========
['she', 'sells sea shells on the sea shore']
The sub() function replaces the matches with the text of your choice.
Replace every white-space character with the number 5:
import re
txt = "she sells sea shells on the sea shore"
x = re.sub(r"\s", "5", txt)
print(x)
=========o/p=========
she5sells5sea5shells5on5the5sea5shore
You can control the number of replacements by specifying the count parameter:
import re
txt = "she sells sea shells on the sea shore"
# Replace the first three occurrences of a white-space character with the digit 5:
x = re.sub(r"\s", "5", txt, count=3)
print(x)
==========o/p============
she5sells5sea5shells on the sea shore
A Match Object is an object containing information about the search and the result.
Do a search that will return a Match Object:
import re
txt = "she sells sea shells on the sea shore"
# The search() function returns a Match object:
x = re.search("ea", txt)
print(x)
=========o/p==========
<re.Match object; span=(11, 13), match='ea'>
The Match object has properties and methods used to retrieve information about the search, and the result:
Example
Print the position (start and end position) of the first match.
The regular expression looks for any word that starts with an upper case "S":
import re
# Search for a word that starts with an upper case "S" and print its position:
txt = "She sells sea shells on the sea shore"
x = re.search(r"\bS\w+", txt)
print(x.span())
==========O/P==========
(0, 3)
Print the string passed into the function:
import re
# The string property returns the string that was searched:
txt = "She sells sea shells on the sea shore"
x = re.search(r"\bS\w+", txt)
print(x.string)
=========o/p===========
She sells sea shells on the sea shore
Example
Print the part of the string where there was a match.
The regular expression looks for any word that starts with an upper case "S":
import re
# Search for a word that starts with an upper case "S" and print it.
# search() returns only the first match, no matter how many words qualify:
txt = "She Sells Sea Shells on the sea shore"
x = re.search(r"\bS\w+", txt)
print(x.group())
==========o/p===========
She
Note: If there is no match, search() returns None instead of a Match object.
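Because of this, calling group(), span() or similar on the result without checking it first raises an AttributeError when nothing matched. One possible way to guard against that is sketched below; the helper name first_capital_s_word is hypothetical and not part of the re module.

```python
import re

def first_capital_s_word(text):
    """Return the first word starting with an upper case "S", or None."""
    m = re.search(r"\bS\w+", text)
    # Only call group() when search() actually returned a Match object
    return m.group() if m is not None else None

found = first_capital_s_word("She sells sea shells")
missing = first_capital_s_word("she sells sea shells")

print(found)    # She
print(missing)  # None
```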