RegEx

Python - RegEx

A RegEx, or Regular Expression, is a sequence of characters that forms a search pattern.

RegEx can be used to check if a string contains the specified search pattern.

RegEx Module :

Python has a built-in module called re, which can be used to work with regular expressions.

Import the re module:

import re

RegEx in Python :

When you have imported the re module, you can start using regular expressions:

Example
Search the string to see if it starts with "The" and ends with "freedom":

import re

txt = "The class in freedom"
x = re.search("^The.*freedom$", txt)

if x:
  print("YES! We have a match!")
else:
  print("No match")

==========o/p===========

YES! We have a match!

RegEx Functions :

The re module offers a set of functions that allow you to search a string for a match:

Function	Description
findall	Returns a list containing all matches
search	Returns a Match object if there is a match anywhere in the string
split	Returns a list where the string has been split at each match
sub	Replaces one or many matches with a string
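To see what each of these functions returns, here is a small sketch that runs all four on one sample string (the string and the patterns are chosen only for illustration):

```python
import re

txt = "she sells sea shells on the sea shore"

# findall(): a list of every match, in order
all_matches = re.findall(r"sea", txt)
print(all_matches)        # ['sea', 'sea']

# search(): a Match object for the first match only (None if nothing matches)
first = re.search(r"sea", txt)
print(first.span())       # (10, 13)

# split(): a list of the pieces between the matches
pieces = re.split(r" on ", txt)
print(pieces)             # ['she sells sea shells', 'the sea shore']

# sub(): a new string with every match replaced
replaced = re.sub(r"sea", "ocean", txt)
print(replaced)           # she sells ocean shells on the ocean shore
```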

Metacharacters :

Metacharacters are characters with a special meaning:

Character	Description
[]	A set of characters

import re

txt = "Class in Freedom"

#Find all lower case characters alphabetically between "a" and "z":

x = re.findall("[a-z]", txt)
print(x)


===========o/p============

['l', 'a', 's', 's', 'i', 'n', 'r', 'e', 'e', 'd', 'o', 'm']

Character	Description
\	Signals a special sequence (can also be used to escape special characters)

import re

txt = "tutorial was started in 12-Nov-2018"

#Find all digit characters

x = re.findall(r"\d", txt)
print(x)


=========o/p==========

['1', '2', '2', '0', '1', '8']
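The description above also mentions escaping: a backslash in front of a metacharacter makes it match literally. The sketch below (with a made-up version string) contrasts an unescaped "." with "\.". The r"..." prefix makes a raw string, so Python hands the backslash to the re module unchanged.

```python
import re

txt = "version 3.12.1"

# Unescaped ".": matches any character, so every character is a match
any_char = re.findall(r".", txt)
print(len(any_char))      # 14

# Escaped "\.": matches only a literal dot
dots = re.findall(r"\.", txt)
print(dots)               # ['.', '.']
```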

Character	Description
.	Any character (except newline character)

import re

txt = "Freedom"

#Search for a sequence that starts with "Fr", followed by two (any) characters, and then "dom":

x = re.findall("Fr..dom", txt)
print(x)

==========o/p==========
['Freedom']
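The "(except newline character)" part can be checked with a sketch like the one below. The string is made up for the demonstration, and re.DOTALL is the standard re flag that lets "." match a newline as well.

```python
import re

txt = "Free\ndom"   # a newline sits between "Free" and "dom"

# "." does not match the newline, so there is no match
no_dotall = re.findall(r"Free.dom", txt)
print(no_dotall)      # []

# With the re.DOTALL flag, "." matches a newline too
with_dotall = re.findall(r"Free.dom", txt, flags=re.DOTALL)
print(with_dotall)    # ['Free\ndom']
```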

Character	Description
^	Starts with

import re

txt = "hello Freedom"

#Check if the string starts with 'hello':

x = re.findall("^hello", txt)
if x:
  print("Yes, the string starts with 'hello'")
else:
  print("No match")

=========o/p=========

Yes, the string starts with 'hello'

Character	Description
$	Ends with

import re

txt = "hello Freedom"

#Check if the string ends with 'Freedom':

x = re.findall("Freedom$", txt)
if x:
  print("Yes, the string ends with 'Freedom'")
else:
  print("No match")


==========o/p===========

Yes, the string ends with 'Freedom'

Character	Description
*	Zero or more occurrences

import re
txt = "she sells sea shells on the sea shore"

#Check if the string contains "sh" followed by zero or more "x" characters:

x = re.findall("shx*", txt)
if x:
  print("yes, there is at least one match")
else:
  print("no match")

==========o/p============

yes, there is at least one match

Character	Description
+	One or more occurrences

import re
txt = "she sells sea shells on the sea shore"

#Check if the string contains "sh" followed by one or more "x" characters:

x = re.findall("shx+", txt)
if x:
  print("yes, there is at least one match")
else:
  print("no match")
  
=========o/p========

no match
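The two examples above only print yes or no. Printing the matches themselves makes the difference between * and + easier to see; the pattern "hel" is chosen just for this comparison:

```python
import re

txt = "she sells sea shells on the sea shore"

# "*": "he" followed by zero or more "l" characters
zero_or_more = re.findall(r"hel*", txt)
print(zero_or_more)   # ['he', 'hell', 'he']

# "+": "he" followed by one or more "l" characters
one_or_more = re.findall(r"hel+", txt)
print(one_or_more)    # ['hell']
```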

Character	Description
{}	Exactly the specified number of occurrences

import re
txt = "she sells sea shells on the sea shore"

#Check if the string contains "e" followed by exactly two "l" characters:

x = re.findall("el{2}", txt)
if x:
  print("yes, there is at least one match")
else:
  print("no match")

=========o/p==========

yes, there is at least one match

Character	Description
|	Either or

import re
txt = "she sells sea shells on the sea shore"

#Check if the string contains either "sells" or "solds":

x = re.findall("sells|solds", txt)
print(x)
if x:
  print("yes, there is at least one match")
else:
  print("no match")

==========o/p===========

['sells']
yes, there is at least one match

Character	Description
()	Capture and group
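No example is given for () here, so here is a minimal sketch. Parentheses group part of a pattern and capture the text it matched; group(n) on the Match object returns the n-th captured text. When a pattern contains one group, findall() returns the captured text rather than the whole match (the patterns below are chosen only for illustration):

```python
import re

txt = "she sells sea shells on the sea shore"

# Two groups: any word, then the word "shells"
m = re.search(r"(\w+) (shells)", txt)
print(m.group())     # sea shells  (the whole match)
print(m.group(1))    # sea
print(m.group(2))    # shells

# One group: findall() returns only the captured word before " sea"
words_before_sea = re.findall(r"(\w+) sea", txt)
print(words_before_sea)   # ['sells', 'the']
```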

Special Sequences :

A special sequence is a \ followed by one of the characters in the list below, and has a special meaning:

Character	Description
\A	Returns a match if the specified characters are at the beginning of the string

import re
txt = "she sells sea shells on the sea shore"

#Check if the string starts with "she"

x = re.findall(r"\Ashe", txt)
print(x)
if x:
  print("yes, there is at least one match")
else:
  print("no match")

=========o/p==========

['she']
yes, there is at least one match

Character	Description
\b	Returns a match where the specified characters are at the beginning or at the end of a word

#Checking for the specified characters at the beginning of a WORD:

import re
txt = "she sells sea shells on the sea shore"

#Check if "ore" is present at the beginning of a word

x = re.findall(r"\bore", txt)
print(x)
if x:
  print("yes, there is at least one match")
else:
  print("no match")

==========o/p============

[]
no match

____________________________________________________________________________
#Checking for the specified characters at the end of a WORD:

import re
txt = "she sells sea shells on the sea shore"

#Check if "ore" is present at the end of a word

x = re.findall(r"ore\b", txt)
print(x)
if x:
  print("yes, there is at least one match")
else:
  print("no match")

=========o/p============

['ore']
yes, there is at least one match

Character	Description
\B	Returns a match where the specified characters are present, but NOT at the beginning (or at the end) of a word

import re
txt = "she sells sea shells on the sea shore"

#Check if "s" is present, but not at the beginning of a word:

x = re.findall(r"\Bs", txt)
print(x)
if x:
  print("yes, there is at least one match")
else:
  print("no match")

============o/p==============

['s', 's']
yes, there is at least one match

__________________________________________________________________

import re
txt = "she sells sea shells on the sea shore"

#Check if "sea" is present, but not at the end of a word:

x = re.findall(r"sea\B", txt)
print(x)
if x:
  print("yes, there is at least one match")
else:
  print("no match")

===========o/p============

[]
no match

Character	Description
\d	Returns a match where the string contains digits (numbers from 0-9)

import re
txt = "she sells sea shells on the sea shore"

#Check if the string contains any digits (numbers from 0-9):

x = re.findall(r"\d", txt)
print(x)
if x:
  print("yes, there is at least one match")
else:
  print("no match")

==========o/p=============

[]
no match

Character	Description
\D	Returns a match where the string DOES NOT contain digits

import re
txt = "she sells sea shells on the sea shore"

#Return a match at every character that is NOT a digit (0-9):

x = re.findall(r"\D", txt)
print(x)
if x:
  print("yes, there is at least one match")
else:
  print("no match")

==========o/p============

['s', 'h', 'e', ' ', 's', 'e', 'l', 'l', 's', ' ', 's', 'e', 'a', ' ', 's', 'h', 'e', 'l', 'l', 's', ' ', 'o', 'n', ' ', 't', 'h', 'e', ' ', 's', 'e', 'a', ' ', 's', 'h', 'o', 'r', 'e']
yes, there is at least one match

Character	Description
\s	Returns a match where the string contains a white space character

import re
txt = "she sells sea shells on the sea shore"

#Return a match at every white-space character:

x = re.findall(r"\s", txt)
print(x)
if x:
  print("yes, there is at least one match")
else:
  print("no match")

===========o/p===========

[' ', ' ', ' ', ' ', ' ', ' ', ' ']
yes, there is at least one match

Character	Description
\S	Returns a match where the string DOES NOT contain a white space character

import re
txt = "she sells sea shells on the sea shore"

#Return a match at every character that is NOT a white-space character:

x = re.findall(r"\S", txt)
print(x)
if x:
  print("yes, there is at least one match")
else:
  print("no match")

===========o/p=============

['s', 'h', 'e', 's', 'e', 'l', 'l', 's', 's', 'e', 'a', 's', 'h', 'e', 'l', 'l', 's', 'o', 'n', 't', 'h', 'e', 's', 'e', 'a', 's', 'h', 'o', 'r', 'e']
yes, there is at least one match

Character	Description
\w	Returns a match where the string contains any word characters (letters from a to z and A to Z, digits from 0-9, and the underscore _ character)

import re
txt = "she sells sea shells on the sea shore"

#Return a match at every word character (letters from a to z and A to Z, digits from 0-9, and the underscore _ character):

x = re.findall(r"\w", txt)
print(x)
if x:
  print("yes, there is at least one match")
else:
  print("no match")

===========o/p============

['s', 'h', 'e', 's', 'e', 'l', 'l', 's', 's', 'e', 'a', 's', 'h', 'e', 'l', 'l', 's', 'o', 'n', 't', 'h', 'e', 's', 'e', 'a', 's', 'h', 'o', 'r', 'e']
yes, there is at least one match
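One caveat on the \w description: for ordinary Python str patterns, \w matches Unicode word characters, so accented letters count as word characters too. The re.ASCII flag restricts it to a-z, A-Z, 0-9 and _. A small sketch with a made-up string:

```python
import re

txt = "café_2024 crème!"

# Default (Unicode) matching: accented letters are word characters
unicode_words = re.findall(r"\w+", txt)
print(unicode_words)   # ['café_2024', 'crème']

# re.ASCII: only a-z, A-Z, 0-9 and _ count as word characters
ascii_words = re.findall(r"\w+", txt, flags=re.ASCII)
print(ascii_words)     # ['caf', '_2024', 'cr', 'me']
```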

Character	Description
\W	Returns a match where the string DOES NOT contain any word characters

import re
txt = "she sells sea shells on the sea shore"

#Return a match at every character that is NOT a word character:

x = re.findall(r"\W", txt)
print(x)
if x:
  print("yes, there is at least one match")
else:
  print("no match")

==========o/p============

[' ', ' ', ' ', ' ', ' ', ' ', ' ']
yes, there is at least one match

Character	Description
\Z	Returns a match if the specified characters are at the end of the string

import re
txt = "she sells sea shells on the sea shore"

#Check if the string ends with "shore":

x = re.findall(r"shore\Z", txt)
print(x)
if x:
  print("yes, there is at least one match")
else:
  print("no match")

==============o/p==================

['shore']
yes, there is at least one match
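\Z differs from $ in one detail: $ also matches just before a newline at the very end of the string, while \Z matches only at the absolute end. A sketch with a trailing newline added to the sample string:

```python
import re

txt = "she sells sea shells on the sea shore\n"   # note the trailing newline

# "$" can match just before a final "\n"
with_dollar = re.findall(r"shore$", txt)
print(with_dollar)   # ['shore']

# "\Z" matches only at the very end, after the "\n"
with_z = re.findall(r"shore\Z", txt)
print(with_z)        # []
```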

The findall() Function :

The findall() function returns a list containing all matches.

import re
txt = "she sells sea shells on the sea shore"

#The findall() function returns a list containing all matches.

x = re.findall("ea", txt)
print(x)

========o/p=========

['ea', 'ea']

The list contains the matches in the order they are found.

If no matches are found, an empty list is returned:

import re
txt = "she sells sea shells on the sea shore"
x = re.findall("portugal", txt)
print(x)

=========o/p==========

[]

The search() Function :

The search() function searches the string for a match, and returns a Match object if there is a match.

If there is more than one match, only the first occurrence of the match will be returned:

import re
txt = "she sells sea shells on the sea shore"
x = re.search("sea", txt)
print(x)

=========o/p==========

<re.Match object; span=(10, 13), match='sea'>

If no matches are found, the value None is returned:
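For example, searching for a word that does not occur in the sample string gives None:

```python
import re

txt = "she sells sea shells on the sea shore"
x = re.search("portugal", txt)
print(x)   # None
```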

The split() Function :

The split() function returns a list where the string has been split at each match.

Split at each white-space character:

import re
txt = "she sells sea shells on the sea shore"
x = re.split(r"\s", txt)
print(x)
  
=========o/p=========

['she', 'sells', 'sea', 'shells', 'on', 'the', 'sea', 'shore']

You can control the number of splits by specifying the maxsplit parameter:

import re
txt = "she sells sea shells on the sea shore"
x = re.split(r"\s", txt, maxsplit=1)
print(x)

==========o/p===========

['she', 'sells sea shells on the sea shore']

The sub() Function :

The sub() function replaces the matches with the text of your choice.

Replace every white-space character with the number 5:

import re
txt = "she sells sea shells on the sea shore"
x = re.sub(r"\s", "5", txt)
print(x)

=========o/p=========

she5sells5sea5shells5on5the5sea5shore

You can control the number of replacements by specifying the count parameter:

import re
txt = "she sells sea shells on the sea shore"

#Replace the first three occurrences of a white-space character with the digit 5:

x = re.sub(r"\s", "5", txt, count=3)
print(x)

==========o/p============

she5sells5sea5shells on the sea shore

Match Object:

A Match Object is an object containing information about the search and the result.

Do a search that will return a Match Object:

import re
txt = "she sells sea shells on the sea shore"

#The search() function returns a Match object:
x = re.search("ea", txt)
print(x)

=========o/p==========

<re.Match object; span=(11, 13), match='ea'>

The Match object has properties and methods used to retrieve information about the search, and the result:

span() - returns a tuple containing the start and end positions of the match.

string - returns the string passed into the function

group() - returns the part of the string where there was a match

Example
Print the position (start and end position) of the first match.
The regular expression looks for any word that starts with an upper case "S":

import re

#Search for an upper case "S" character at the beginning of a word, and print its position:

txt = "She sells sea shells on the sea shore"
x = re.search(r"\bS\w+", txt)
print(x.span())
  
==========O/P==========

(0, 3)

Print the string passed into the function:

import re

#The string property returns the search string:

txt = "She sells sea shells on the sea shore"
x = re.search(r"\bS\w+", txt)
print(x.string)
  
=========o/p===========

She sells sea shells on the sea shore

Example
Print the part of the string where there was a match.
The regular expression looks for any word that starts with an upper case "S":

import re

#Search for an upper case "S" character at the beginning of a word, and print the word.
#search() stops at the first match, so only the first such word is printed, however many there are:

txt = "She Sells Sea Shells on the sea shore"
x = re.search(r"\bS\w+", txt)
print(x.group())
  
==========o/p===========

She

Note: If there is no match, the value 'None' will be returned, instead of the Match Object.
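Because search() can return None, it is safest to check the result before using group(), span() or string; calling group() on None raises an AttributeError. A minimal sketch using the all-lowercase sample string, where no word starts with an upper case "S":

```python
import re

txt = "she sells sea shells on the sea shore"
x = re.search(r"\bS\w+", txt)

if x:                 # a Match object is truthy, None is falsy
  print(x.group())
else:
  print("No match")   # this branch runs: no word starts with "S"
```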
