RegEx



Python - RegEx


A RegEx, or Regular Expression, is a sequence of characters that forms a search pattern.

RegEx can be used to check if a string contains the specified search pattern.


RegEx Module :


Python has a built-in package called re, which can be used to work with Regular Expressions.

Import the re module:

import re

RegEx in Python :


When you have imported the re module, you can start using regular expressions:


Example
Search the string to see if it starts with "The" and ends with "freedom":

import re


txt = "The class in freedom"
x = re.search("^The.*freedom$", txt)

if (x):
  print("YES! We have a match!")
else:
  print("No match")

==========o/p===========

YES! We have a match!

RegEx Functions :


The re module offers a set of functions that allows us to search a string for a match:

Function    Description
findall     Returns a list containing all matches
search      Returns a Match object if there is a match anywhere in the string
split       Returns a list where the string has been split at each match
sub         Replaces one or many matches with a string
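
As a quick side-by-side sketch of the four functions listed above, the following runs each of them on one short sentence. The variable names and the sample sentence are only illustrative, not part of the re module.

```python
import re

text = "she sells sea shells"

all_matches = re.findall(r"sh", text)   # every non-overlapping match, as a list
first_match = re.search(r"se", text)    # Match object for the first match, or None
pieces = re.split(r"\s", text)          # split at each white-space character
replaced = re.sub(r"\s", "-", text)     # replace each white-space character

print(all_matches)
print(first_match.span())
print(pieces)
print(replaced)
```

Each function is covered in more detail in its own section further down.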

Metacharacters :

Metacharacters are characters with a special meaning:

Character   Description
[]          A set of characters

import re

str = "Class in Freedom"

#Find all lower case characters alphabetically between "a" and "z":

x = re.findall("[a-z]", str)
print(x)


===========o/p============

['l', 'a', 's', 's', 'i', 'n', 'r', 'e', 'e', 'd', 'o', 'm']


Character   Description
\           Signals a special sequence (can also be used to escape special characters)

import re

str = "tutorial was started on 12-Nov-2018"

#Find all digit characters

x = re.findall(r"\d", str)
print(x)


=========o/p==========

['1', '2', '2', '0', '1', '8']
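
The description above also mentions escaping special characters, which the digit example does not show. A minimal sketch, assuming a made-up sample string: a dot normally matches any character, and a backslash in front of it turns it into a literal dot. re.escape() is the standard helper that adds those backslashes for you.

```python
import re

text = "version 3.11 and 3x11"

# An unescaped dot matches any character, so both "3.11" and "3x11" match.
any_char = re.findall(r"3.11", text)
print(any_char)

# Escaping the dot with a backslash makes it match a literal "." only.
literal = re.findall(r"3\.11", text)
print(literal)

# re.escape() adds the backslashes when the text to match comes from elsewhere.
escaped = re.findall(re.escape("3.11"), text)
print(escaped)
```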


Character   Description
.           Any character (except newline character)

import re

str = "Freedom"

#Search for a sequence that starts with "Fr", followed by two (any) characters, and then "dom":

x = re.findall("Fr..dom", str)
print(x)

==========o/p==========
['Freedom']


Character   Description
^           Starts with

import re

str = "hello Freedom"

#Check if the string starts with 'hello':

x = re.findall("^hello", str)
if (x):
  print("Yes, the string starts with 'hello'")
else:
  print("No match")

=========o/p=========

Yes, the string starts with 'hello'


Character   Description
$           Ends with

import re

str = "hello Freedom"

#Check if the string ends with 'Freedom':

x = re.findall("Freedom$", str)
if (x):
  print("Yes, the string ends with 'Freedom'")
else:
  print("No match")


==========o/p===========

Yes, the string ends with 'Freedom'


Character   Description
*           Zero or more occurrences

import re
str = "she sells sea shells on the sea shore"

#Check if the string contains "sh" followed by zero or more "x" characters:

x = re.findall("shx*", str)
if x:
  print("yes, there is at least one match")
else:
  print("no match")

==========o/p============

yes, there is at least one match


Character   Description
+           One or more occurrences

import re
str = "she sells sea shells on the sea shore"

#Check if the string contains "sh" followed by one or more "x" characters:

x = re.findall("shx+", str)
if x:
  print("yes, there is at least one match")
else:
  print("no match")
  
=========o/p========

no match
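
The two examples above only report whether anything matched. To see what * and + actually capture, here is a small sketch that uses a letter that does occur in the sentence; the variable names are illustrative.

```python
import re

text = "she sells sea shells on the sea shore"

# "se" followed by zero or more "l": matches "sell" in "sells" and "se" in each "sea".
star = re.findall(r"sel*", text)
print(star)

# "se" followed by one or more "l": only "sell" qualifies.
plus = re.findall(r"sel+", text)
print(plus)
```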


Character   Description
{}          Exactly the specified number of occurrences

import re
str = "she sells sea shells on the sea shore"

#Check if the string contains "e" followed by exactly two "l" characters:

x = re.findall("el{2}", str)
if x:
  print("yes, there is at least one match")
else:
  print("no match")

=========o/p==========

yes, there is at least one match
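
As a small follow-up sketch, printing the list shows which parts matched. The braces also accept a range in the form {m,n}; the second sample string below is made up for illustration.

```python
import re

text = "she sells sea shells on the sea shore"

# "e" followed by exactly two "l" characters: found in "sells" and "shells".
exact = re.findall(r"el{2}", text)
print(exact)

# {1,2} means at least one and at most two "l" characters.
sample = "sel sell selll"
ranged = re.findall(r"sel{1,2}", sample)
print(ranged)
```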


Character   Description
|           Either or

import re
str = "she sells sea shells on the sea shore"

#Check if the string contains either "sells" or "solds":

x = re.findall("sells|solds", str)
print(x)
if x:
  print("yes, there is at least one match")
else:
  print("no match")

==========o/p===========

['sells']
yes, there is at least one match

Character   Description
()          Capture and group
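
The original gives no example for parentheses, so here is a minimal sketch. It assumes a date written as day-month-year, as in the digit example earlier; the pattern and variable names are illustrative.

```python
import re

text = "tutorial was started on 12-Nov-2018"

# Three groups: day, month name, year.
match = re.search(r"(\d{2})-([A-Za-z]{3})-(\d{4})", text)

if match:
    print(match.group())    # the whole match
    print(match.group(1))   # the first group only
    print(match.groups())   # all groups as a tuple

# When the pattern contains groups, findall() returns the groups, not the whole match.
pairs = re.findall(r"(\d{2})-([A-Za-z]{3})", text)
print(pairs)
```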

Special Sequences :


A special sequence is a \ followed by one of the characters in the list below, and has a special meaning:

Character   Description
\A          Returns a match if the specified characters are at the beginning of the string

import re
str = "she sells sea shells on the sea shore"

#Check if the string starts with "she"

x = re.findall(r"\Ashe", str)
print(x)
if x:
  print("yes, there is at least one match")
else:
  print("no match")

=========o/p==========

['she']
yes, there is at least one match




Character   Description
\b          Returns a match where the specified characters are at the beginning or at the end of a word

#Checking the specified characters at the beginning of a word:

import re
str = "she sells sea shells on the sea shore"

#Check if "ore" is present at the beginning of a word

x = re.findall(r"\bore", str)
print(x)
if x:
  print("yes, there is at least one match")
else:
  print("no match")
  

==========o/p============

[]
no match

____________________________________________________________________________
#Checking the specified characters at the end of a word:

import re
str = "she sells sea shells on the sea shore"

#Check if "ore" is present at the end of a word

x = re.findall(r"ore\b", str)
print(x)
if x:
  print("yes, there is at least one match")
else:
  print("no match")

=========o/p============

['ore']
yes, there is at least one match
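
The \b examples write the pattern with an r prefix (a raw string). A short sketch of why that matters: in an ordinary Python string, "\b" is the backspace character, so the re module never sees a word boundary. The variable names are illustrative.

```python
import re

text = "she sells sea shells on the sea shore"

plain = "ore\b"    # "\b" becomes a single backspace character
raw = r"ore\b"     # backslash and "b" are kept, so re sees a word boundary

print(len(plain), len(raw))

plain_result = re.findall(plain, text)   # looks for "ore" followed by a backspace
raw_result = re.findall(raw, text)       # looks for "ore" at the end of a word

print(plain_result)
print(raw_result)
```

Using raw strings for every pattern that contains a backslash avoids this kind of surprise.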


Character   Description
\B          Returns a match where the specified characters are present, but NOT at the beginning (or at the end) of a word

import re
str = "she sells sea shells on the sea shore"

#Check if "s" is present, but not at the "beginning" of a word:

x = re.findall(r"\Bs", str)
print(x)
if x:
  print("yes, there is at least one match")
else:
  print("no match")

============o/p==============

['s', 's']
yes, there is at least one match

__________________________________________________________________

import re
str = "she sells sea shells on the sea shore"

#Check if "sea" is present, but not at the "end" of a word:

x = re.findall(r"sea\B", str)
print(x)
if x:
  print("yes, there is at least one match")
else:
  print("no match")
  
===========o/p============

[]
no match




Character   Description
\d          Returns a match where the string contains digits (numbers from 0-9)

import re
str = "she sells sea shells on the sea shore"

#Check if the string contains any digits (numbers from 0-9):

x = re.findall(r"\d", str)
print(x)
if x:
  print("yes, there is at least one match")
else:
  print("no match")
  

==========o/p=============
[]
no match



Character   Description
\D          Returns a match for every character that is NOT a digit

import re
str = "she sells sea shells on the sea shore"

#Return a match at every non-digit character:

x = re.findall(r"\D", str)
print(x)
if x:
  print("yes, there is at least one match")
else:
  print("no match")

==========O/P============

['s', 'h', 'e', ' ', 's', 'e', 'l', 'l', 's', ' ', 's', 'e', 'a', ' ', 's', 'h', 'e', 'l', 'l', 's', ' ', 'o', 'n', ' ', 't', 'h', 'e', ' ', 's', 'e', 'a', ' ', 's', 'h', 'o', 'r', 'e']
yes, there is at least one match


Character   Description
\s          Returns a match where the string contains a white space character

import re
str = "she sells sea shells on the sea shore"

#Return a match at every white-space character:

x = re.findall(r"\s", str)
print(x)
if x:
  print("yes, there is at least one match")
else:
  print("no match")

===========o/p===========

[' ', ' ', ' ', ' ', ' ', ' ', ' ']
yes, there is at least one match


Character   Description
\S          Returns a match for every character that is NOT a white space character

import re
str = "she sells sea shells on the sea shore"

#Return a match at every non-white-space character:

x = re.findall(r"\S", str)
print(x)
if x:
  print("yes, there is at least one match")
else:
  print("no match")

===========O/P=============

['s', 'h', 'e', 's', 'e', 'l', 'l', 's', 's', 'e', 'a', 's', 'h', 'e', 'l', 'l', 's', 'o', 'n', 't', 'h', 'e', 's', 'e', 'a', 's', 'h', 'o', 'r', 'e']
yes, there is at least one match

Character   Description
\w          Returns a match where the string contains any word characters (letters from a to z and A to Z, digits from 0-9, and the underscore _ character)

import re
str = "she sells sea shells on the sea shore"

#Return a match at every word character (letters, digits from 0-9, and the underscore _ character):

x = re.findall(r"\w", str)
print(x)
if x:
  print("yes, there is at least one match")
else:
  print("no match")

===========o/p============

['s', 'h', 'e', 's', 'e', 'l', 'l', 's', 's', 'e', 'a', 's', 'h', 'e', 'l', 'l', 's', 'o', 'n', 't', 'h', 'e', 's', 'e', 'a', 's', 'h', 'o', 'r', 'e']
yes, there is at least one match

Character   Description
\W          Returns a match for every character that is NOT a word character

import re
str = "she sells sea shells on the sea shore"

#Return a match at every non-word character:

x = re.findall(r"\W", str)
print(x)
if x:
  print("yes, there is at least one match")
else:
  print("no match")

==========O/P============

[' ', ' ', ' ', ' ', ' ', ' ', ' ']
yes, there is at least one match

Character   Description
\Z          Returns a match if the specified characters are at the end of the string

import re
str = "she sells sea shells on the sea shore"

#Check if the string ends with "shore":

x = re.findall(r"shore\Z", str)
print(x)
if x:
  print("yes, there is at least one match")
else:
  print("no match")

==============o/p==================

['shore']
yes, there is at least one match
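
Both $ and \Z anchor a pattern to the end, but they differ when the string ends with a newline. A minimal sketch with a made-up sample string:

```python
import re

text = "sea shore\n"   # note the trailing newline

# "$" also matches just before a newline at the end of the string.
dollar = re.findall(r"shore$", text)

# "\Z" matches only at the very end of the string, after the newline.
strict = re.findall(r"shore\Z", text)

print(dollar)
print(strict)
```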

The findall() Function :


The findall() function returns a list containing all matches.


import re
str="she sells sea shells on the sea shore"

#The findall() function returns a list containing all matches.


x=re.findall("ea",str)
print(x)

========o/p=========

['ea', 'ea']

The list contains the matches in the order they are found.

If no matches are found, an empty list is returned:

import re
str="she sells sea shells on the sea shore"
x=re.findall("portugal",str)
print(x)

=========o/p==========

[]

The search() Function :


The search() function searches the string for a match, and returns a Match object if there is a match.

If there is more than one match, only the first occurrence of the match will be returned:


import re
str="she sells sea shells on the sea shore"
x=re.search("sea",str)
print(x)
  
=========o/p==========

<_sre.SRE_Match object; span=(10, 13), match='sea'>

If no match is found, the value None is returned.
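
A short sketch of the no-match case, reusing the sentence from the examples above and a search word that does not occur in it. Checking for None before using the result avoids an AttributeError on calls such as group().

```python
import re

text = "she sells sea shells on the sea shore"

x = re.search("portugal", text)
print(x)

# Guard against None before calling Match methods.
if x is None:
    result = "no match"
else:
    result = x.group()

print(result)
```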


The split() Function :


The split() function returns a list where the string has been split at each match.


Split at each white-space character:

import re
str="she sells sea shells on the sea shore"
x = re.split(r"\s", str)
print(x)
  
=========o/p=========

['she', 'sells', 'sea', 'shells', 'on', 'the', 'sea', 'shore']


You can control the number of splits by specifying the maxsplit parameter:

import re
str = "she sells sea shells on the sea shore"
x = re.split(r"\s", str, maxsplit=1)
print(x)

==========o/p===========

['she', 'sells sea shells on the sea shore']
  

The sub() Function :


The sub() function replaces the matches with the text of your choice.


Replace every white-space character with the number 5:

import re
str="she sells sea shells on the sea shore"
x = re.sub(r"\s", "5", str)
print(x)

=========o/p=========

she5sells5sea5shells5on5the5sea5shore


You can control the number of replacements by specifying the count parameter:

import re
str="she sells sea shells on the sea shore"

#Replace the first three occurrences of a white-space character with the digit 5:

x = re.sub(r"\s", "5", str, count=3)
print(x)

==========o/p============

she5sells5sea5shells on the sea shore


Match Object:


A Match Object is an object containing information about the search and the result.


Do a search that will return a Match Object:

import re
str="she sells sea shells on the sea shore"

#The search() function returns a Match object:
x=re.search("ea",str)
print(x)

=========o/p==========

<_sre.SRE_Match object; span=(11, 13), match='ea'>


The Match object has properties and methods used to retrieve information about the search, and the result:


  • span() - returns a tuple containing the start and end positions of the match

  • string - returns the string passed into the function

  • group() - returns the part of the string where there was a match


Example
Print the position (start and end position) of the first match.
The regular expression looks for any word that starts with an upper case "S":


import re

#Search for an upper case "S" character in the beginning of a word, and print its position:

str="She sells sea shells on the sea shore"
x=re.search(r"\bS\w+", str)
print(x.span())
  
==========O/P==========

(0, 3)


Print the string passed into the function:

import re

#The string property returns the search string:

str="She sells sea shells on the sea shore"
x=re.search(r"\bS\w+", str)
print(x.string)
  
=========o/p===========

She sells sea shells on the sea shore


Example
Print the part of the string where there was a match.
The regular expression looks for any word that starts with an upper case "S":

import re

#Search for an upper case "S" character at the beginning of a word, and print the word:
#search() returns only the first such word, no matter how many words start with an upper case "S"

str="She Sells Sea Shells on the sea shore"
x=re.search(r"\bS\w+", str)
print(x.group())
  
==========o/p===========

She

Note: If there is no match, the value 'None' will be returned, instead of the Match Object.


