Python RegEx

RegEx, or regular expression, is a sequence of characters that forms a search pattern.

RegEx can be used to check if a string contains a specified search pattern.

RegEx Module

Python has a built-in package named re that can be used to work with regular expressions.

Import the re module:

import re

RegEx in Python

After importing the re module, you can start using regular expressions:

Example

Search the string to see if it starts with "China" and ends with "country":

import re
txt = "China is a great country"
x = re.search("^China.*country$", txt)

RegEx functions

The re module provides a set of functions that allow us to search a string for a match:

Function  Description
findall   Returns a list containing all matches
search    Returns a Match object if there is a match anywhere in the string
split     Returns a list where the string has been split at each match
sub       Replaces one or more matches with a string

Metacharacters

Metacharacters are characters that have special meanings:

Character  Example        Description
[]         "[a-m]"        A set of characters
\          r"\d"          Signals a special sequence (can also be used to escape special characters)
.          "he..o"        Any character (except newline)
^          "^hello"       Starts with
$          "world$"       Ends with
*          "aix*"         Zero or more occurrences
+          "aix+"         One or more occurrences
{}         "al{2}"        Exactly the specified number of occurrences
|          "falls|stays"  Either of the two alternatives
()                        Capture and group
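
The metacharacters in the table can be tried side by side. The following is a minimal sketch; the sample string "hello planet" and the variable names are illustrative assumptions, not part of the original examples:

```python
import re

txt = "hello planet"  # illustrative sample string

# . matches any single character except a newline
dots = re.findall(r"he..o", txt)

# ^ and $ anchor the pattern to the start and end of the string
starts = re.search(r"^hello", txt) is not None
ends = re.search(r"planet$", txt) is not None

# + needs at least one occurrence, * also accepts zero
one_or_more = re.findall(r"l+", txt)
zero_or_more = re.findall(r"lo*", txt)

# {} fixes the number of repetitions, | picks either alternative
exactly_two = re.findall(r"l{2}", txt)
either = re.findall(r"falls|stays", "The rain falls")

print(dots)          # ['hello']
print(starts, ends)  # True True
print(one_or_more)   # ['ll', 'l']
print(zero_or_more)  # ['l', 'lo', 'l']
print(exactly_two)   # ['ll']
print(either)        # ['falls']
```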

Special sequence

A special sequence is a \ followed by one of the characters in the following table, and has a special meaning:

Character  Example              Description
\A         r"\AThe"             Returns a match if the specified characters are at the beginning of the string
\b         r"\bain", r"rain\b"  Returns a match where the specified characters are at the beginning or at the end of a word
\B         r"\Bain", r"rain\B"  Returns a match where the specified characters are present, but not at the beginning (or at the end) of a word
\d         r"\d"                Returns a match where the string contains a digit (0-9)
\D         r"\D"                Returns a match where the string contains a character that is not a digit
\s         r"\s"                Returns a match where the string contains a whitespace character
\S         r"\S"                Returns a match where the string contains a character that is not whitespace
\w         r"\w"                Returns a match where the string contains a word character (characters from a to Z, digits from 0 to 9, and the underscore _ character)
\W         r"\W"                Returns a match where the string contains a character that is not a word character
\Z         r"Spain\Z"           Returns a match if the specified characters are at the end of the string
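
A short sketch of several special sequences in use follows. The sample string is an illustrative assumption. The patterns are written as raw strings (the r prefix) so that Python passes the backslash to the re module unchanged:

```python
import re

txt = "The rain in Spain 2024"  # illustrative sample string

at_start = re.search(r"\AThe", txt) is not None   # "The" at the very beginning
at_end = re.search(r"2024\Z", txt) is not None    # "2024" at the very end

word_start = re.findall(r"\bain", txt)  # "ain" at the start of a word
word_end = re.findall(r"ain\b", txt)    # "ain" at the end of a word
inside = re.findall(r"\Bain", txt)      # "ain" not at the start of a word

digits = re.findall(r"\d", txt)         # every single digit
spaces = re.findall(r"\s", txt)         # every whitespace character
words = re.findall(r"\w+", txt)         # runs of word characters

print(at_start, at_end)  # True True
print(word_start)        # []
print(word_end)          # ['ain', 'ain']
print(inside)            # ['ain', 'ain']
print(digits)            # ['2', '0', '2', '4']
print(len(spaces))       # 4
print(words)             # ['The', 'rain', 'in', 'Spain', '2024']
```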

Sets

A set is a group of characters inside a pair of square brackets [] with a special meaning:

Set         Description
[arn]       Returns a match where one of the specified characters (a, r, or n) is present
[a-n]       Returns a match for any lowercase character between a and n
[^arn]      Returns a match for any character except a, r, and n
[0123]      Returns a match where any of the specified digits (0, 1, 2, or 3) is present
[0-9]       Returns a match for any digit between 0 and 9
[0-5][0-9]  Returns a match for any two-digit number from 00 to 59
[a-zA-Z]    Returns a match for any letter between a and z, lowercase or uppercase
[+]         In sets, +, *, ., |, (), $, {} have no special meaning, so [+] returns a match for any + character in the string
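
To make the set rules concrete, here is a small sketch. The sample strings are illustrative assumptions chosen so that each set has something to match and something to skip:

```python
import re

# Any one of the listed characters, and its negation
listed = re.findall(r"[arn]", "rain")
not_listed = re.findall(r"[^arn]", "rain")

# Two sets in a row match two characters: a digit 0-5 followed by a digit 0-9
minutes = re.findall(r"[0-5][0-9]", "Meet at 08:45 or 23:59, not 77:99")

# Ranges can be combined inside one set
letters = re.findall(r"[a-zA-Z]", "A1b2")

# Inside a set, + is an ordinary character
plus = re.findall(r"[+]", "1+1=2")

print(listed)      # ['r', 'a', 'n']
print(not_listed)  # ['i']
print(minutes)     # ['08', '45', '23', '59']
print(letters)     # ['A', 'b']
print(plus)        # ['+']
```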

findall() function

The findall() function returns a list containing all matches.

Example

Print the list of all matches:

import re
# txt is used instead of str so the built-in str type is not shadowed
txt = "China is a great country"
x = re.findall("a", txt)
print(x)

The list contains the matches in the order they are found.

If no matches are found, an empty list is returned:

Example

Return an empty list if no match is found:

import re
txt = "China is a great country"
x = re.findall("USA", txt)
print(x)
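
Because findall() always returns a list, the result can be counted with len() or tested directly for emptiness. A minimal sketch, reusing the example string; the variable names are illustrative:

```python
import re

txt = "China is a great country"

# Count how many times the pattern occurs
matches = re.findall("a", txt)
count = len(matches)
print(count)  # 3

# An empty list is falsy, so "not" detects the no-match case
found_usa = bool(re.findall("USA", txt))
if not found_usa:
    print("No match")
```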

search() function

The search() function searches the string for a match and returns a Match object if there is one.

If there are multiple matches, only the first match is returned:

Example

Search for the first white-space character in the string:

import re
txt = "China is a great country"
# Raw string so the backslash reaches the re module unchanged
x = re.search(r"\s", txt)
print("The first white-space character is located in position:", x.start())

If no match is found, the value None is returned:

Example

Perform a search that returns no match:

import re
txt = "China is a great country"
x = re.search("USA", txt)
print(x)
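
Since search() can return None, calling a method such as start() on the result fails when nothing matched. One way to guard against that is sketched below; the helper name describe is a hypothetical choice for illustration only:

```python
import re

txt = "China is a great country"

def describe(pattern, text):
    """Hypothetical helper: report the first match, or say that there is none."""
    match = re.search(pattern, text)
    if match is None:
        # No Match object exists, so match.start() must not be called
        return "no match"
    return f"found {match.group()!r} at position {match.start()}"

print(describe(r"\s", txt))  # found ' ' at position 5
print(describe("USA", txt))  # no match
```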

split() function

The split() function returns a list where the string has been split at each match:

Example

Split at each whitespace character:

import re
txt = "China is a great country"
x = re.split(r"\s", txt)
print(x)

You can control the number of splits by specifying the maxsplit parameter:

Example

Split the string only at the first occurrence:

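import re
txt = "China is a great country"
# maxsplit is passed by keyword; passing it positionally is deprecated in recent Python versions
x = re.split(r"\s", txt, maxsplit=1)
print(x)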
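
The effect of maxsplit, and of splitting on one whitespace character versus a run of them, can be compared in a small sketch. The string with a double space is an illustrative assumption:

```python
import re

txt = "China is a great country"

# Stop after two splits; the rest of the string stays in the last item
two_splits = re.split(r"\s", txt, maxsplit=2)
print(two_splits)  # ['China', 'is', 'a great country']

spaced = "China  is"  # two spaces between the words

# Splitting at each single whitespace character leaves an empty string
single = re.split(r"\s", spaced)
print(single)  # ['China', '', 'is']

# \s+ treats the whole run of whitespace as one separator
run = re.split(r"\s+", spaced)
print(run)  # ['China', 'is']
```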

sub() function

The sub() function replaces the matches with the text of your choice:

Example

Replace each whitespace character with the number 9:

import re
txt = "China is a great country"
x = re.sub(r"\s", "9", txt)
print(x)

You can control the number of replacements by specifying the count parameter:

Example

Replace the first two occurrences:

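import re
txt = "China is a great country"
# count is passed by keyword; passing it positionally is deprecated in recent Python versions
x = re.sub(r"\s", "9", txt, count=2)
print(x)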
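
A common practical use of sub() is tidying up irregular whitespace. The sketch below assumes an illustrative input string containing repeated spaces and a tab:

```python
import re

messy = "China   is  a great\tcountry"  # illustrative input

# Replace every run of whitespace with a single space
tidy = re.sub(r"\s+", " ", messy)
print(tidy)  # China is a great country

# With count=1 only the first run is replaced; the rest is left untouched
first_only = re.sub(r"\s+", " ", messy, count=1)
print(repr(first_only))  # 'China is  a great\tcountry'
```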

Match Object

The Match object is an object that contains information about the search and results.

Note: If no match is found, the value None is returned instead of a Match object.

Example

Perform a search that returns a Match object:

import re
txt = "China is a great country"
x = re.search("a", txt)
print(x) # This will print a Match object

The Match object provides attributes and methods for retrieving information about the search and results:

  • span() returns a tuple containing the start and end positions of the match
  • .string returns the string passed to the function
  • group() returns the part of the string where there was a match

Example

Print the position of the first match (start and end position).

The regular expression looks for any word that starts with an uppercase "C":

import re
txt = "China is a great country"
x = re.search(r"\bC\w+", txt)
print(x.span())

Example

Print the string passed to the function:

import re
txt = "China is a great country"
x = re.search(r"\bC\w+", txt)
print(x.string)

Example

Print the matched string part.

The regular expression looks for any word that starts with an uppercase "C":

import re
txt = "China is a great country"
x = re.search(r"\bC\w+", txt)
print(x.group())
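
The three members fit together: slicing the original string with the positions from span() gives the same text as group(). A minimal sketch, using a pattern for a word starting with a lowercase "g" as an illustrative variation on the examples above:

```python
import re

txt = "China is a great country"

x = re.search(r"\bg\w+", txt)  # first word starting with "g"

if x is not None:
    start, end = x.span()
    print(start, end)            # 11 16
    print(x.group())             # great
    # x.string is the searched string, so the slice reproduces the match
    print(x.string[start:end])   # great
```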
