Python RegEx


RegEx, or regular expression, is a sequence of characters that forms a search pattern.

RegEx can be used to check if a string contains a specified search pattern.

RegEx Module

Python has a built-in package called re, which can be used to work with regular expressions.

Import the re module:

import re

RegEx in Python

Once you have imported the re module, you can start using regular expressions:

Example

Search the string to see if it starts with "China" and ends with "country":

import re
txt = "China is a great country"
x = re.search("^China.*country$", txt)
print(x is not None)  # True, because the string matches the pattern
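To see why both anchors matter, the sketch below runs the same pattern against a few sample strings (made up for this illustration, not part of the original example). Only the first one both starts with "China" and ends with "country".

```python
import re

pattern = "^China.*country$"  # must start with "China" and end with "country"
candidates = [
    "China is a great country",   # satisfies both anchors
    "China is a great country!",  # ends with "!", so country$ fails
    "Is China a great country",   # does not start with "China", so ^China fails
]

# re.search() returns a Match object or None; turn that into True/False
results = [re.search(pattern, s) is not None for s in candidates]
print(results)  # [True, False, False]
```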


RegEx functions

The re module offers a set of functions that allow us to search a string for a match:

Function	Description
findall	Returns a list containing all matches
search	Returns a Match object if there is a match anywhere in the string
split	Returns a list where the string has been split at each match
sub	Replaces one or more matches with a string
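A minimal sketch that calls each of the four functions on the same sample sentence, so their different return types (a list, a Match object or None, a list, and a new string) can be compared side by side. The patterns and the "-" replacement are arbitrary choices for illustration.

```python
import re

txt = "China is a great country"

all_a = re.findall("a", txt)        # list of every match: ['a', 'a', 'a']
first = re.search("great", txt)     # Match object for the first match (or None)
words = re.split(" ", txt)          # list of the pieces between the matches
dashed = re.sub(" ", "-", txt)      # new string with every match replaced

print(all_a)
print(first.span())  # (11, 16): start and end index of "great"
print(words)
print(dashed)        # China-is-a-great-country
```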

Metacharacters

Metacharacters are characters that have special meanings:

Character	Description	Example
[]	A set of characters	"[a-m]"
\	Signals a special sequence (can also be used to escape special characters)	r"\d"
.	Any character (except the newline character)	"he..o"
^	Starts with	"^hello"
$	Ends with	"world$"
*	Zero or more occurrences	"aix*"
+	One or more occurrences	"aix+"
{}	Exactly the specified number of occurrences	"al{2}"
|	Either or	"falls|stays"
()	Capture and group
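The sketch below applies re.findall() to one short sample text per metacharacter. The sample texts are invented for illustration; the escaped-dot row is an extra case showing the \ escape, and the last row shows that when a pattern contains capture groups, findall() returns the groups as tuples rather than the whole match.

```python
import re

# (pattern, sample text) pairs, roughly one per row of the table
cases = [
    ("[a-m]", "hello"),               # set of characters
    (r"\d", "room 42"),               # special sequence: any digit
    (r"3\.14", "3.14 3x14"),          # \. escapes the dot, so it matches only "."
    ("he..o", "hello helo"),          # . matches any character except newline
    ("^hello", "hello world"),        # anchored at the start
    ("world$", "hello world"),        # anchored at the end
    ("aix*", "ai aix aixx"),          # zero or more "x"
    ("aix+", "ai aix aixx"),          # one or more "x"
    ("al{2}", "al all alll"),         # exactly two "l"
    ("falls|stays", "it stays, it falls"),  # either alternative
    (r"(\d+)-(\d+)", "10-20 30-40"),  # two capture groups
]

for pattern, text in cases:
    print(pattern, re.findall(pattern, text))
```

For example, the "aix*" row prints ['ai', 'aix', 'aixx'] while "aix+" prints only ['aix', 'aixx'], and the capture-group row prints [('10', '20'), ('30', '40')].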

Special Sequences

A special sequence is a \ followed by one of the characters in the list below, and has a special meaning:

Character	Description	Example
\A	Returns a match if the specified characters are at the beginning of the string	r"\AThe"
\b	Returns a match where the specified characters are at the beginning or at the end of a word (the r prefix makes the pattern a raw string, so \b is not read as a backspace)	r"\bain" r"rain\b"
\B	Returns a match where the specified characters are present, but NOT at the beginning (or at the end) of a word	r"\Bain" r"rain\B"
\d	Returns a match for each digit (such as 0-9) in the string	r"\d"
\D	Returns a match for each character that is not a digit	r"\D"
\s	Returns a match for each whitespace character	r"\s"
\S	Returns a match for each character that is not a whitespace character	r"\S"
\w	Returns a match for each word character (letters, digits 0 to 9, and the underscore _ character)	r"\w"
\W	Returns a match for each character that is not a word character	r"\W"
\Z	Returns a match if the specified characters are at the end of the string	r"Spain\Z"
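A short sketch that runs several of the sequences from the table over one sample sentence (chosen for this illustration) with re.findall(). The last line shows why the patterns are written as raw strings: without the r prefix, "\b" in an ordinary Python string is a backspace character, not a word boundary.

```python
import re

txt = "The rain in Spain"

patterns = [
    r"\AThe",    # "The" at the very start of the string
    r"\bain",    # "ain" at the start of a word (no such word here)
    r"rain\b",   # "rain" at the end of a word
    r"\Bain",    # "ain" NOT at the start of a word
    r"rain\B",   # "rain" NOT at the end of a word (no such case here)
    r"\d",       # any digit
    r"\s",       # any whitespace character
    r"\w+",      # runs of word characters, i.e. the words
    r"Spain\Z",  # "Spain" at the very end of the string
]

results = {p: re.findall(p, txt) for p in patterns}
for p, found in results.items():
    print(p, found)

# Without r, "\b" becomes a backspace character, so the two strings differ
same = "\bain" == r"\bain"
print(same)  # False
```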

Sets

A set is a group of characters inside a pair of square brackets [] with a special meaning:

Set	Description
[arn]	Returns a match where one of the specified characters (a, r, or n) is present
[a-n]	Returns a match for any lowercase character alphabetically between a and n
[^arn]	Returns a match for any character except a, r, and n
[0123]	Returns a match where any of the specified digits (0, 1, 2, or 3) is present
[0-9]	Returns a match for any digit between 0 and 9
[0-5][0-9]	Returns a match for any two-digit number from 00 to 59
[a-zA-Z]	Returns a match for any character alphabetically between a and z, lowercase or uppercase
[+]	In sets, +, *, ., |, (), $, {} have no special meaning, so [+] means: return a match for any + character in the string
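The sketch below runs each set from the table through re.findall() on a small sample text invented for this illustration. Note how [0-5][0-9] picks out two-digit numbers from 00 to 59 (such as minutes in a clock time), and how [+] matches a literal plus sign.

```python
import re

cases = [
    ("[arn]", "The rain"),
    ("[a-n]", "The rain"),           # uppercase "T" and "r" fall outside a-n
    ("[^arn]", "rain"),
    ("[0123]", "2024"),
    ("[0-9]", "8 times before 11:45"),
    ("[0-5][0-9]", "8 times before 11:45"),  # two characters per match
    ("[a-zA-Z]", "R2-D2"),
    ("[+]", "5 + 0"),                # + is literal inside a set
]

for pattern, text in cases:
    print(pattern, re.findall(pattern, text))
```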

findall() function

The findall() function returns a list containing all matches.

Example

Print the list of all matches:

import re
txt = "China is a great country"  # avoid naming a variable str: it shadows the built-in
x = re.findall("a", txt)
print(x)  # ['a', 'a', 'a']


The list contains the matches in the order they are found.

If no matches are found, an empty list is returned:

Example

Return an empty list if no match was found:

import re
txt = "China is a great country"
x = re.findall("USA", txt)
print(x)  # []
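Because findall() always returns a list, len() gives the number of matches, and an empty list simply counts as zero. The helper count_matches below is hypothetical, written only for this illustration. Note that findall() returns non-overlapping matches, so "aa" is found twice in "aaaa", not three times.

```python
import re

def count_matches(pattern, text):
    """Hypothetical helper: number of non-overlapping matches of pattern in text."""
    return len(re.findall(pattern, text))

txt = "China is a great country"
print(count_matches("a", txt))      # 3
print(count_matches("USA", txt))    # 0, because findall() returned []
print(count_matches("aa", "aaaa"))  # 2, matches do not overlap
```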


search() function

The search() function searches the string for a match, and returns a Match object if there is a match.

If there is more than one match, only the first occurrence of the match is returned:

Example

Search for the first white-space character in the string:

import re
txt = "China is a great country"
x = re.search(r"\s", txt)  # raw string, so "\s" reaches the regex engine unchanged
print("The first white-space character is located in position:", x.start())


If no matches are found, the value None is returned:

Example

Make a search that returns no match:

import re
txt = "China is a great country"
x = re.search("USA", txt)
print(x)  # None
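Since search() can return None, calling .start() on the result without checking first raises an AttributeError when nothing matches. A minimal sketch of the usual guard, wrapped in a hypothetical helper first_position that returns -1 when there is no match (the -1 convention is an assumption made for this illustration):

```python
import re

def first_position(pattern, text):
    """Hypothetical helper: start index of the first match, or -1 if none."""
    match = re.search(pattern, text)
    if match is None:  # search() returns None when nothing matches
        return -1
    return match.start()

txt = "China is a great country"
print(first_position(r"\s", txt))  # 5
print(first_position("a", txt))    # 4, only the first "a" is reported
print(first_position("USA", txt))  # -1
```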


split() function

The split() function returns a list where the string has been split at each match:

Example

Split at each whitespace character:

import re
txt = "China is a great country"
x = re.split(r"\s", txt)
print(x)
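Because split() cuts at every single match, adjacent whitespace characters produce empty strings in the result. The sketch below, using a sample string with doubled spaces and a tab made up for this illustration, compares r"\s" with r"\s+", which treats each run of whitespace as one separator.

```python
import re

messy = "China  is\ta great   country"  # two spaces, a tab, three spaces

single = re.split(r"\s", messy)      # one split per whitespace character
collapsed = re.split(r"\s+", messy)  # one split per run of whitespace

print(single)     # ['China', '', 'is', 'a', 'great', '', '', 'country']
print(collapsed)  # ['China', 'is', 'a', 'great', 'country']
```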


You can control the number of splits by specifying the maxsplit parameter:

Example

Split the string only at the first occurrence:

import re
txt = "China is a great country"
x = re.split(r"\s", txt, maxsplit=1)  # pass maxsplit by keyword
print(x)
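A short sketch comparing a few maxsplit values on the same string. With maxsplit=0 (the default) the string is split at every match; with a positive value, at most that many splits are made and the rest of the string is kept as the last item.

```python
import re

txt = "China is a great country"

parts = {n: re.split(r"\s", txt, maxsplit=n) for n in (0, 1, 2)}
for n, result in parts.items():
    print(n, result)
# 0 ['China', 'is', 'a', 'great', 'country']
# 1 ['China', 'is a great country']
# 2 ['China', 'is', 'a great country']
```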


sub() function

The sub() function replaces the matches with the text of your choice:

Example

Replace each whitespace character with the number 9:

import re
txt = "China is a great country"
x = re.sub(r"\s", "9", txt)
print(x)  # China9is9a9great9country


You can control the number of replacements by specifying the count parameter:

Example

Replace the first two occurrences:

import re
txt = "China is a great country"
x = re.sub(r"\s", "9", txt, count=2)  # pass count by keyword
print(x)  # China9is9a great country
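A short sketch comparing a few count values. count=0 (the default) replaces every match; a positive count replaces at most that many matches, starting from the left.

```python
import re

txt = "China is a great country"

replaced = {n: re.sub(r"\s", "9", txt, count=n) for n in (0, 1, 2)}
for n, result in replaced.items():
    print(n, result)
# 0 China9is9a9great9country
# 1 China9is a great country
# 2 China9is9a great country
```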


Match Object

A Match object is an object containing information about the search and the result.

Note: If there is no match, the value None is returned instead of a Match object.

Example

Do a search that will return a Match object:

import re
txt = "China is a great country"
x = re.search("a", txt)
print(x)  # This will print an object


The Match object has properties and methods used to retrieve information about the search and the result:

.span() returns a tuple containing the start and end positions of the match
.string returns the string passed into the function
.group() returns the part of the string where there was a match

Example

Print the position (start and end position) of the first match occurrence.

The regular expression looks for any word that starts with an uppercase "C":

import re
txt = "China is a great country"
x = re.search(r"\bC\w+", txt)
print(x.span())  # (0, 5)


Example

Print the string passed to the function:

import re
txt = "China is a great country"
x = re.search(r"\bC\w+", txt)
print(x.string)  # China is a great country


Example

Print the part of the string where there was a match.

The regular expression looks for any word that starts with an uppercase "C":

import re
txt = "China is a great country"
x = re.search(r"\bC\w+", txt)
print(x.group())  # China
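The () metacharacter from the table above ties in with group(): group() with no argument (or group(0)) returns the whole match, while group(1), group(2), and so on return the text captured by each pair of parentheses, and span() accepts a group number as well. A minimal sketch, splitting the same "C" word into two groups purely for illustration:

```python
import re

txt = "China is a great country"
match = re.search(r"\b(C)(\w+)", txt)  # group 1: the "C", group 2: the rest of the word

if match:  # a Match object is truthy; None means there was no match
    print(match.group())    # China  (whole match, same as group(0))
    print(match.group(1))   # C
    print(match.group(2))   # hina
    print(match.groups())   # ('C', 'hina')
    print(match.span(2))    # (1, 5)
```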

