Python RegEx
RegEx, or regular expression, is a sequence of characters that forms a search pattern.
RegEx can be used to check if a string contains a specified search pattern.
RegEx module
Python has a built-in package named re, which can be used to work with regular expressions.
Import the re module:
import re
RegEx in Python
After importing the re module, you can start using regular expressions:
Example
Search the string to see if it starts with "China" and ends with "country":
import re
txt = "China is a great country"
x = re.search("^China.*country$", txt)
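The example above stores the result in x without using it. A minimal sketch of how the result might be checked, relying on the fact that re.search() returns None when nothing matches (the result variable and its messages are illustrative only):

```python
import re

txt = "China is a great country"
x = re.search("^China.*country$", txt)

# re.search() returns a Match object (truthy) or None (falsy)
if x:
    result = "YES! We have a match!"
else:
    result = "No match"
print(result)
```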
RegEx functions
The re module provides a set of functions that allow us to search a string for a match:
Function | Description |
---|---|
findall | Returns a list containing all matches |
search | Returns a Match object if there is a match anywhere in the string |
split | Returns a list where the string has been split at each match |
sub | Replaces one or more matches with a string |
Metacharacters
Metacharacters are characters that have special meanings:
Character | Description | Example |
---|---|---|
[] | A set of characters | "[a-m]" |
\ | Signals a special sequence (can also be used to escape special characters) | r"\d" |
. | Any character (except newline) | "he..o" |
^ | Starts with | "^hello" |
$ | Ends with | "world$" |
* | Zero or more occurrences | "aix*" |
+ | One or more occurrences | "aix+" |
{} | Exactly the specified number of occurrences | "al{2}" |
| | Either of the two | "falls|stays" |
() | Capture and group | |
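As a rough illustration of how several of these metacharacters behave, the sketch below runs a few patterns against a made-up sample string. The string and the variable names are assumptions chosen for the example; the table gives no example for (), so the grouping pattern here is hypothetical as well.

```python
import re

txt = "hello planet, hello world"

# "." matches any single character except a newline
dot_matches = re.findall("he..o", txt)

# "^" anchors the pattern at the start of the string, "$" at the end
starts_with_hello = re.search("^hello", txt) is not None
ends_with_world = re.search("world$", txt) is not None

# "{2}" requires exactly two occurrences of the preceding character
double_l = re.findall("l{2}", txt)

# "|" matches either alternative
either = re.findall("planet|world", txt)

# "()" captures the text matched inside each pair of parentheses
captured = re.search("(hello) (planet)", txt).groups()

print(dot_matches)        # ['hello', 'hello']
print(starts_with_hello)  # True
print(ends_with_world)    # True
print(double_l)           # ['ll', 'll']
print(either)             # ['planet', 'world']
print(captured)           # ('hello', 'planet')
```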
Special sequences
A special sequence is a \ followed by one of the characters in the table below, and has a special meaning:
Character | Description | Example |
---|---|---|
\A | Returns a match if the specified characters are at the beginning of the string | r"\AThe" |
\b | Returns a match where the specified characters are at the beginning or at the end of a word | r"\bain", r"rain\b" |
\B | Returns a match where the specified characters are present, but not at the beginning (or at the end) of a word | r"\Bain", r"rain\B" |
\d | Returns a match where the string contains any digit (0-9) | r"\d" |
\D | Returns a match where the string contains any character that is not a digit | r"\D" |
\s | Returns a match where the string contains any whitespace character | r"\s" |
\S | Returns a match where the string contains any character that is not whitespace | r"\S" |
\w | Returns a match where the string contains any word character (characters from a to Z, digits from 0 to 9, and the underscore _ character) | r"\w" |
\W | Returns a match where the string contains any character that is not a word character | r"\W" |
\Z | Returns a match if the specified characters are at the end of the string | r"Spain\Z" |
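The following sketch tries a few of these sequences on a sample sentence (the sentence and variable names are assumptions for illustration). The patterns are written as raw strings (r"...") because in an ordinary Python string "\b" is the backspace character rather than a word boundary.

```python
import re

txt = "The rain in Spain costs 25 euros"

# In a normal string "\b" is one character (backspace); in a raw string it is two
backspace_len = len("\b")
raw_len = len(r"\b")

digits = re.findall(r"\d", txt)          # every digit
word_start = re.findall(r"\bain", txt)   # "ain" at the start of a word
word_end = re.findall(r"ain\b", txt)     # "ain" at the end of a word
words = re.findall(r"\w+", txt)          # runs of word characters
at_start = re.search(r"\AThe", txt) is not None
at_end = re.search(r"euros\Z", txt) is not None

print(digits)      # ['2', '5']
print(word_start)  # []
print(word_end)    # ['ain', 'ain']
print(words)       # ['The', 'rain', 'in', 'Spain', 'costs', '25', 'euros']
print(at_start, at_end)  # True True
```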
Sets
A set is a group of characters inside a pair of square brackets [] with a special meaning:
Set | Description |
---|---|
[arn] | Returns a match where any of the specified characters (a, r, or n) is present |
[a-n] | Returns a match for any lowercase character between a and n |
[^arn] | Returns a match for any character except a, r, and n |
[0123] | Returns a match where any of the specified digits (0, 1, 2, or 3) is present |
[0-9] | Returns a match for any digit between 0 and 9 |
[0-5][0-9] | Returns a match for any two-digit number from 00 to 59 |
[a-zA-Z] | Returns a match for any character between a and z, either lowercase or uppercase |
[+] | In sets, +, *, ., |, (), $, {} have no special meaning, so [+] means: return a match for any + character in the string |
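A short sketch of a few sets in action, using made-up input strings (all strings and variable names here are assumptions for illustration). Note that [0-5][0-9] is two sets in a row, so it matches a pair of digits whose first digit is at most 5.

```python
import re

# Any character except a, r, and n
not_arn = re.findall("[^arn]", "rain")

# Any letter, lowercase or uppercase
letters = re.findall("[a-zA-Z]", "8 pm")

# Inside a set, "+" is an ordinary character
plus = re.findall("[+]", "1+1=2")

# Two-digit numbers from 00 to 59; "75" is skipped because 7 is not in [0-5]
minutes = re.findall("[0-5][0-9]", "Meeting at 08:45, ends 11:75")

print(not_arn)  # ['i']
print(letters)  # ['p', 'm']
print(plus)     # ['+']
print(minutes)  # ['08', '45', '11']
```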
findall() function
The findall() function returns a list containing all matches.
Example
Print a list of all matches:
import re
txt = "China is a great country"
x = re.findall("a", txt)
print(x)
The list contains the matches in the order they are found.
If no matches are found, an empty list is returned:
Example
Return an empty list when no match is found:
import re
txt = "China is a great country"
x = re.findall("USA", txt)
print(x)
search() function
The search() function searches the string for a match and returns a Match object if there is a match.
If there are multiple matches, only the first match is returned:
Example
Search for the first whitespace character in the string:
import re
txt = "China is a great country"
x = re.search(r"\s", txt)
print("The first white-space character is located in position:", x.start())
If no match is found, the value None is returned:
Example
Perform a search that returns no match:
import re
txt = "China is a great country"
x = re.search("USA", txt)
print(x)
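Because search() can return None, calling a method such as start() on the result without checking first would raise an AttributeError when nothing matches. One possible way to guard against that is sketched below; first_position is a hypothetical helper written for this example, not part of the re module.

```python
import re

def first_position(pattern, text):
    """Return the start index of the first match, or -1 if there is none."""
    match = re.search(pattern, text)
    if match is None:
        return -1
    return match.start()

txt = "China is a great country"
print(first_position(r"\s", txt))  # 5
print(first_position("USA", txt))  # -1
```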
split() function
The split() function returns a list where the string has been split at each match:
Example
Split at each whitespace character:
import re
txt = "China is a great country"
x = re.split(r"\s", txt)
print(x)
You can control the number of splits by specifying the maxsplit parameter:
Example
Split the string only at the first occurrence:
import re
txt = "China is a great country"
x = re.split(r"\s", txt, maxsplit=1)
print(x)
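When the input contains runs of several whitespace characters, splitting on a single \s produces empty strings between them. A small sketch, assuming a sample string with double spaces, shows how combining \s with the + metacharacter treats each run as one separator:

```python
import re

txt = "China  is  great"  # two spaces between the words

single = re.split(r"\s", txt)             # splits at every single space
runs = re.split(r"\s+", txt)              # splits at each run of spaces
first_run = re.split(r"\s+", txt, maxsplit=1)

print(single)     # ['China', '', 'is', '', 'great']
print(runs)       # ['China', 'is', 'great']
print(first_run)  # ['China', 'is  great']
```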
sub() function
The sub() function replaces the matches with the text of your choice:
Example
Replace each whitespace character with the number 9:
import re
txt = "China is a great country"
x = re.sub(r"\s", "9", txt)
print(x)
You can control the number of replacements by specifying the count parameter:
Example
Replace the first two occurrences:
import re
txt = "China is a great country"
x = re.sub(r"\s", "9", txt, count=2)
print(x)
Match Object
The Match object is an object that contains information about the search and results.
Note: If there is no match, the value None is returned instead of a Match object.
Example
Do a search that returns a Match object:
import re
txt = "China is a great country"
x = re.search("a", txt)
print(x) # This will print an object
The Match object provides attributes and methods for retrieving information about the search and results:
- span() returns a tuple containing the start and end positions of the match.
- string returns the string passed into the function.
- group() returns the part of the string where there was a match.
Example
Print the position (start and end position) of the first match.
The regular expression looks for any word that starts with an uppercase "C":
import re
txt = "China is a great country"
x = re.search(r"\bC\w+", txt)
print(x.span())
Example
Print the string passed into the function:
import re
txt = "China is a great country"
x = re.search(r"\bC\w+", txt)
print(x.string)
Example
Print the part of the string where there was a match.
The regular expression looks for any word that starts with an uppercase "C":
import re
txt = "China is a great country"
x = re.search(r"\bC\w+", txt)
print(x.group())
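The three members are related: slicing string with the positions from span() gives the same text as group(). The sketch below illustrates this with a different pattern (a word starting with a lowercase "g", chosen only for the example) and checks for None before using the result.

```python
import re

txt = "China is a great country"
x = re.search(r"\bg\w+", txt)

# Only use the Match object if the search actually found something
if x is not None:
    start, end = x.span()
    print(x.span())             # (11, 16)
    print(x.group())            # great
    print(x.string[start:end])  # great
```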