9/11/21, 11:20 PM 14. Regular Expression [RegEx] - Part 1 - Jupyter Notebook
What are regular expressions?
A regular expression (regex) is a pattern that describes a set of strings. We can use a regular expression to match, search, replace,
and otherwise manipulate text.
Regular expressions are also useful for extracting information from text such as log files, spreadsheets, or other textual documents.
Example 1: Write a regular expression to search for digits inside a string
In [1]: import re
        string = "My roll no. is 25"
        a = re.findall(r"\d", string)
        a
Out[1]: ['2', '5']
Understand this example
We imported the re module into our program.
Next, we created the regex pattern \d, which matches any single digit from 0 to 9.
After that, we used the re.findall() function to find every match of the pattern in the string.
In the end, we got the two digits 2 and 5 as separate matches, because \d matches one digit at a time.
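A small sketch of the difference between matching one digit at a time and matching a whole run of digits. The + quantifier (one or more repetitions) is not used in the example above; it is shown here only for contrast, and the variable names are illustrative.

```python
import re

text = "My roll no. is 25"

# \d matches exactly one digit, so each digit is returned as its own match.
single_digits = re.findall(r"\d", text)

# \d+ matches one or more consecutive digits, so the run "25" is one match.
whole_numbers = re.findall(r"\d+", text)

print(single_digits)  # ['2', '5']
print(whole_numbers)  # ['25']
```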
Use raw string to define a regex
Note: I have used a raw string to define the pattern, like this: r"\d". Always write your regex as a raw string.
In an ordinary string literal, Python treats the backslash as the start of an escape sequence (for example, "\n" is a newline), so the
pattern may be changed before the regex engine ever sees it. To avoid that, always use a raw string.
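To see why this matters, here is a minimal sketch using \b, which is a backspace character in an ordinary Python string but a word boundary in a regex. The sample text and variable names are assumptions made for illustration.

```python
import re

# In an ordinary string literal, "\b" is converted to a single backspace character.
normal = "\bcat\b"

# In a raw string literal, the backslash is kept, so the regex engine
# receives \b and treats it as a word boundary.
raw = r"\bcat\b"

print(len(normal))  # 5
print(len(raw))     # 7

text = "the cat sat"
normal_matches = re.findall(normal, text)
raw_matches = re.findall(raw, text)

print(normal_matches)  # []
print(raw_matches)     # ['cat']
```

The ordinary string finds nothing because it searches for literal backspace characters, which the text does not contain.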
Python regex methods
1. Compile Regex Pattern using re.compile()
We can compile a regular expression into a regex object to look for occurrences of the same pattern inside
various target strings without rewriting it.
How to compile a regex pattern
1. Write the regex pattern in string format
Write the regex pattern using a raw string. For example, a pattern to match any digit:
str_pattern = r'\d'
2. Pass the pattern to the compile() method
pattern = re.compile(r'\d{3}')
It compiles a regular expression pattern provided as a string into a regex Pattern object.
3. Use the Pattern object to match the regex pattern
Use the Pattern object returned by the compile() method to match the pattern against a target string:
res = pattern.findall(target_string)
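The steps above can be sketched end to end as follows. The target string is a made-up sample, and the final comparison is included only to suggest that the compiled Pattern object and the module-level re.findall() return the same matches for the same pattern.

```python
import re

# Hypothetical sample text, used only for this sketch.
target_string = "Order 123 and order 456"

# Step 1: write the pattern as a raw string.
str_pattern = r"\d{3}"

# Step 2: compile it into a Pattern object.
pattern = re.compile(str_pattern)

# Step 3: use the Pattern object to find matches.
res = pattern.findall(target_string)

# The module-level function takes the pattern string as its first argument.
same = re.findall(str_pattern, target_string)

print(res)          # ['123', '456']
print(res == same)  # True
```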
Example to compile a regular expression
Now, let's see how to use re.compile() with the help of a simple example.
Pattern to compile: r'\d{3}'
What does this pattern mean?
First of all, I used a raw string to specify the regular expression pattern.
Next, \d is a special sequence and it will match any digit from 0 to 9 in a target string.
Then the 3 inside curly braces means the digit has to occur exactly three times in a row inside the target string.
In simple words, it matches any three consecutive digits inside the target string, such as 236, 452, or 782.
In [2]: # target string
        str1 = " Deepali lucky numbers are 894 234 456 829"

        # pattern to find 3 consecutive digits
        str_pattern = r"\d{3}"

        # compile str_pattern into a re.Pattern object
        regex_pattern = re.compile(str_pattern)

        # type of the compiled object
        print(type(regex_pattern))

        # find all the matches in the first string
        result = regex_pattern.findall(str1)
        print(result)

        # target string 2
        str2 = " Harsh lucky numbers are 678 645 234 097"

        # find all the matches in the second string by reusing the same pattern
        res2 = regex_pattern.findall(str2)
        print(res2)

<class 're.Pattern'>
['894', '234', '456', '829']
['678', '645', '234', '097']
As you can see, we found four matches of “three consecutive” digits inside the first string.
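One detail worth checking is how \d{3} behaves when a run of digits is longer than three. The inputs below are made up for illustration; the sketch suggests that matches are taken left to right without overlapping, and that leftover digits are dropped.

```python
import re

pattern = re.compile(r"\d{3}")

# Five digits in a row: only the first three form a match, "45" is left over.
five = pattern.findall("12345")

# Seven digits in a row: two non-overlapping matches, "7" is left over.
seven = pattern.findall("1234567")

# A space breaks the run, so "12" alone is too short to match.
split = pattern.findall("12 345")

print(five)   # ['123']
print(seven)  # ['123', '456']
print(split)  # ['345']
```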
Note:
The re.compile() method changed the string pattern into a re.Pattern object that we can work with.
Next, we called the findall() method of that re.Pattern object to obtain all the matches of any three consecutive digits inside the
target string.
Now, the same regex_pattern object can be reused to search for three consecutive digits in other target strings as well.
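As a rough sketch of that reuse, the pattern can be compiled once and applied to any number of target strings in a loop. The targets list and its third entry are assumptions added for illustration.

```python
import re

# Compile once, outside the loop.
regex_pattern = re.compile(r"\d{3}")

# Hypothetical list of target strings; the last one contains no digits.
targets = [
    "Deepali lucky numbers are 894 234 456 829",
    "Harsh lucky numbers are 678 645 234 097",
    "No numbers here",
]

# Reuse the same Pattern object for every target string.
all_matches = [regex_pattern.findall(target) for target in targets]

for matches in all_matches:
    print(matches)
# ['894', '234', '456', '829']
# ['678', '645', '234', '097']
# []
```

findall() returns an empty list when a string has no match, so the loop needs no special handling for that case.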