Introduction
Regular expressions (called REs, regexes, or regex patterns) are used to find and extract patterns in a given text.
They are language independent: most programming languages provide a regular expression library. Regular expressions
are widely used in applications with use cases such as the following, among others.
- To find a given pattern in a text, for example the find and replace feature of a word processor.
- To find or extract all phone numbers or email addresses from a given text file.
- To extract or scrape information from a given website through web scraping.
- To perform validations such as:
  - Is it a valid email ID or not?
  - Is it a valid phone number or mobile number or not?
- For any kind of text analytics.
- Natural Language Processing.
For Free, Demo classes Call: 7507414653
Registration Link:
Regular expressions can be used for any type of pattern matching or pattern extraction from a given text,
which is why they are widely used in applications with the use cases mentioned above.
Simple Patterns
We’ll start by learning about the simplest possible regular expressions. Since regular expressions are used to
operate on strings, we’ll begin with the most common task: matching characters.
Matching Characters
Most letters and characters will simply match themselves. For example, the regular expression Python will
match the string Python exactly. (If case-insensitive mode is set, this RE would match python or
PYTHON as well; we shall discuss more about this later.)
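As a quick check of this rule, here is a minimal sketch. The sample sentence is made up for illustration, and passing re.IGNORECASE to re.compile is one way of switching on case-insensitive mode.

```python
import re

# Sample text is an assumption made up for this illustration.
text = "I like PYTHON and python."

# Literal characters match themselves; matching is case-sensitive by default.
exact = re.compile("python")
exact_hits = exact.findall(text)

# With re.IGNORECASE the same literal also matches other capitalisations.
relaxed = re.compile("python", re.IGNORECASE)
relaxed_hits = relaxed.findall(text)

print(exact_hits)    # ['python']
print(relaxed_hits)  # ['PYTHON', 'python']
```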
Note:
However, there are exceptions to this rule: some characters are special metacharacters and don’t match
themselves.
Here’s a complete list of the metacharacters; their meanings will be discussed in the next part of this series.
. ^ $ * + ? { } [ ] \ | ( )
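To see what "don’t match themselves" means in practice, the sketch below compares an unescaped dot with an escaped one. The sample strings are assumptions chosen for illustration; re.escape and fullmatch are standard parts of the re module.

```python
import re

# Sample strings are assumptions made up for this illustration.
samples = ["a.c", "abc"]

# Unescaped, the dot is a metacharacter: it matches any character except a newline.
wildcard = re.compile("a.c")
wildcard_hits = [s for s in samples if wildcard.fullmatch(s)]

# re.escape puts a backslash before the dot, so it only matches a literal ".".
literal = re.compile(re.escape("a.c"))
literal_hits = [s for s in samples if literal.fullmatch(s)]

print(wildcard_hits)  # ['a.c', 'abc']
print(literal_hits)   # ['a.c']
```

So whenever a metacharacter has to be matched literally, it needs to be escaped with a backslash, either by hand or with re.escape.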
Implementation in Python
Step 1:
Import the re module
In [1]: import re
Step 2:
Create a regex object using the compile function of the re module, passing the pattern we want to match as the
argument, as shown below.
In [2]: pattern = re.compile('Python')
To check the type of pattern, we use the following code
In [3]: type(pattern)
From the above code, we can tell that it is an object of type re.Pattern, i.e. a regex object.
Step 3:
Once we have implemented Step 2, we can create the callable iterator object as shown below
In [4]: matches = pattern.finditer('Welcome to Python Regular Expression.')
To the finditer method, pass the target string in which we want to search for the pattern Python
compiled in Step 2.
Let’s now check the type of matches
In [5]: type(matches)
As we can see, it is an object of type callable_iterator. Now we can iterate over it using a for loop.
In [6]: for match in matches:
    print('Match found at', 'start Index:', match.start(), '–>', 'End Index:', match.end(), '–>', 'Match Pattern:', match.group())
Here
start() –> Returns the start index of the match
end() –> Returns the end index of the match, i.e. the index of the last matched character + 1
group() –> Returns the matched text
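Because end() points one position past the last matched character, start() and end() can be used directly as slice bounds. A minimal sketch on the same target string; search and span are standard re methods that this article has not otherwise introduced.

```python
import re

target = "Welcome to Python Regular Expression."

# search returns the first match (or None if there is no match).
match = re.compile("Python").search(target)

# Slicing with start() and end() reproduces exactly what group() returns.
sliced = target[match.start():match.end()]

print(match.start(), match.end())  # 11 17
print(sliced == match.group())     # True
print(match.span())                # (11, 17)
```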
Now, let’s check the type of match
In [7]: type(match)
As we can see, it is of type re.Match.
The outputs of the cells above are:
Out[3]: re.Pattern
Out[5]: callable_iterator
Match found at start Index: 11 –> End Index: 17 –> Match Pattern: Python
Out[7]: re.Match
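One thing to keep in mind about the iterator returned by finditer is that it can be consumed only once. The sketch below illustrates this with a sample string made up for the purpose; storing the matches in a list is one possible workaround, not the only one.

```python
import re

pattern = re.compile("Python")
# Sample text is an assumption made up for this illustration.
matches = pattern.finditer("Python 2 and Python 3")

first_pass = [m.start() for m in matches]   # consumes the iterator
second_pass = [m.start() for m in matches]  # nothing left to yield

print(first_pass)   # [0, 13]
print(second_pass)  # []

# Keep the match objects in a list when they are needed more than once.
saved = list(pattern.finditer("Python 2 and Python 3"))
print(len(saved))   # 2
```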
Let’s look at another example
We have a text string and we want to find every occurrence of a particular string in it. It can be done as given below
In [8]: target = """Regular expression patterns are compiled into a series of bytecodes which are then executed by a matching engine written in C. For advanced use, it may be necessary to pay careful attention to how the engine will execute a given RE, and write the RE in a certain way in order to produce bytecode that runs faster. Optimization isn't covered in this document, because it requires that you have a good understanding of the matching engine's internals. The regular expression language is relatively small and restricted, so not all possible string processing tasks can be done using regular expressions. There are also tasks that can be done with regular expressions, but the expressions turn out to be very complicated. In these cases, you may be better off writing Python code to do the processing; while Python code will be slower than an elaborate regular expression, it will also probably be more understandable."""
In [9]: # Let's do some text analytics: count how many times "regular expression" appears in the target string
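#Step 1
import re
#Step 2: set the pattern to match "regular expression" in the target string
pattern = re.compile('regular expression')
#Step 3: create the iterator of match objects. The search is case-sensitive here;
#a flag such as re.IGNORECASE has to be passed to re.compile, because the second
#argument of finditer is the start position, not a flag
matcher = pattern.finditer(target)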
In [10]: # Now let's iterate through it to find the matches
count = 0
for match in matcher:
    count += 1
    print(f'Match of {match.group()} is available at start index: ', match.start())
print(f"Total {match.group()} found is: {count}")
Hence we have a total of 4 occurrences of "regular expression" in the given target string. The capitalised "Regular expression" at the very start of the text is not counted, because the search is case-sensitive.
Match of regular expression is available at start index: 458
Match of regular expression is available at start index: 585
Match of regular expression is available at start index: 649
Match of regular expression is available at start index: 856
Total regular expression found is: 4
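A note on case-insensitive matching: flags such as re.IGNORECASE are accepted by re.compile, whereas the second positional argument of a compiled pattern's finditer is the start position of the search. The following minimal sketch shows the difference; it uses a shortened sample sentence that is an assumption made for illustration, not the target string above.

```python
import re

# Shortened sample text, made up for this illustration.
text = "Regular expression syntax is small. A regular expression can still be hard to read."

# Case-sensitive: the capitalised occurrence at index 0 is skipped.
sensitive = re.compile("regular expression")
sensitive_starts = [m.start() for m in sensitive.finditer(text)]

# Flags belong in re.compile; now both occurrences are found.
insensitive = re.compile("regular expression", re.IGNORECASE)
insensitive_starts = [m.start() for m in insensitive.finditer(text)]

# A second positional argument to finditer is the start position, not a flag.
# re.IGNORECASE has the integer value 2, so this call merely starts searching
# at index 2 and stays case-sensitive.
shifted_starts = [m.start() for m in sensitive.finditer(text, int(re.IGNORECASE))]

print(sensitive_starts)                    # [38]
print(insensitive_starts)                  # [0, 38]
print(shifted_starts == sensitive_starts)  # True
```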
Conclusion
Let’s recap what we have learned so far in this article.
If you want to represent a group of strings according to a particular pattern, then you should go for regular
expressions.
Steps to follow
Step 1: import re –> Importing the module
Step 2: pattern = re.compile('Python') –> Creating a regex object
Step 3: matcher = pattern.finditer('Welcome to Python Regular Expression') –> Creating an iterator of match objects
After you have completed the above three steps:
iterate through the callable_iterator, i.e. matcher in our case, using a for loop and extract from each match its
start index –> by calling the start() method
end index –> by calling the end() method
match –> by calling the group() method
Stay tuned for Part 2 of this series, where we shall discuss more regular expression patterns for
extraction that make use of the metacharacters listed above, along with topics like
- character classes
- predefined character classes
- quantifiers and
- important methods of re module
Author:
Titus Newton | SevenMentor Pvt Ltd.