Data extraction using Regular expressions

Extracting sets of 3 digits:

import re
line = "abc123xyz34amn56xyz890"
arr = re.findall(r'\d{3}' , line)
print("3 digits set : ",arr)
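
Note that \d{3} also matches the first three digits of a longer run. A minimal sketch, assuming only runs of exactly three digits are wanted (the sample string with "12345" appended is made up for illustration), uses lookarounds to reject neighbouring digits:

```python
import re

# Hypothetical sample: the original line with a 5-digit run appended.
line = "abc123xyz34amn56xyz890pq12345"

# Plain \d{3} also picks up the first three digits of the longer run "12345".
loose = re.findall(r'\d{3}', line)

# (?<!\d) and (?!\d) require that no digit sits directly before or after the match.
exact = re.findall(r'(?<!\d)\d{3}(?!\d)', line)

print("loose :", loose)
print("exact :", exact)
```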

Extracting date from input:

import re
line = "abc21-03-1995xyz23-09-2001mnxyz14-08-97"
arr = re.findall(r'\d{2}-\d{2}-\d{4}' , line)
print(arr)
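
The pattern above skips 14-08-97 because its year has only two digits. If both forms should be accepted, one possible sketch (assuming day-month-year order, which the source does not state) matches either year length and converts the text into date objects:

```python
import re
from datetime import datetime

line = "abc21-03-1995xyz23-09-2001mnxyz14-08-97"

# Try a 4-digit year first, then fall back to a 2-digit year.
found = re.findall(r'\d{2}-\d{2}-(?:\d{4}|\d{2})', line)

dates = []
for text in found:
    # Assumption: dates are written as day-month-year.
    fmt = "%d-%m-%Y" if len(text) == 10 else "%d-%m-%y"
    dates.append(datetime.strptime(text, fmt).date())

print(found)
print(dates)
```

strptime() raises ValueError for impossible dates such as 31-02-2001, so it also acts as a validity check that the regular expression alone cannot provide.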

Extract domain names (including @) from mail IDs:

import re
line = "abc@gmail.com xyz@yahoo.in lmn@gmail.com"
arr = re.findall(r'@\w+' , line)
print(arr)

Only domain names excluding @:

import re
line = "abc@gmail.com xyz@yahoo.in lmn@gmail.com"
arr = re.findall(r'@(\w+)' , line)
print(arr)
print("Gmail count is :",arr.count('gmail'))

Only domain types:

import re
line = "abc@gmail.com xyz@yahoo.in lmn@gmail.com"
# \. matches a literal dot; an unescaped . would match any character
arr = re.findall(r'@\w+\.(\w+)' , line)
print(arr)
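
The three extractions above can also be done in one pass. The sketch below uses three groups, so findall() returns one tuple per mail ID. The pattern is deliberately simplified: real addresses may contain dots, hyphens and other characters that \w does not cover.

```python
import re

line = "abc@gmail.com xyz@yahoo.in lmn@gmail.com"

# Three groups: user name, domain name, domain type.
parts = re.findall(r'(\w+)@(\w+)\.(\w+)', line)
print(parts)

# Full mail IDs, rebuilt from the groups.
mail_ids = [f"{user}@{domain}.{kind}" for user, domain, kind in parts]
print(mail_ids)
```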

Any five-character string that starts with ‘a’ and ends with ‘s’:

import re
line = "abhis abyas abs alias abacus"

arr = re.split(r' ', line)
for word in arr:
    res = re.findall(r'^a...s$', word)
    if res:
        print(word,": match")
    else:
        print(word,": not match")

Output:

abhis : match
abyas : match
abs : not match
alias : match
abacus : not match
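
Because . matches any character, the pattern above would also accept a string such as a123s. A possible variant, assuming only lowercase letters should count (the extra word a123s is added here purely for illustration), uses a character class together with fullmatch():

```python
import re

# Hypothetical sample: the original words plus "a123s".
line = "abhis abyas abs alias abacus a123s"

# fullmatch() must cover the whole word, so ^ and $ are not needed.
pattern = re.compile(r'a[a-z]{3}s')

matches = [word for word in line.split() if pattern.fullmatch(word)]
print(matches)
```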

findall() returns a list of strings containing all matches:

import re
line = "amar 23 annie 21 hareen 5 satya 15"
pattern = r"\d+"   # raw string, so the backslash reaches the regex engine unchanged
arr = re.findall(pattern, line)
print(arr)

split() splits the string at every match and returns a list of the pieces:

import re
line = "amar 23 annie 21 hareen 5 satya 15"
pattern = r"\d+"   # raw string, so the backslash reaches the regex engine unchanged
arr = re.split(pattern, line)
print(arr)
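
The pieces returned by split() keep their surrounding spaces, and the list ends with an empty string because the line ends with a match. As a rough sketch, the pieces can be cleaned up, or names and numbers can be paired directly with two groups in findall(). Treating the numbers as ages is an assumption made only for this example.

```python
import re

line = "amar 23 annie 21 hareen 5 satya 15"

# Strip the spaces around each piece and drop the empty trailing piece.
names = [piece.strip() for piece in re.split(r'\d+', line) if piece.strip()]
print(names)

# Two groups: a word followed by a number. findall() returns (word, number) tuples.
pairs = re.findall(r'([a-zA-Z]+)\s+(\d+)', line)

# Assumption: the numbers are ages, so convert them to int.
ages = {name: int(age) for name, age in pairs}
print(ages)
```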

Validate an ATM PIN (exactly 4 digits) without a regular expression:

pin = input("Enter PIN :")
# Valid only when the input has exactly 4 characters and all of them are digits
if len(pin) == 4 and pin.isdigit():
    print("Valid PIN")
else:
    print("Invalid PIN")

Validate the PIN using a regular expression:

import re
pin = input("Enter PIN :")
# fullmatch() succeeds only when the whole input is exactly 4 digits,
# so a separate length check is not needed
if re.fullmatch(r'\d{4}', pin):
    print("Valid pin")
else:
    print("Invalid pin")
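
In Python 3, both isdigit() and \d accept digits from other scripts, for example Arabic-Indic digits. If the PIN must consist of the ASCII digits 0 to 9 only, a sketch like the following could be used; the function name is_valid_pin is hypothetical and introduced only for this example.

```python
import re

# Hypothetical helper: accepts exactly four ASCII digits.
def is_valid_pin(pin):
    # [0-9] is limited to ASCII digits, unlike \d on a str pattern.
    return re.fullmatch(r'[0-9]{4}', pin) is not None

for candidate in ["1234", "12a4", "12345", "١٢٣٤"]:
    print(repr(candidate), is_valid_pin(candidate))
```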

Mobile number of 10 digits that starts with 7, 8 or 9:

import re
mob = input("Enter mobile number : ")
found = re.match(r'[789]\d{9}$', mob)
if found:
    print("Valid mobile number")
else:
    print("Invalid mobile number")

10-digit mobile number, using a compiled pattern:

import re
num = input("Enter Number : ")
pattern = re.compile(r'^[7-9]\d{9}$')
if pattern.match(num):
    print('Valid mobile number')
else:
    print('Invalid mobile number')
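
Both checks above can be wrapped in a reusable function. The sketch below is one way to do it; the name is_valid_mobile is hypothetical. It uses fullmatch() rather than $, because $ also matches just before a trailing newline, which matters when the text does not come from input().

```python
import re

# Compile once and reuse for every check.
MOBILE = re.compile(r'[7-9][0-9]{9}')

# Hypothetical helper: True for 10 ASCII digits starting with 7, 8 or 9.
def is_valid_mobile(text):
    return MOBILE.fullmatch(text) is not None

for number in ["9876543210", "6876543210", "98765", "98765432101", "9876543210\n"]:
    print(repr(number), is_valid_mobile(number))
```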

Return all words of a string that start with a vowel:

import re
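# \b anchors the match at the start of a word; without it, "ountry" inside "country" would also match
result = re.findall(r'\b[aeiouAEIOU]\w*', 'India is My country')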
print(result)