Python Regular Expressions Tutorial

Introduction

Regular expressions (regex) describe patterns in text. They are commonly used to search, extract, and manipulate strings. Python's built-in re module provides regular expression support. This tutorial covers the basics of using regular expressions in Python, including common regex patterns, matching, searching, and replacing text.

Table of Contents

  1. Introduction to Regular Expressions
  2. Importing the re Module
  3. Basic Patterns
  4. Special Characters
  5. Using re.match()
  6. Using re.search()
  7. Using re.findall()
  8. Using re.finditer()
  9. Using re.sub()
  10. Using re.split()
  11. Compiling Regular Expressions
  12. Flags in Regular Expressions
  13. Practical Examples
  14. Conclusion

1. Introduction to Regular Expressions

Regular expressions are sequences of characters that define a search pattern. They are used for tasks like pattern matching, searching, and replacing text in strings.

2. Importing the re Module

To use regular expressions in Python, you need to import the re module.

Example

import re

3. Basic Patterns

Some basic patterns used in regular expressions are:

  • . : Matches any character except a newline (unless the re.DOTALL flag is set).
  • ^ : Matches the start of the string (with re.MULTILINE, also the start of each line).
  • $ : Matches the end of the string, or just before a newline at the end of the string.
  • * : Matches 0 or more repetitions of the preceding element (a character, character class, or group).
  • + : Matches 1 or more repetitions of the preceding element.
  • ? : Matches 0 or 1 repetition of the preceding element.
  • {n} : Matches exactly n repetitions of the preceding element.
  • {n,m} : Matches between n and m repetitions (inclusive) of the preceding element.

Example

pattern = r"ab*c"
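
To make the quantifiers concrete, the short sketch below checks a few sample strings against this pattern. The sample strings and the use of re.fullmatch() (which requires the whole string to match) are choices made for illustration only.

```python
import re

pattern = r"ab*c"

# "b*" allows zero or more "b" characters between "a" and "c".
candidates = ["ac", "abc", "abbbc", "adc"]
results = {text: bool(re.fullmatch(pattern, text)) for text in candidates}
print(results)
# {'ac': True, 'abc': True, 'abbbc': True, 'adc': False}

# "{2,3}" requires two or three repetitions of the preceding element.
two_bs = bool(re.fullmatch(r"ab{2,3}c", "abbc"))
one_b = bool(re.fullmatch(r"ab{2,3}c", "abc"))
print(two_bs, one_b)  # True False
```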

4. Special Characters

Some special characters used in regular expressions are:

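  • \d : Matches any digit. For str patterns this includes Unicode decimal digits; with the re.ASCII flag it matches only 0-9.
  • \D : Matches any non-digit.
  • \w : Matches any word character: letters, digits, and the underscore (a-z, A-Z, 0-9, _ in ASCII mode).
  • \W : Matches any character that is not a word character.
  • \s : Matches any whitespace character (space, tab, newline, and similar).
  • \S : Matches any non-whitespace character.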

Example

pattern = r"\d{3}-\d{2}-\d{4}"
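
As a rough illustration, the pattern above (three digits, a hyphen, two digits, a hyphen, four digits) can be tried against a sample string. The text values here are made up for the example.

```python
import re

pattern = r"\d{3}-\d{2}-\d{4}"
text = "ID on file: 123-45-6789 (sample value)"

# search() returns a match object, or None when nothing matches.
match = re.search(pattern, text)
found = match.group() if match else None
print(found)  # 123-45-6789

# \w+ collects runs of word characters; the underscore counts as one.
words = re.findall(r"\w+", "user_1 logged in")
print(words)  # ['user_1', 'logged', 'in']
```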

5. Using re.match()

The re.match() function attempts to match a pattern at the beginning of a string. It returns a match object on success and None otherwise.

Example

import re

pattern = r"hello"
string = "hello world"

match = re.match(pattern, string)
if match:
    print("Match found:", match.group())
else:
    print("No match found")

6. Using re.search()

The re.search() function scans the string and returns a match object for the first place the pattern matches, or None if it matches nowhere.

Example

import re

pattern = r"world"
string = "hello world"

match = re.search(pattern, string)
if match:
    print("Match found:", match.group())
else:
    print("No match found")
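
The difference between re.match() and re.search() is easiest to see when both are run with the same pattern and string. The following is a minimal sketch; the variable names are illustrative.

```python
import re

string = "hello world"

# match() only tries at position 0, so "world" is not found.
at_start = re.match(r"world", string)
# search() scans forward and finds "world" later in the string.
anywhere = re.search(r"world", string)

print(at_start)         # None
print(anywhere.span())  # (6, 11)
```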

7. Using re.findall()

The re.findall() function returns a list of all non-overlapping matches of a pattern in a string.

Example

import re

pattern = r"\d+"
string = "There are 123 apples and 456 oranges."

matches = re.findall(pattern, string)
print("Matches found:", matches)
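
One detail worth knowing: when the pattern contains capturing groups, re.findall() returns the groups instead of the whole match. The sketch below uses a two-group pattern chosen for illustration, so each result is a tuple.

```python
import re

string = "There are 123 apples and 456 oranges."

# Two capturing groups: the number and the word that follows it.
pairs = re.findall(r"(\d+) (\w+)", string)
print(pairs)  # [('123', 'apples'), ('456', 'oranges')]
```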

8. Using re.finditer()

The re.finditer() function returns an iterator yielding match objects for all non-overlapping matches of a pattern in a string.

Example

import re

pattern = r"\d+"
string = "There are 123 apples and 456 oranges."

matches = re.finditer(pattern, string)
for match in matches:
    print("Match found:", match.group())
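
Because re.finditer() yields match objects rather than plain strings, each result also carries its position. A small sketch, assuming the same sample string:

```python
import re

string = "There are 123 apples and 456 oranges."

# Collect the matched text together with its start and end offsets.
positions = [(m.group(), m.start(), m.end()) for m in re.finditer(r"\d+", string)]
print(positions)  # [('123', 10, 13), ('456', 25, 28)]
```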

9. Using re.sub()

The re.sub() function replaces all occurrences of a pattern in a string with a replacement.

Example

import re

pattern = r"\d+"
string = "There are 123 apples and 456 oranges."
replacement = "many"

result = re.sub(pattern, replacement, string)
print("Result:", result)
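
The replacement passed to re.sub() may also be a function that receives each match object, and the count argument limits how many replacements are made. The helper name double below is hypothetical and only serves this sketch.

```python
import re

string = "There are 123 apples and 456 oranges."

def double(match):
    # Called once per match; the return value replaces the matched text.
    return str(int(match.group()) * 2)

doubled = re.sub(r"\d+", double, string)
print(doubled)  # There are 246 apples and 912 oranges.

# count=1 replaces only the first occurrence.
first_only = re.sub(r"\d+", "many", string, count=1)
print(first_only)  # There are many apples and 456 oranges.
```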

10. Using re.split()

The re.split() function splits a string by the occurrences of a pattern.

Example

import re

pattern = r"\s+"
string = "Split this string by spaces"

result = re.split(pattern, string)
print("Result:", result)
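
Splitting on a pattern is most useful when the separator varies. The sketch below splits on commas with optional surrounding whitespace, and uses the maxsplit argument to cap the number of splits. The input strings are made up for illustration.

```python
import re

# Commas with any amount of whitespace on either side.
parts = re.split(r"\s*,\s*", "apples , oranges,pears ,  plums")
print(parts)  # ['apples', 'oranges', 'pears', 'plums']

# maxsplit=2 performs at most two splits; the rest stays in one piece.
limited = re.split(r"\s+", "Split this string by spaces", maxsplit=2)
print(limited)  # ['Split', 'this', 'string by spaces']
```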

11. Compiling Regular Expressions

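You can compile a regular expression pattern into a pattern object using the re.compile() function. The object offers the same operations as methods (search(), findall(), sub(), and so on). Because the re module caches recently used patterns, the speed gain is usually modest; the main benefit is that a pattern used in many places is defined once and given a name.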

Example

import re

pattern = re.compile(r"\d+")
string = "There are 123 apples and 456 oranges."

matches = pattern.findall(string)
print("Matches found:", matches)
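
A compiled pattern is typically defined once and reused across many inputs. The following sketch sums the numbers found on each line of a small, made-up list; the names number and lines are illustrative.

```python
import re

# Compile once, then reuse the pattern object for every line.
number = re.compile(r"\d+")
lines = ["order 12", "no digits here", "items 3 and 44"]

totals = [sum(int(n) for n in number.findall(line)) for line in lines]
print(totals)  # [12, 0, 47]
```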

12. Flags in Regular Expressions

Flags can be used to modify the behavior of regular expressions. Some commonly used flags are:

  • re.IGNORECASE or re.I: Match letters regardless of case.
  • re.MULTILINE or re.M: Make ^ and $ match at the start and end of each line, not only of the whole string.
  • re.DOTALL or re.S: Make . match any character, including a newline.

Example

import re

pattern = r"hello"
string = "Hello world"

match = re.search(pattern, string, re.IGNORECASE)
if match:
    print("Match found:", match.group())
else:
    print("No match found")
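
The other two flags can be sketched in the same way. The example below compares results with and without re.MULTILINE and re.DOTALL on a two-line string, and combines flags with the | operator. The sample text is chosen for illustration.

```python
import re

text = "first line\nsecond line"

# Without MULTILINE, ^ matches only at the start of the whole string.
default_starts = re.findall(r"^\w+", text)
line_starts = re.findall(r"^\w+", text, re.MULTILINE)
print(default_starts)  # ['first']
print(line_starts)     # ['first', 'second']

# Without DOTALL, "." stops at the newline, so the pattern cannot span lines.
no_dotall = re.search(r"first.*second", text)
with_dotall = re.search(r"first.*second", text, re.DOTALL)
print(no_dotall is None, with_dotall is not None)  # True True

# Flags are combined with the bitwise OR operator.
combined = re.findall(r"^second", "First\nSECOND", re.IGNORECASE | re.MULTILINE)
print(combined)  # ['SECOND']
```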

13. Practical Examples

Example 1: Validating an Email Address

import re

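def validate_email(email):
    # Simplified pattern for illustration; it does not cover every valid address.
    pattern = r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}"
    # fullmatch() requires the whole string to match, so a trailing newline is rejected.
    return re.fullmatch(pattern, email) is not None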

email = "example@example.com"
print("Is valid email:", validate_email(email))
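
To see where a pattern like this draws the line, it helps to run it against a few accepted and rejected inputs. The sketch below uses a hypothetical helper, looks_like_email, and sample addresses invented for the example. It is a simplified check, not a complete email validator.

```python
import re

# Same character classes as the tutorial's pattern; an illustrative simplification.
EMAIL_PATTERN = re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")

def looks_like_email(email):
    # fullmatch() succeeds only if the entire string fits the pattern.
    return EMAIL_PATTERN.fullmatch(email) is not None

samples = [
    "example@example.com",
    "first.last+tag@mail.example.org",
    "missing-at-sign.com",
    "user@localhost",
    "user@example.c",
]
checks = {email: looks_like_email(email) for email in samples}
for email, ok in checks.items():
    print(email, ok)
```

The first two samples are accepted; the last three are rejected because they lack an @ sign, a dot in the domain, or a top-level domain of at least two letters.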

Example 2: Extracting URLs from Text

import re

text = "Visit our website at https://www.example.com or follow us on http://www.socialmedia.com"
# Matches the scheme and host name only; paths and query strings are not captured.
pattern = r"https?://[a-zA-Z0-9.-]+"

urls = re.findall(pattern, text)
print("URLs found:", urls)

Example 3: Replacing Multiple Spaces with a Single Space

import re

text = "This   is  a  text with   multiple  spaces."
# \s+ matches any run of whitespace, so tabs and newlines are collapsed as well.
pattern = r"\s+"

result = re.sub(pattern, " ", text)
print("Result:", result)

14. Conclusion

Regular expressions are a compact way to work with text in Python. By understanding the basic patterns, special characters, and functions provided by the re module, you can perform complex text manipulation tasks with little code. This tutorial covered the basics of regular expressions, including matching, searching, replacing, splitting, compiling, and using flags. The practical examples show how regular expressions can be applied to real-world problems.
