
Ubuntu Regular Expressions

Regular expressions (regex) are powerful pattern-matching tools that allow you to search, match, and manipulate text in sophisticated ways. In Ubuntu shell scripting, regular expressions are essential for processing log files, validating input, extracting data, and automating various text manipulation tasks.

Introduction to Regular Expressions

A regular expression is a sequence of characters that defines a search pattern. These patterns can be used to:

  • Match text that follows specific patterns
  • Find and replace text
  • Extract information from files and strings
  • Validate input formats (like email addresses, phone numbers, etc.)

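In Ubuntu, several command-line tools support regular expressions, including grep, sed, awk, and various programming languages. The dialect differs between tools: grep and sed default to POSIX basic regular expressions (BRE); grep -E, sed -E, and awk use extended regular expressions (ERE); and grep -P uses Perl-compatible regular expressions (PCRE). The fundamental concepts remain consistent across all of them.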

Basic Regular Expression Syntax

Let's start with the basic building blocks of regular expressions:

Literal Characters

Any regular character in a pattern matches itself. For example, the pattern ubuntu will match the string "ubuntu".

bash
$ echo "I love ubuntu linux" | grep "ubuntu"
I love ubuntu linux

Special Characters and Metacharacters

Regular expressions include special characters (metacharacters) that have specific meanings:

Metacharacter  Description                                                              Example
.              Matches any single character except newline                              a.c matches "abc", "adc", "a1c", etc.
^              Matches the start of a line                                              ^ubuntu matches "ubuntu" only at the beginning of a line
$              Matches the end of a line                                                ubuntu$ matches "ubuntu" only at the end of a line
*              Matches zero or more occurrences of the preceding item                   ab*c matches "ac", "abc", "abbc", etc.
+              Matches one or more occurrences of the preceding item (ERE)              ab+c matches "abc", "abbc", but not "ac"
?              Matches zero or one occurrence of the preceding item (ERE)               ab?c matches "ac" and "abc", but not "abbc"
\              Escapes special characters                                               \. matches a literal period instead of any character
[]             Character class - matches any character within the brackets              [aeiou] matches any lowercase vowel
[^]            Negated character class - matches any character not within the brackets  [^aeiou] matches any character that is not a lowercase vowel
()             Groups patterns together (ERE)                                           (ubuntu) groups the pattern "ubuntu" for backreferences
|              Alternation - matches either pattern (ERE)                               ubuntu|debian matches either "ubuntu" or "debian"

The entries marked (ERE) belong to extended syntax. Use them with grep -E, sed -E, or awk; in grep's default basic mode they are ordinary characters unless preceded by a backslash (for example, ab\+c).
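
A quick experiment makes the basic versus extended distinction concrete: without -E, grep treats + as an ordinary character. The following is a minimal sketch, assuming GNU grep as shipped with Ubuntu; the four sample strings are made up for illustration.

```bash
# Run the same pattern against four made-up strings in both syntaxes.

# ERE (-E): '+' is a quantifier, so "abc" and "abbc" match.
ere_matches=$(printf 'ac\nabc\nabbc\nab+c\n' | grep -E '^ab+c$')

# BRE (grep's default): '+' is an ordinary character, so only the
# literal string "ab+c" matches.
bre_matches=$(printf 'ac\nabc\nabbc\nab+c\n' | grep '^ab+c$')

printf 'ERE:\n%s\n' "$ere_matches"
printf 'BRE:\n%s\n' "$bre_matches"
```

If a pattern that uses +, ?, |, or () silently matches nothing, a missing -E is a likely cause.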

Common Regular Expression Examples

Let's explore some practical examples of regular expressions in Ubuntu:

Example 1: Matching IP Addresses

A pattern to match IPv4 addresses:

bash
$ ifconfig | grep -E '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}'
inet 192.168.1.10 netmask 255.255.255.0 broadcast 192.168.1.255
inet 127.0.0.1 netmask 255.0.0.0
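
Note that this pattern checks only the shape of an address (four groups of one to three digits), so it would also accept something like 999.999.999.999. To print just the addresses rather than whole lines, grep's -o option can help. A minimal sketch, using a made-up sample line in place of real interface output:

```bash
# Hypothetical sample text standing in for real interface output.
sample='inet 192.168.1.10 netmask 255.255.255.0 broadcast 192.168.1.255'

# -o prints each match on its own line instead of the whole input line.
addresses=$(printf '%s\n' "$sample" | grep -oE '([0-9]{1,3}\.){3}[0-9]{1,3}')

printf '%s\n' "$addresses"
```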

Example 2: Finding Email Addresses in a File

bash
$ grep -E '[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}' contacts.txt
[email protected]
[email protected]

Example 3: Validating Phone Numbers

bash
$ echo "My phone number is 123-456-7890" | grep -E '\b[0-9]{3}-[0-9]{3}-[0-9]{4}\b'
My phone number is 123-456-7890
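
The command above finds a phone number anywhere in a line, which is searching rather than validating. To check that an entire string is a phone number, the pattern can be anchored with ^ and $. A minimal sketch, where is_phone is a hypothetical helper name introduced only for this example:

```bash
# is_phone: succeed only if the whole argument looks like NNN-NNN-NNNN.
# (Hypothetical helper; -q suppresses output and only sets the exit status.)
is_phone() {
    printf '%s\n' "$1" | grep -qE '^[0-9]{3}-[0-9]{3}-[0-9]{4}$'
}

for candidate in "123-456-7890" "My phone number is 123-456-7890" "123-45-67890"; do
    if is_phone "$candidate"; then
        echo "valid:   $candidate"
    else
        echo "invalid: $candidate"
    fi
done
```

Only the first candidate passes, because the anchors reject any extra text around the number.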

Regular Expressions with Common Ubuntu Tools

grep

The grep command is one of the most common tools for using regular expressions in Ubuntu:

bash
# Basic grep
$ grep "ubuntu" file.txt

# Using extended regular expressions
$ grep -E "ubuntu|debian" file.txt

# Case-insensitive search
$ grep -i "UBUNTU" file.txt

# Show line numbers
$ grep -n "ubuntu" file.txt

# Recursive search through directories
$ grep -r "ubuntu" /path/to/directory/
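
A few more grep options are often useful alongside a pattern: -c counts matching lines, -w restricts matches to whole words, and -v inverts the match. The sketch below assumes GNU grep and uses a throwaway file with made-up contents:

```bash
# Build a small temporary file; the contents are made up for this sketch.
file=$(mktemp)
printf 'ubuntu server\nUbuntu desktop\ndebian server\nkubuntu desktop\n' > "$file"

# -c counts matching lines instead of printing them.
count=$(grep -c 'ubuntu' "$file")

# -w matches whole words only, so "kubuntu" no longer counts.
whole_words=$(grep -cw 'ubuntu' "$file")

# -v inverts the match; combined with -i it counts the lines that do
# not mention ubuntu in any letter case.
others=$(grep -civ 'ubuntu' "$file")

echo "count=$count whole_words=$whole_words others=$others"
rm -f "$file"
```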

sed

The Stream Editor sed is perfect for search and replace operations:

bash
# Replace first occurrence of 'ubuntu' with 'Ubuntu' on each line
$ sed 's/ubuntu/Ubuntu/' file.txt

# Replace all occurrences of 'ubuntu' with 'Ubuntu'
$ sed 's/ubuntu/Ubuntu/g' file.txt

# Replace 'ubuntu' with 'Ubuntu' only if the line contains 'linux'
$ sed '/linux/s/ubuntu/Ubuntu/g' file.txt

# Delete lines matching a pattern
$ sed '/^#/d' file.txt # Deletes comment lines starting with #
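
sed can also reuse parts of the matched text in the replacement. With -E, each parenthesized group is captured and can be referred to as \1, \2, and so on. A minimal sketch, assuming GNU sed; the "name-version" input is made up for illustration:

```bash
# Hypothetical sample line in "name-version" form.
line='ubuntu-22.04 debian-12.5'

# -E enables extended syntax, so ( ) and + need no backslashes.
# \1 is the captured name and \2 is the captured version number.
result=$(printf '%s\n' "$line" | sed -E 's/([a-z]+)-([0-9]+\.[0-9]+)/\1 (version \2)/g')

printf '%s\n' "$result"
```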

awk

The awk programming language is excellent for processing text data:

bash
# Print lines where the first field matches 'ubuntu'
$ awk '$1 ~ /ubuntu/' file.txt

# Sum the values in the third column
$ awk '{sum += $3} END {print sum}' file.txt

# Print lines that match a regular expression
$ awk '/ubuntu/' file.txt

# Format output based on patterns
$ awk '/error/ {print "ERROR: " $0}; /warning/ {print "WARNING: " $0}' log.txt
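
Patterns and actions combine naturally with awk variables, for example to count how many lines match each pattern. The following sketch uses made-up log lines and only features that should work in both mawk (Ubuntu's default awk) and gawk:

```bash
# Hypothetical log lines used only for illustration.
log='2024-01-01 error disk full
2024-01-01 warning high load
2024-01-02 error network down
2024-01-02 info started'

# Count lines whose second field matches each pattern, then print the
# totals once all input has been read.
summary=$(printf '%s\n' "$log" | awk '
    $2 ~ /^error$/   { errors++ }
    $2 ~ /^warning$/ { warnings++ }
    END { printf "errors=%d warnings=%d\n", errors, warnings }
')

printf '%s\n' "$summary"
```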

Character Classes and Shorthand Notations

Regular expressions offer shorthand notations for common character classes:

Notation  Description         Equivalent
\d        Digit               [0-9]
\D        Non-digit           [^0-9]
\w        Word character      [A-Za-z0-9_]
\W        Non-word character  [^A-Za-z0-9_]
\s        Whitespace          [ \t\n\r\f]
\S        Non-whitespace      [^ \t\n\r\f]

Note: GNU grep accepts \w, \W, \s, and \S in its default and -E modes, but \d and \D work only with the -P (Perl-compatible) flag. Without -P, use [0-9] or the POSIX class [[:digit:]] instead.
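
When portability across tools matters, POSIX bracket classes such as [[:digit:]] and [[:space:]] are a safer spelling than the backslash shorthands, since grep, sed, and awk all understand them. A minimal sketch with made-up input:

```bash
# Made-up sample text.
text='order 66 shipped'

# [[:digit:]] is the portable counterpart of \d and needs no -P flag.
digits=$(printf '%s\n' "$text" | grep -oE '[[:digit:]]+')

# [[:space:]] is the portable counterpart of \s; here sed collapses each
# run of spaces or tabs into a single space.
squeezed=$(printf 'a   b\t\tc\n' | sed -E 's/[[:space:]]+/ /g')

printf '%s\n' "$digits"
printf '%s\n' "$squeezed"
```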

Quantifiers in Regular Expressions

Quantifiers specify how many instances of a character, group, or character class must be present for a match:

Quantifier  Description
*           Match 0 or more times
+           Match 1 or more times
?           Match 0 or 1 time
{n}         Match exactly n times
{n,}        Match at least n times
{n,m}       Match between n and m times

Example:

bash
# Match lines containing a standalone number of exactly 8 digits
$ grep -E '\b[0-9]{8}\b' file.txt

# Match lines with words having 5 to 10 characters
$ grep -E '\b\w{5,10}\b' file.txt
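
To see how the three brace forms differ, it can help to run them against the same small input. This sketch assumes GNU grep; the numbers are made up, and the patterns are anchored so each line is tested as a whole:

```bash
# Made-up sample numbers, one per line.
numbers=$(printf '7\n42\n12345\n12345678\n123456789\n')

# {8}: exactly eight digits on the line.
exact=$(printf '%s\n' "$numbers" | grep -E '^[0-9]{8}$')

# {5,}: five or more digits; -c prints how many lines qualify.
at_least=$(printf '%s\n' "$numbers" | grep -cE '^[0-9]{5,}$')

# {2,5}: between two and five digits.
between=$(printf '%s\n' "$numbers" | grep -E '^[0-9]{2,5}$')

echo "exact: $exact"
echo "at least five digits: $at_least lines"
printf 'between two and five digits:\n%s\n' "$between"
```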

Regular Expressions in Shell Scripts

Let's create a practical shell script that validates user input using regular expressions:

bash
#!/bin/bash

# Function to validate email address
validate_email() {
    local email=$1
    local regex='^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$'

    if [[ $email =~ $regex ]]; then
        echo "Valid email address!"
        return 0
    else
        echo "Invalid email address!"
        return 1
    fi
}

# Function to validate IP address
validate_ip() {
    local ip=$1
    local regex='^([0-9]{1,3}\.){3}[0-9]{1,3}$'

    if [[ $ip =~ $regex ]]; then
        # Further check each octet
        IFS='.' read -r -a octets <<< "$ip"
        for octet in "${octets[@]}"; do
            # 10# forces base 10 so that octets like "08" are not read as octal
            if (( 10#$octet > 255 )); then
                echo "Invalid IP address: octet $octet exceeds 255!"
                return 1
            fi
        done
        echo "Valid IP address!"
        return 0
    else
        echo "Invalid IP address format!"
        return 1
    fi
}

# Main script
echo "===== Input Validation with Regular Expressions ====="
echo

# Email validation
read -p "Enter an email address: " email_input
validate_email "$email_input"
echo

# IP validation
read -p "Enter an IP address: " ip_input
validate_ip "$ip_input"

Example Usage and Output:

===== Input Validation with Regular Expressions =====

Enter an email address: [email protected]
Valid email address!

Enter an IP address: 192.168.1.1
Valid IP address!
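
Beyond a yes or no answer, bash records what each parenthesized group matched in the BASH_REMATCH array after a successful =~ test. The sketch below reuses the email pattern with two capture groups; split_email is a hypothetical helper name, and the address is a made-up example:

```bash
#!/bin/bash

# split_email: print the user and domain parts of an address, or return 1
# if the argument does not match. (Hypothetical helper for this sketch.)
split_email() {
    local regex='^([A-Za-z0-9._%+-]+)@([A-Za-z0-9.-]+\.[A-Za-z]{2,})$'

    if [[ $1 =~ $regex ]]; then
        # BASH_REMATCH[0] holds the whole match; [1] and [2] hold the groups.
        echo "user=${BASH_REMATCH[1]} domain=${BASH_REMATCH[2]}"
    else
        return 1
    fi
}

split_email "alice@example.com"
```

Storing the pattern in a variable and leaving it unquoted on the right of =~ keeps bash from treating it as a literal string.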

Advanced Regex Techniques

Lookahead and Lookbehind Assertions

These allow you to match patterns only if they're followed or preceded by another pattern:

bash
# Positive lookahead: Match 'ubuntu' only if followed by 'linux'
$ grep -P 'ubuntu(?=linux)' file.txt

# Negative lookahead: Match 'ubuntu' only if NOT followed by 'linux'
$ grep -P 'ubuntu(?!linux)' file.txt

# Positive lookbehind: Match 'linux' only if preceded by 'ubuntu'
$ grep -P '(?<=ubuntu)linux' file.txt

# Negative lookbehind: Match 'linux' only if NOT preceded by 'ubuntu'
$ grep -P '(?<!ubuntu)linux' file.txt

Note: These require the -P (Perl-compatible) flag in grep.
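
Lookarounds are especially handy with -o, because the surrounding context is checked but not included in the output. A minimal sketch, assuming a grep built with PCRE support (as on Ubuntu) and a made-up key=value line:

```bash
# Made-up sample line of key=value pairs.
line='name=ubuntu version=22.04 arch=amd64'

# Lookbehind: print the digits and dots that follow "version=",
# without printing "version=" itself.
version=$(printf '%s\n' "$line" | grep -oP '(?<=version=)[0-9.]+')

# Lookahead: print each run of lowercase letters that is immediately
# followed by "=", that is, the keys.
keys=$(printf '%s\n' "$line" | grep -oP '[a-z]+(?==)')

printf '%s\n' "$version"
printf '%s\n' "$keys"
```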

Backreferences

Backreferences allow you to match the same text that was matched by a capturing group:

bash
# Match repeated words (-i so that "The the" counts as a repeat)
$ grep -iE '\b(\w+)\s+\1\b' file.txt

# Example output for a file containing "The the cat sat on the mat":
The the cat sat on the mat
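
The same kind of pattern can drive a fix rather than just a search. As a sketch, assuming GNU sed (which supports \b, \w, and \s as extensions), the following collapses each doubled word into a single copy; the sample sentence is made up:

```bash
# Made-up sentence containing two doubled words.
sentence='this is is a test test'

# In the pattern, \1 must repeat whatever the group captured; in the
# replacement, \1 writes that word back once.
fixed=$(printf '%s\n' "$sentence" | sed -E 's/\b(\w+)\s+\1\b/\1/g')

printf '%s\n' "$fixed"
```

The leading \b keeps the "is" inside "this" from being treated as the start of a repeated word.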

Real-world Application: Log File Analysis

Here's a practical example of using regular expressions to extract information from log files:

bash
#!/bin/bash

# Script to analyze Apache access log

LOG_FILE="/var/log/apache2/access.log"

# Count total number of requests
total_requests=$(wc -l < "$LOG_FILE")
echo "Total Requests: $total_requests"

# Count 404 errors
not_found=$(grep -c ' 404 ' "$LOG_FILE")
echo "404 Not Found Errors: $not_found"

# Count unique IP addresses
unique_ips=$(grep -oE '\b([0-9]{1,3}\.){3}[0-9]{1,3}\b' "$LOG_FILE" | sort -u | wc -l)
echo "Unique IP Addresses: $unique_ips"

# Find the most requested URL
echo "Most Requested URL:"
grep -oE 'GET [^ ]+' "$LOG_FILE" | sort | uniq -c | sort -nr | head -1

# Extract all requests from a specific IP
# (-F treats the input as a literal string, so "." is not a wildcard)
read -r -p "Enter an IP to show its requests: " target_ip
echo "Requests from $target_ip:"
grep -F -- "$target_ip" "$LOG_FILE"

# Find all requests made during the 10:00 hour (assumes the standard Apache
# timestamp format, e.g. [10/Oct/2023:10:15:32 +0000])
echo "Requests between 10:00 and 11:00:"
grep -E ':10:[0-5][0-9]:[0-5][0-9] ' "$LOG_FILE"
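
To try this kind of analysis without access to a real server log, the same techniques can be run against a few sample lines. The sketch below assumes the Apache common log format; the three log lines are made up for illustration:

```bash
#!/bin/bash

# Hypothetical sample lines in Apache common log format.
log_file=$(mktemp)
cat > "$log_file" <<'EOF'
192.168.1.10 - - [10/Oct/2023:10:15:32 +0000] "GET /index.html HTTP/1.1" 200 1024
192.168.1.11 - - [10/Oct/2023:10:47:01 +0000] "GET /missing HTTP/1.1" 404 512
192.168.1.10 - - [10/Oct/2023:11:02:45 +0000] "GET /index.html HTTP/1.1" 200 1024
EOF

# Requests per status code: grab the three digits that follow the quoted
# request, strip the quote and spaces, then count each distinct code.
status_counts=$(grep -oE '" [0-9]{3} ' "$log_file" | tr -d '" ' | sort | uniq -c | awk '{print $2 "=" $1}')
printf '%s\n' "$status_counts"

# Requests in the 10:00 hour: the hour is preceded by ":" and the seconds
# are followed by a space in this timestamp layout.
in_hour=$(grep -cE ':10:[0-5][0-9]:[0-5][0-9] ' "$log_file")
echo "Requests between 10:00 and 10:59: $in_hour"

rm -f "$log_file"
```

Anchoring on the characters around the timestamp helps avoid accidental matches elsewhere in the line, such as in a year or a byte count.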

Working with Regular Expressions Interactively

For testing and learning regular expressions, several online tools are available. On the command line, grep itself works well for quick experiments: pipe some sample text into it and use --color or -o to see exactly what matched:

bash
# Highlight the matched portion of the line
$ echo "This is an ubuntu system" | grep --color=always -E 'ubuntu'

# Print only the matched text
$ echo "This is an ubuntu system" | grep -oE 'ubuntu'
ubuntu

Summary

Regular expressions are invaluable tools for text processing in Ubuntu shell scripting. In this guide, we've covered:

  • Basic regex syntax and metacharacters
  • Common patterns and examples
  • Using regex with popular Ubuntu tools (grep, sed, awk)
  • Character classes and quantifiers
  • Implementing regex in shell scripts
  • Advanced techniques like lookahead/lookbehind and backreferences
  • Real-world applications for log file analysis

With practice, you'll be able to craft precise patterns to match exactly what you need, making your text processing tasks much more efficient.

Exercises for Practice

  1. Write a regular expression to match valid Ubuntu version numbers (e.g., 20.04, 22.10).
  2. Create a shell script that uses regular expressions to validate passwords based on these rules:
    • At least 8 characters
    • Contains at least one uppercase letter
    • Contains at least one lowercase letter
    • Contains at least one number
    • Contains at least one special character
  3. Write a command using grep to extract all URLs from an HTML file.
  4. Create a sed command to convert dates from MM/DD/YYYY format to YYYY-MM-DD format in a text file.
  5. Write an awk script to extract and sum all numbers that appear after the word "Total:" in a log file.

Additional Resources

  • The POSIX regular expression syntax overview: man 7 regex
  • Ubuntu Community Help: Regular Expressions (help.ubuntu.com)
  • Book: "Mastering Regular Expressions" by Jeffrey Friedl
  • Online regex testing tools: RegExr and Regex101
  • man grep, man sed, and man awk for detailed documentation on these tools

Remember that the key to mastering regular expressions is practice. Start with simple patterns and gradually tackle more complex ones as your understanding grows.


