Ubuntu Regular Expressions
Regular expressions (regex) are powerful pattern-matching tools that allow you to search, match, and manipulate text in sophisticated ways. In Ubuntu shell scripting, regular expressions are essential for processing log files, validating input, extracting data, and automating various text manipulation tasks.
Introduction to Regular Expressions
A regular expression is a sequence of characters that defines a search pattern. These patterns can be used to:
- Match text that follows specific patterns
- Find and replace text
- Extract information from files and strings
- Validate input formats (like email addresses, phone numbers, etc.)
In Ubuntu, several command-line tools support regular expressions, including grep, sed, and awk, as do most programming languages. While the specific implementation may vary slightly between tools, the fundamental concepts remain consistent.
Basic Regular Expression Syntax
Let's start with the basic building blocks of regular expressions:
Literal Characters
Any regular character in a pattern matches itself. For example, the pattern ubuntu will match the string "ubuntu".
$ echo "I love ubuntu linux" | grep "ubuntu"
I love ubuntu linux
Special Characters and Metacharacters
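Regular expressions include special characters (metacharacters) that have specific meanings. In GNU grep and sed, the characters +, ?, {}, (), and | act as metacharacters only in extended mode (-E); in the default basic mode they must be preceded by a backslash.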
Metacharacter | Description | Example |
---|---|---|
. | Matches any single character except newline | a.c matches "abc", "adc", "a1c", etc. |
^ | Matches the start of a line | ^ubuntu matches "ubuntu" only at the beginning of a line |
$ | Matches the end of a line | ubuntu$ matches "ubuntu" only at the end of a line |
* | Matches zero or more occurrences of the preceding element | ab*c matches "ac", "abc", "abbc", etc. |
+ | Matches one or more occurrences of the preceding element | ab+c matches "abc", "abbc", but not "ac" |
? | Matches zero or one occurrence of the preceding element | ab?c matches "ac" and "abc", but not "abbc" |
\ | Escapes special characters | \. matches a literal period instead of any character |
[] | Character class - matches any one character within the brackets | [aeiou] matches any lowercase vowel |
[^] | Negated character class - matches any one character not within the brackets | [^aeiou] matches any character that is not a lowercase vowel |
() | Groups patterns together | (ubuntu) groups the pattern "ubuntu" so a quantifier or backreference can apply to it |
\| | Alternation - matches either pattern | ubuntu\|debian matches either "ubuntu" or "debian" |
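As a quick check of the quantifier rows in the table, the sketch below runs three patterns against the same input. It assumes GNU grep as shipped with Ubuntu; the `sample` variable and its contents are made up for illustration.

```shell
# Illustrative sample input: one candidate string per line.
sample='ac
abc
abbc'

# * allows zero or more "b" characters, so all three lines match.
star=$(printf '%s\n' "$sample" | grep 'ab*c')

# + and ? are extended syntax, so -E is used for them.
plus=$(printf '%s\n' "$sample" | grep -E 'ab+c')
optional=$(printf '%s\n' "$sample" | grep -E 'ab?c')

printf 'star:\n%s\n' "$star"
printf 'plus:\n%s\n' "$plus"
printf 'optional:\n%s\n' "$optional"
```

Here `plus` drops "ac" because at least one "b" is required, and `optional` drops "abbc" because at most one "b" is allowed.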
Common Regular Expression Examples
Let's explore some practical examples of regular expressions in Ubuntu:
Example 1: Matching IP Addresses
A pattern to match IPv4 addresses:
$ ifconfig | grep -E '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}'
inet 192.168.1.10 netmask 255.255.255.0 broadcast 192.168.1.255
inet 127.0.0.1 netmask 255.0.0.0
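The pattern above also accepts out-of-range values such as 999.999.999.999, because [0-9]{1,3} allows any one to three digits. A stricter variant can be sketched as follows; the `octet` and `ipv4` variable names and the sample input are assumptions for illustration, and \b is a GNU extension.

```shell
# One octet: 250-255, 200-249, 100-199, or 0-99.
octet='(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])'

# Four octets separated by literal dots, bounded by word boundaries.
ipv4="\\b${octet}(\\.${octet}){3}\\b"

# Made-up input standing in for real interface output.
found=$(printf '%s\n' \
    'inet 192.168.1.10 netmask 255.255.255.0' \
    'bogus 999.1.1.1' | grep -oE "$ipv4")

printf '%s\n' "$found"
```

The -o option prints each match on its own line, so the two valid addresses are reported and the bogus one is skipped.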
Example 2: Finding Email Addresses in a File
$ grep -E '[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}' contacts.txt
[email protected]
[email protected]
Example 3: Validating Phone Numbers
$ echo "My phone number is 123-456-7890" | grep -E '\b[0-9]{3}-[0-9]{3}-[0-9]{4}\b'
My phone number is 123-456-7890
Regular Expressions with Common Ubuntu Tools
grep
The grep command is one of the most common tools for using regular expressions in Ubuntu:
# Basic grep
$ grep "ubuntu" file.txt
# Using extended regular expressions
$ grep -E "ubuntu|debian" file.txt
# Case-insensitive search
$ grep -i "UBUNTU" file.txt
# Show line numbers
$ grep -n "ubuntu" file.txt
# Recursive search through directories
$ grep -r "ubuntu" /path/to/directory/
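Three further options are often useful alongside those above: -c, -o, and -v. The sketch below feeds grep a made-up `releases` string instead of a file, so it can be run as is.

```shell
# Made-up sample text; in practice this would come from a file.
releases='ubuntu 22.04 released
debian 12 released
Ubuntu 24.04 released'

# -c counts matching lines; combined with -i it ignores case.
ubuntu_count=$(printf '%s\n' "$releases" | grep -ci 'ubuntu')

# -o prints only the matched text, one match per line.
versions=$(printf '%s\n' "$releases" | grep -oE '[0-9]{2}\.[0-9]{2}')

# -v inverts the match, keeping lines that do NOT contain the pattern.
others=$(printf '%s\n' "$releases" | grep -vi 'ubuntu')

printf 'count: %s\n' "$ubuntu_count"
printf 'versions:\n%s\n' "$versions"
printf 'others: %s\n' "$others"
```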
sed
The stream editor sed is well suited to search-and-replace operations:
# Replace first occurrence of 'ubuntu' with 'Ubuntu' on each line
$ sed 's/ubuntu/Ubuntu/' file.txt
# Replace all occurrences of 'ubuntu' with 'Ubuntu'
$ sed 's/ubuntu/Ubuntu/g' file.txt
# Replace 'ubuntu' with 'Ubuntu' only if the line contains 'linux'
$ sed '/linux/s/ubuntu/Ubuntu/g' file.txt
# Delete lines matching a pattern
$ sed '/^#/d' file.txt # Deletes comment lines starting with #
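Substitutions become more flexible once capture groups are involved. The following sketch, which assumes made-up "Last, First" input, uses sed -E so that parentheses act as groups and \1 and \2 refer back to them in the replacement.

```shell
# Swap "Last, First" into "First Last" using two capture groups.
# Group 1 is everything before the comma; group 2 is the rest of the line.
swapped=$(printf '%s\n' 'Torvalds, Linus' 'Shuttleworth, Mark' |
    sed -E 's/^([^,]+), *(.+)$/\2 \1/')

printf '%s\n' "$swapped"
```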
awk
The awk programming language is excellent for processing text data:
# Print lines where the first field matches 'ubuntu'
$ awk '$1 ~ /ubuntu/' file.txt
# Sum the values in the third column
$ awk '{sum += $3} END {print sum}' file.txt
# Print lines that match a regular expression
$ awk '/ubuntu/' file.txt
# Format output based on patterns
$ awk '/error/ {print "ERROR: " $0}; /warning/ {print "WARNING: " $0}' log.txt
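Pattern-action pairs can also accumulate state across lines. As a minimal sketch, assuming made-up log lines in a `log` variable, the following counts matches per pattern and prints the totals in an END block.

```shell
# Made-up log lines for illustration.
log='2024-01-01 error disk full
2024-01-01 warning high load
2024-01-02 error timeout'

# Count lines per pattern and report the totals once all input is read.
summary=$(printf '%s\n' "$log" | awk '
    /error/   { errors++ }
    /warning/ { warnings++ }
    END       { printf "errors=%d warnings=%d\n", errors, warnings }')

printf '%s\n' "$summary"
```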
Character Classes and Shorthand Notations
Regular expressions offer shorthand notations for common character classes:
Notation | Description | Equivalent |
---|---|---|
\d | Digit | [0-9] |
\D | Non-digit | [^0-9] |
\w | Word character | [A-Za-z0-9_] |
\W | Non-word character | [^A-Za-z0-9_] |
\s | Whitespace | [ \t\n\r\f] |
\S | Non-whitespace | [^ \t\n\r\f] |
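Note: These shorthands come from Perl-compatible regular expressions. GNU grep understands \w, \W, \s, and \S in its basic and extended modes, but \d and \D require the -P (Perl-compatible) flag. Without -P, use [0-9] or the POSIX class [[:digit:]] instead.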
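Where Perl-style shorthands are unavailable, POSIX bracket classes such as [[:digit:]] and [[:space:]] are a portable alternative. The sketch below uses made-up input strings to show both in grep and sed.

```shell
# [[:digit:]] is the portable spelling of \d and works without -P.
digits=$(printf 'order 66 shipped\n' | grep -oE '[[:digit:]]+')

# [[:space:]] covers spaces and tabs alike; sed squeezes each run to one space.
squeezed=$(printf 'a \t b\n' | sed -E 's/[[:space:]]+/ /g')

printf '%s\n%s\n' "$digits" "$squeezed"
```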
Quantifiers in Regular Expressions
Quantifiers specify how many instances of a character, group, or character class must be present for a match:
Quantifier | Description |
---|---|
* | Match 0 or more times |
+ | Match 1 or more times |
? | Match 0 or 1 time |
{n} | Match exactly n times |
{n,} | Match at least n times |
{n,m} | Match between n and m times |
Example:
# Match lines containing a standalone 8-digit number
$ grep -E '\b[0-9]{8}\b' file.txt
# Match lines with words having 5 to 10 characters
$ grep -E '\b\w{5,10}\b' file.txt
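The word boundaries matter in the first command. The sketch below, using a made-up `text` string, compares the bounded pattern with an unbounded one to show what \b prevents.

```shell
# Made-up input: one 8-digit number and one 9-digit number.
text='ticket 12345678 and account 123456789'

# With \b on both sides, only the standalone 8-digit number is reported.
bounded=$(printf '%s\n' "$text" | grep -oE '\b[0-9]{8}\b')

# Without \b, the first 8 digits of the longer number match as well.
unbounded=$(printf '%s\n' "$text" | grep -oE '[0-9]{8}')

printf 'bounded:\n%s\n' "$bounded"
printf 'unbounded:\n%s\n' "$unbounded"
```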
Regular Expressions in Shell Scripts
Let's create a practical shell script that validates user input using regular expressions:
#!/bin/bash
# Function to validate email address
validate_email() {
    local email=$1
    local regex='^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$'
    if [[ $email =~ $regex ]]; then
        echo "Valid email address!"
        return 0
    else
        echo "Invalid email address!"
        return 1
    fi
}
# Function to validate IP address
validate_ip() {
    local ip=$1
    local regex='^([0-9]{1,3}\.){3}[0-9]{1,3}$'
    local octet octets
    if [[ $ip =~ $regex ]]; then
        # Further check each octet
        IFS='.' read -r -a octets <<< "$ip"
        for octet in "${octets[@]}"; do
            # 10# forces base 10, so an octet like "08" is not rejected as invalid octal
            if (( 10#$octet > 255 )); then
                echo "Invalid IP address: octet $octet exceeds 255!"
                return 1
            fi
        done
        echo "Valid IP address!"
        return 0
    else
        echo "Invalid IP address format!"
        return 1
    fi
}
# Main script
echo "===== Input Validation with Regular Expressions ====="
echo
# Email validation
read -p "Enter an email address: " email_input
validate_email "$email_input"
echo
# IP validation
read -p "Enter an IP address: " ip_input
validate_ip "$ip_input"
Example Usage and Output:
===== Input Validation with Regular Expressions =====
Enter an email address: [email protected]
Valid email address!
Enter an IP address: 192.168.1.1
Valid IP address!
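Beyond a yes/no answer, the =~ operator also stores captured groups in the BASH_REMATCH array. The sketch below is bash-specific; `parse_setting` is a hypothetical helper introduced for illustration and is not part of the script above.

```shell
#!/bin/bash
# Split a KEY=value string into its parts using capture groups.
parse_setting() {
    local regex='^([A-Za-z_][A-Za-z0-9_]*)=(.*)$'
    if [[ $1 =~ $regex ]]; then
        # BASH_REMATCH[0] is the whole match; [1] and [2] are the groups.
        echo "key=${BASH_REMATCH[1]} value=${BASH_REMATCH[2]}"
    else
        echo "not a setting: $1" >&2
        return 1
    fi
}

parse_setting "EDITOR=vim"
```

Keeping the pattern in a variable, as the validation functions above do, avoids quoting problems on the right-hand side of =~.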
Advanced Regex Techniques
Lookahead and Lookbehind Assertions
These allow you to match patterns only if they're followed or preceded by another pattern:
# Positive lookahead: Match 'ubuntu' only if followed by 'linux'
$ grep -P 'ubuntu(?=linux)' file.txt
# Negative lookahead: Match 'ubuntu' only if NOT followed by 'linux'
$ grep -P 'ubuntu(?!linux)' file.txt
# Positive lookbehind: Match 'linux' only if preceded by 'ubuntu'
$ grep -P '(?<=ubuntu)linux' file.txt
# Negative lookbehind: Match 'linux' only if NOT preceded by 'ubuntu'
$ grep -P '(?<!ubuntu)linux' file.txt
Note: These require the -P (Perl-compatible) flag in grep.
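Lookarounds are most useful together with -o, because the asserted context is checked but left out of the reported match. The sketch below assumes a grep build with -P support and uses a made-up `line` string.

```shell
# Made-up input line.
line='user=alice id=42'

# Lookbehind: report the word after "user=" without including "user=" itself.
user=$(printf '%s\n' "$line" | grep -oP '(?<=user=)\w+')

# Lookahead: report the key name that sits directly in front of "=42".
key=$(printf '%s\n' "$line" | grep -oP '\w+(?==42)')

printf 'user: %s\nkey: %s\n' "$user" "$key"
```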
Backreferences
Backreferences allow you to match the same text that was matched by a capturing group:
# Match repeated words (-i lets "The" match "the")
$ grep -Ei '\b(\w+)\s+\1\b' file.txt
# Example output for "The the cat sat on the mat":
The the cat sat on the mat
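The same backreference idea works in a sed substitution, where it can remove the duplicate instead of only finding it. This is a sketch assuming GNU sed, since \b, \w, and \s are GNU extensions, and the input sentence is made up.

```shell
# Replace a doubled word with a single copy of it.
deduped=$(printf '%s\n' 'the the cat sat on the mat' |
    sed -E 's/\b(\w+)\s+\1\b/\1/g')

printf '%s\n' "$deduped"
```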
Real-world Application: Log File Analysis
Here's a practical example of using regular expressions to extract information from log files:
#!/bin/bash
# Script to analyze Apache access log
LOG_FILE="/var/log/apache2/access.log"
# Count total number of requests
total_requests=$(wc -l < "$LOG_FILE")
echo "Total Requests: $total_requests"
# Count 404 errors
not_found=$(grep -c ' 404 ' "$LOG_FILE")
echo "404 Not Found Errors: $not_found"
# Count unique IP addresses
unique_ips=$(grep -oE '\b([0-9]{1,3}\.){3}[0-9]{1,3}\b' "$LOG_FILE" | sort -u | wc -l)
echo "Unique IP Addresses: $unique_ips"
# Find the most requested URL
echo "Most Requested URL:"
grep -oE 'GET [^ ]+' "$LOG_FILE" | sort | uniq -c | sort -nr | head -1
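# Extract all requests from a specific IP
read -r -p "Enter an IP to show its requests: " target_ip
echo "Requests from $target_ip:"
# Compare the first field exactly; a plain grep would treat the dots as wildcards
awk -v ip="$target_ip" '$1 == ip' "$LOG_FILE"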
# Find all requests made within a specific hour
# (assumes the default Apache timestamp, e.g. [10/Oct/2023:10:15:32 +0000])
echo "Requests between 10:00 and 11:00:"
grep -E '\[[0-9]{2}/[A-Za-z]{3}/[0-9]{4}:10:[0-5][0-9]:[0-5][0-9] ' "$LOG_FILE"
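The same kind of analysis can be tried without access to a real server log. The sketch below embeds three made-up entries in Apache's combined log layout (an assumption about the log format) and counts responses per status code with awk.

```shell
# Made-up entries in Apache's combined log layout.
sample_log='192.168.1.10 - - [10/Oct/2023:10:15:32 +0000] "GET /index.html HTTP/1.1" 200 1024
192.168.1.11 - - [10/Oct/2023:10:16:01 +0000] "GET /missing HTTP/1.1" 404 512
192.168.1.10 - - [10/Oct/2023:11:02:45 +0000] "GET /about.html HTTP/1.1" 200 2048'

# Field 9 holds the status code when fields are split on whitespace.
# The digit class is spelled out three times because older awk
# implementations do not support {3} intervals.
status_counts=$(printf '%s\n' "$sample_log" |
    awk '$9 ~ /^[0-9][0-9][0-9]$/ { count[$9]++ }
         END { for (s in count) print s, count[s] }' | sort)

printf '%s\n' "$status_counts"
```

Matching on a specific field is stricter than grepping for ' 404 ', which would also match a response size of 404 bytes.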
Working with Regular Expressions Interactively
For testing and learning regular expressions, several online tools are available. On the command line, you can also test a pattern immediately by piping sample text into grep:
# Highlight the matched text
$ echo "This is an ubuntu system" | grep --color=always 'ubuntu'
# Print only the part of the line that matched
$ echo "This is an ubuntu system" | grep -o 'ubuntu'
Summary
Regular expressions are invaluable tools for text processing in Ubuntu shell scripting. In this guide, we've covered:
- Basic regex syntax and metacharacters
- Common patterns and examples
- Using regex with popular Ubuntu tools (grep, sed, awk)
- Character classes and quantifiers
- Implementing regex in shell scripts
- Advanced techniques like lookahead/lookbehind and backreferences
- Real-world applications for log file analysis
With practice, you'll be able to craft precise patterns to match exactly what you need, making your text processing tasks much more efficient.
Exercises for Practice
- Write a regular expression to match valid Ubuntu version numbers (e.g., 20.04, 22.10).
- Create a shell script that uses regular expressions to validate passwords based on these rules:
- At least 8 characters
- Contains at least one uppercase letter
- Contains at least one lowercase letter
- Contains at least one number
- Contains at least one special character
- Write a command using grep to extract all URLs from an HTML file.
- Create a sed command to convert dates from MM/DD/YYYY format to YYYY-MM-DD format in a text file.
- Write an awk script to extract and sum all numbers that appear after the word "Total:" in a log file.
Additional Resources
- The GNU Regex manual: man 7 regex
- Ubuntu Community Help: Regular Expressions (help.ubuntu.com)
- Book: "Mastering Regular Expressions" by Jeffrey Friedl
- Online regex testing tools: RegExr and Regex101
- man grep, man sed, and man awk for detailed documentation on these tools
Remember that the key to mastering regular expressions is practice. Start with simple patterns and gradually tackle more complex ones as your understanding grows.