C# Regular Expressions
Regular expressions (regex or regexp) are powerful tools for pattern matching and text manipulation. In C#, you can use regular expressions to search, match, and manipulate text based on patterns. This guide will take you through the fundamentals of using regular expressions in C# and show you how to apply them to solve real-world problems.
Introduction to Regular Expressions
Regular expressions are sequences of characters that define a search pattern. They are particularly useful when you need to:
- Validate text format (like email addresses or phone numbers)
- Search for specific patterns within text
- Extract information from text
- Replace or modify parts of text based on patterns
C# provides robust support for regular expressions through the System.Text.RegularExpressions namespace, which contains the Regex class and related types.
Getting Started with Regular Expressions
First, you need to include the necessary namespace:
using System.Text.RegularExpressions;
Basic Pattern Matching
Let's start with a simple example - checking if a string contains a specific pattern:
using System;
using System.Text.RegularExpressions;
class Program
{
    static void Main()
    {
        // Check if a string contains a number
        string text = "Hello 123 World";
        bool containsNumber = Regex.IsMatch(text, @"\d+");
        Console.WriteLine($"Contains number: {containsNumber}");
        // Output: Contains number: True
    }
}
In this example, \d+ is a regular expression pattern that matches one or more digits.
Regular Expression Syntax Basics
Let's explore some fundamental regex patterns:
Character Classes
Pattern | Description | Example |
---|---|---|
\d | Matches any digit | \d+ matches "123" in "abc123" |
\w | Matches any word character (alphanumeric + underscore) | \w+ matches "Hello_123" in "Hello_123!" |
\s | Matches any whitespace character | \s+ matches spaces in "Hello World" |
. | Matches any character except newline | a.b matches "axb" in "axbyz" |
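The rows in this table can be checked directly. The following is a minimal sketch; the CharacterClassDemo class and its FirstMatch helper are hypothetical names introduced here for illustration only.

```csharp
using System;
using System.Text.RegularExpressions;

static class CharacterClassDemo
{
    // Returns the text of the first match, or an empty string when nothing matches.
    public static string FirstMatch(string input, string pattern)
    {
        return Regex.Match(input, pattern).Value;
    }

    static void Main()
    {
        Console.WriteLine(FirstMatch("abc123", @"\d+"));     // 123
        Console.WriteLine(FirstMatch("Hello_123!", @"\w+")); // Hello_123
        Console.WriteLine(FirstMatch("axbyz", @"a.b"));      // axb
        // \s+ matches the single space between the two words
        Console.WriteLine(Regex.Matches("Hello World", @"\s+").Count); // 1
    }
}
```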
Quantifiers
Pattern | Description | Example |
---|---|---|
* | 0 or more occurrences | a* matches "", "a", "aa", etc. |
+ | 1 or more occurrences | a+ matches "a", "aa", etc. but not "" |
? | 0 or 1 occurrence | a? matches "" or "a" |
{n} | Exactly n occurrences | a{3} matches "aaa" |
{n,} | At least n occurrences | a{2,} matches "aa", "aaa", etc. |
{n,m} | Between n and m occurrences | a{2,4} matches "aa", "aaa", or "aaaa" |
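To see how each quantifier limits the number of repetitions, it can help to require that the whole input matches. The sketch below does this by wrapping the pattern in ^(?:...)$; the QuantifierDemo class and MatchesWhole helper are hypothetical names, not part of the .NET API.

```csharp
using System;
using System.Text.RegularExpressions;

static class QuantifierDemo
{
    // Wraps the pattern in ^(?:...)$ so that the entire input must match,
    // which makes the effect of each quantifier easy to see.
    public static bool MatchesWhole(string input, string pattern)
    {
        return Regex.IsMatch(input, "^(?:" + pattern + ")$");
    }

    static void Main()
    {
        Console.WriteLine(MatchesWhole("", "a*"));          // True
        Console.WriteLine(MatchesWhole("", "a+"));          // False
        Console.WriteLine(MatchesWhole("aaa", "a{3}"));     // True
        Console.WriteLine(MatchesWhole("a", "a{2,}"));      // False
        Console.WriteLine(MatchesWhole("aaa", "a{2,4}"));   // True
        Console.WriteLine(MatchesWhole("aaaaa", "a{2,4}")); // False
    }
}
```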
Common Regex Methods in C#
C# offers several methods for working with regular expressions:
1. Regex.IsMatch()
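Determines whether a regex pattern matches anywhere in a string:
string email = "user@example.com"; // placeholder address for illustration
bool isValidEmail = Regex.IsMatch(email, @"^[\w-\.]+@([\w-]+\.)+[\w-]{2,4}$");
Console.WriteLine($"Is valid email: {isValidEmail}");
// Output: Is valid email: True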
2. Regex.Match() and Regex.Matches()
Finds the first match or all matches for a pattern:
// The addresses below are placeholders
string text = "Contact us at support@example.com or sales@example.org";
Match match = Regex.Match(text, @"[\w-]+@[\w-]+\.[\w-]+");
if (match.Success)
{
    Console.WriteLine($"Found email: {match.Value}");
    // Output: Found email: support@example.com
}
// Find all matches
MatchCollection matches = Regex.Matches(text, @"[\w-]+@[\w-]+\.[\w-]+");
Console.WriteLine($"Found {matches.Count} email addresses:");
foreach (Match m in matches)
{
    Console.WriteLine(m.Value);
}
// Output:
// Found 2 email addresses:
// support@example.com
// sales@example.org
3. Regex.Replace()
Replaces text that matches a pattern:
string phoneNumber = "Call me at 123-456-7890 today";
string formatted = Regex.Replace(phoneNumber, @"(\d{3})-(\d{3})-(\d{4})", "($1) $2-$3");
Console.WriteLine(formatted);
// Output: Call me at (123) 456-7890 today
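When the replacement has to be computed rather than written as a fixed template, Regex.Replace also accepts a MatchEvaluator delegate that is called once per match. The sketch below is one possible use; ReplaceDemo and DoubleNumbers are hypothetical names chosen for this example.

```csharp
using System;
using System.Text.RegularExpressions;

static class ReplaceDemo
{
    // Doubles every run of digits found in the input.
    // The lambda is converted to a MatchEvaluator and runs once per match.
    public static string DoubleNumbers(string input)
    {
        return Regex.Replace(input, @"\d+", m => (int.Parse(m.Value) * 2).ToString());
    }

    static void Main()
    {
        Console.WriteLine(DoubleNumbers("3 apples and 12 pears"));
        // Output: 6 apples and 24 pears
    }
}
```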
Working with Groups and Captures
Groups allow you to extract specific parts of a match:
string htmlTag = "<a href='https://example.com'>Visit Example</a>";
Match tagMatch = Regex.Match(htmlTag, @"<a href='([^']*)'>(.*?)</a>");
if (tagMatch.Success)
{
    string url = tagMatch.Groups[1].Value;
    string linkText = tagMatch.Groups[2].Value;
    Console.WriteLine($"URL: {url}");
    Console.WriteLine($"Link text: {linkText}");
    // Output:
    // URL: https://example.com
    // Link text: Visit Example
}
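Numbered groups become hard to follow as a pattern grows. .NET also supports named groups, written (?<name>...) and read with Groups["name"]. The sketch below rewrites the link example that way; NamedGroupDemo and TryParseLink are hypothetical names, and the pattern is still only suited to this exact tag shape.

```csharp
using System;
using System.Text.RegularExpressions;

static class NamedGroupDemo
{
    // Extracts the href value and link text from a single-quoted <a> tag.
    // Returns false (with empty outputs) when the input does not match.
    public static bool TryParseLink(string html, out string url, out string text)
    {
        Match m = Regex.Match(html, @"<a href='(?<url>[^']*)'>(?<text>.*?)</a>");
        url = m.Groups["url"].Value;
        text = m.Groups["text"].Value;
        return m.Success;
    }

    static void Main()
    {
        string html = "<a href='https://example.com'>Visit Example</a>";
        if (TryParseLink(html, out string url, out string text))
        {
            Console.WriteLine($"URL: {url}");        // URL: https://example.com
            Console.WriteLine($"Link text: {text}"); // Link text: Visit Example
        }
    }
}
```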
Regex Options
You can modify regex behavior using options:
// Case-insensitive matching
bool caseInsensitiveMatch = Regex.IsMatch("Hello World", "hello", RegexOptions.IgnoreCase);
Console.WriteLine($"Case-insensitive match: {caseInsensitiveMatch}");
// Output: Case-insensitive match: True
// Multiline mode - ^ and $ match beginning/end of each line
string multilineText = "Line 1\nLine 2\nLine 3";
MatchCollection lineMatches = Regex.Matches(multilineText, @"^Line \d$", RegexOptions.Multiline);
Console.WriteLine($"Found {lineMatches.Count} matching lines");
// Output: Found 3 matching lines
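RegexOptions is a flags enumeration, so several options can be combined with the | operator. The following sketch counts lines that start with "error:" in any casing; OptionsDemo, CountErrorLines, and the sample log text are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

static class OptionsDemo
{
    // Counts lines beginning with "error:", ignoring case.
    // Multiline makes ^ match at the start of every line, not just the start of the input.
    public static int CountErrorLines(string log)
    {
        RegexOptions options = RegexOptions.IgnoreCase | RegexOptions.Multiline;
        return Regex.Matches(log, @"^error:", options).Count;
    }

    static void Main()
    {
        string log = "Error: disk full\ninfo: started\nERROR: timeout";
        Console.WriteLine(CountErrorLines(log)); // 2
    }
}
```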
Practical Examples
1. Validating Email Addresses
public static bool IsValidEmail(string email)
{
    // Simple email validation pattern. It is intentionally loose, and the
    // {2,4} quantifier rejects top-level domains longer than four characters.
    string pattern = @"^[\w-\.]+@([\w-]+\.)+[\w-]{2,4}$";
    return Regex.IsMatch(email, pattern);
}
// Usage (the addresses are placeholders)
string[] emails = {
    "user@example.com",
    "invalid.email",
    "user@example.museum"
};
foreach (var email in emails)
{
    Console.WriteLine($"{email}: {IsValidEmail(email)}");
}
// Output:
// user@example.com: True
// invalid.email: False
// user@example.museum: False (the top-level domain is longer than four characters)
2. Parsing CSV Data
// Requires: using System.Collections.Generic; (for List<string>)
string csvLine = "John,Doe,\"New York, NY\",25,Developer";
// Regex that handles quoted fields containing commas
// (it does not handle escaped quotes inside a quoted field)
Regex csvParser = new Regex(@"(?:^|,)(?:""([^""]*)""|([^,]*))", RegexOptions.Compiled);
List<string> fields = new List<string>();
foreach (Match match in csvParser.Matches(csvLine))
{
    // Group 1 captures a quoted field, group 2 an unquoted one
    Group quotedField = match.Groups[1];
    Group unquotedField = match.Groups[2];
    string value = quotedField.Success
        ? quotedField.Value
        : unquotedField.Success ? unquotedField.Value : string.Empty;
    fields.Add(value);
}
Console.WriteLine("Parsed CSV fields:");
foreach (var field in fields)
{
    Console.WriteLine($"- {field}");
}
// Output:
// Parsed CSV fields:
// - John
// - Doe
// - New York, NY
// - 25
// - Developer
3. Extracting Data from Logs
string logLine = "[2023-05-15 10:15:32] ERROR: Database connection failed (Timeout=30s)";
Match logMatch = Regex.Match(logLine, @"\[(.*?)\]\s+(\w+):\s+(.*?)(?:\s+\((.*)\))?$");
if (logMatch.Success)
{
    string timestamp = logMatch.Groups[1].Value;
    string logLevel = logMatch.Groups[2].Value;
    string message = logMatch.Groups[3].Value;
    string details = logMatch.Groups[4].Success ? logMatch.Groups[4].Value : "No details";
    Console.WriteLine($"Timestamp: {timestamp}");
    Console.WriteLine($"Level: {logLevel}");
    Console.WriteLine($"Message: {message}");
    Console.WriteLine($"Details: {details}");
}
// Output:
// Timestamp: 2023-05-15 10:15:32
// Level: ERROR
// Message: Database connection failed
// Details: Timeout=30s
Performance Considerations
While regular expressions are powerful, there are some performance considerations:
- Compile patterns when reusing them. RegexOptions.Compiled makes construction slower in exchange for faster matching, so it pays off only for patterns that run many times:
// Create a compiled regex object for repeated use
Regex emailPattern = new Regex(@"^[\w-\.]+@([\w-]+\.)+[\w-]{2,4}$", RegexOptions.Compiled);
// Then use it multiple times (placeholder addresses)
bool isValid1 = emailPattern.IsMatch("user@example.com");
bool isValid2 = emailPattern.IsMatch("admin@example.org");
- Avoid catastrophic backtracking: nested quantifiers such as (a+)+ can take exponential time on long inputs that almost match
- Use static methods for one-time use and compiled Regex objects for repeated use
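One way to limit the damage from catastrophic backtracking is to pass a match timeout, which makes the engine throw RegexMatchTimeoutException instead of running indefinitely. The sketch below assumes that treating a timeout as "no match" is acceptable; TimeoutDemo, SafeIsMatch, the sample pattern, and the one-second limit are all illustrative choices. On .NET 7 and later, RegexOptions.NonBacktracking is another option worth evaluating.

```csharp
using System;
using System.Text.RegularExpressions;

static class TimeoutDemo
{
    // Returns true only when the pattern matches within the time limit.
    // A timeout is treated as "no match" here; real code might log or rethrow instead.
    public static bool SafeIsMatch(string input, string pattern, TimeSpan timeout)
    {
        try
        {
            return Regex.IsMatch(input, pattern, RegexOptions.None, timeout);
        }
        catch (RegexMatchTimeoutException)
        {
            return false;
        }
    }

    static void Main()
    {
        string pattern = @"^(a+)+$";              // nested quantifier: prone to heavy backtracking
        string risky = new string('a', 40) + "!"; // almost matches, then fails at the last character
        TimeSpan limit = TimeSpan.FromSeconds(1);

        Console.WriteLine(SafeIsMatch("aaaa", pattern, limit)); // True
        Console.WriteLine(SafeIsMatch(risky, pattern, limit));  // False (no match, or timed out)
    }
}
```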
Common Pitfalls and Solutions
1. Escaping Special Characters
Special characters like ., *, ?, +, (, ), [, ], {, }, ^, $, |, and \ need to be escaped with a backslash when you want to match them literally:
// Matching a literal period/dot
string input = "example.com";
bool hasDot = Regex.IsMatch(input, @"\."); // Use \. to match a literal dot
Console.WriteLine($"Has dot: {hasDot}"); // Output: Has dot: True
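When the text to match comes from user input or another variable, escaping it by hand is error-prone. Regex.Escape adds the backslashes for you. The sketch below compares an unescaped and an escaped pattern; EscapeDemo and ContainsLiteral are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

static class EscapeDemo
{
    // Treats 'literal' as plain text by escaping any regex metacharacters in it.
    public static bool ContainsLiteral(string input, string literal)
    {
        return Regex.IsMatch(input, Regex.Escape(literal));
    }

    static void Main()
    {
        Console.WriteLine(Regex.IsMatch("axb", "a.b"));   // True: the dot matches any character
        Console.WriteLine(ContainsLiteral("axb", "a.b")); // False: the dot is now literal
        Console.WriteLine(ContainsLiteral("a.b", "a.b")); // True
    }
}
```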
2. Using Verbatim String Literals
Since regex patterns often contain backslashes, it's recommended to use verbatim strings (@"...") to avoid double-escaping:
// Without verbatim string - requires double backslashes
string pattern1 = "\\d+";
// With verbatim string - more readable
string pattern2 = @"\d+";
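The two literals above produce the same pattern. One detail that is easy to miss is that a double quote inside a verbatim string is written as "", which is why the CSV pattern earlier contains doubled quotes. A small sketch, with VerbatimDemo and ExtractQuoted as hypothetical names:

```csharp
using System;
using System.Text.RegularExpressions;

static class VerbatimDemo
{
    // Returns the text inside the first pair of double quotes, or "" if there is none.
    // In a verbatim string each "" stands for one literal quote,
    // so the pattern below is: "([^"]*)"
    public static string ExtractQuoted(string input)
    {
        return Regex.Match(input, @"""([^""]*)""").Groups[1].Value;
    }

    static void Main()
    {
        Console.WriteLine("\\d+" == @"\d+");                 // True
        Console.WriteLine(ExtractQuoted("say \"hi\" now")); // hi
    }
}
```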
Summary
Regular expressions are a powerful tool in C# for pattern matching and text manipulation. They enable you to:
- Validate input formats
- Extract specific parts of text
- Transform text based on patterns
- Parse complex text formats
While they have a learning curve, regular expressions can greatly simplify text processing tasks and reduce the amount of code you need to write.
Exercises
- Create a regex pattern to validate a North American phone number (formats like: 555-123-4567, (555) 123-4567, 5551234567)
- Write a function that extracts all URLs from a given text
- Create a pattern to validate a password with these requirements: at least 8 characters, containing at least one uppercase letter, one lowercase letter, one number, and one special character
- Write a function that converts a date from "MM/DD/YYYY" format to "YYYY-MM-DD" format using regex
- Create a regex to extract text between HTML tags
Additional Resources
- Microsoft Regex Documentation
- Regex101 - An interactive tool for testing and debugging regex patterns
- RegExr - Another helpful tool for learning, building, and testing regex
- Regular Expressions Cookbook - For deeper dives into complex regex solutions
Remember that while regular expressions are powerful, they can also become complex and difficult to maintain. For very complex text processing needs, consider whether a dedicated parsing library might be more appropriate.