
C# Regular Expressions

Regular expressions (regex or regexp) are a compact language for describing text patterns. In C#, you use them to search, validate, extract, and replace text. This guide covers the fundamentals of regular expressions in C# and applies them to real-world problems.

Introduction to Regular Expressions

Regular expressions are sequences of characters that define a search pattern. They are particularly useful when you need to:

  • Validate text format (like email addresses or phone numbers)
  • Search for specific patterns within text
  • Extract information from text
  • Replace or modify parts of text based on patterns

C# provides robust support for regular expressions through the System.Text.RegularExpressions namespace, which contains the Regex class and related types.

Getting Started with Regular Expressions

First, you need to include the necessary namespace:

csharp
using System.Text.RegularExpressions;

Basic Pattern Matching

Let's start with a simple example - checking if a string contains a specific pattern:

csharp
using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        // Check if a string contains a number
        string text = "Hello 123 World";
        bool containsNumber = Regex.IsMatch(text, @"\d+");

        Console.WriteLine($"Contains number: {containsNumber}");
        // Output: Contains number: True
    }
}

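In this example, \d+ is a regular expression pattern that matches one or more digits. Regex.IsMatch returns true when the pattern is found anywhere in the input, not only when it matches the whole string.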

Regular Expression Syntax Basics

Let's explore some fundamental regex patterns:

Character Classes

| Pattern | Description | Example |
|---------|-------------|---------|
| \d | Matches any digit | \d+ matches "123" in "abc123" |
| \w | Matches any word character (alphanumeric + underscore) | \w+ matches "Hello_123" in "Hello_123!" |
| \s | Matches any whitespace character | \s+ matches spaces in "Hello World" |
| . | Matches any character except newline | a.b matches "axb" in "axbyz" |
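
To see the character classes above in action, here is a minimal sketch. The sample string and variable names are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

string sample = "Order 66: ship_to=Mars Base";

// \d+ : the first run of digits
string digits = Regex.Match(sample, @"\d+").Value;      // "66"

// \w+ : the first run of letters, digits, or underscores
string word = Regex.Match(sample, @"\w+").Value;        // "Order"

// \s : each individual whitespace character
int spaces = Regex.Matches(sample, @"\s").Count;        // 3

// . : any single character except newline
string dotted = Regex.Match(sample, @"M.rs").Value;     // "Mars"

Console.WriteLine($"{digits} | {word} | {spaces} | {dotted}");
// Output: 66 | Order | 3 | Mars
```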

Quantifiers

| Pattern | Description | Example |
|---------|-------------|---------|
| * | 0 or more occurrences | a* matches "", "a", "aa", etc. |
| + | 1 or more occurrences | a+ matches "a", "aa", etc. but not "" |
| ? | 0 or 1 occurrence | a? matches "" or "a" |
| {n} | Exactly n occurrences | a{3} matches "aaa" |
| {n,} | At least n occurrences | a{2,} matches "aa", "aaa", etc. |
| {n,m} | Between n and m occurrences | a{2,4} matches "aa", "aaa", or "aaaa" |
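
The quantifiers in the table can be checked with a short sketch like the one below. The ^ and $ anchors (start and end of the string) are added here so that each quantifier has to account for the whole input. The test strings are illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

// With ^ and $, the quantified pattern must cover the entire string.
bool exactlyThree = Regex.IsMatch("aaa", @"^a{3}$");       // true
bool tooMany = Regex.IsMatch("aaaaa", @"^a{2,4}$");        // false: five a's
bool optionalU = Regex.IsMatch("color", @"^colou?r$");     // true: the u may be absent

// Quantifiers are greedy by default: a{2,4} takes as many a's as it can.
string greedy = Regex.Match("aaaaa", @"a{2,4}").Value;     // "aaaa"

Console.WriteLine($"{exactlyThree} {tooMany} {optionalU} {greedy}");
// Output: True False True aaaa
```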

Common Regex Methods in C#

C# offers several methods for working with regular expressions:

1. Regex.IsMatch()

Determines if a regex pattern matches within a string:

csharp
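string email = "someone@example.com"; // sample input for illustration
bool isValidEmail = Regex.IsMatch(email, @"^[\w-\.]+@([\w-]+\.)+[\w-]{2,4}$");
Console.WriteLine($"Is valid email: {isValidEmail}");
// Output: Is valid email: True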

2. Regex.Match() and Regex.Matches()

Finds the first match or all matches for a pattern:

csharp
string text = "Contact us at [email protected] or [email protected]";
Match match = Regex.Match(text, @"[\w-]+@[\w-]+\.[\w-]+");

if (match.Success)
{
    Console.WriteLine($"Found email: {match.Value}");
    // Output: Found email: [email protected]
}

// Find all matches
MatchCollection matches = Regex.Matches(text, @"[\w-]+@[\w-]+\.[\w-]+");

Console.WriteLine($"Found {matches.Count} email addresses:");
foreach (Match m in matches)
{
    Console.WriteLine(m.Value);
}
// Output:
// Found 2 email addresses:
// [email protected]
// [email protected]

3. Regex.Replace()

Replaces text that matches a pattern:

csharp
string phoneNumber = "Call me at 123-456-7890 today";
string formatted = Regex.Replace(phoneNumber, @"(\d{3})-(\d{3})-(\d{4})", "($1) $2-$3");
Console.WriteLine(formatted);
// Output: Call me at (123) 456-7890 today
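
When the replacement has to be computed rather than written as a fixed template, Regex.Replace also accepts a MatchEvaluator delegate. The following is a small sketch of that overload. The price string and the doubling rule are invented for the example.

```csharp
using System;
using System.Text.RegularExpressions;

string prices = "Widget: $10, Gadget: $25";

// The lambda runs once per match; its return value replaces the matched text.
string doubled = Regex.Replace(prices, @"\$(\d+)", m =>
{
    int amount = int.Parse(m.Groups[1].Value);
    return "$" + (amount * 2);
});

Console.WriteLine(doubled);
// Output: Widget: $20, Gadget: $50
```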

Working with Groups and Captures

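Parentheses in a pattern create capture groups, which let you extract specific parts of a match. Groups[0] is the whole match, and Groups[1] onward are the parenthesized groups in order: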

csharp
string htmlTag = "<a href='https://example.com'>Visit Example</a>";
Match tagMatch = Regex.Match(htmlTag, @"<a href='([^']*)'>(.*?)</a>");

if (tagMatch.Success)
{
    string url = tagMatch.Groups[1].Value;
    string linkText = tagMatch.Groups[2].Value;

    Console.WriteLine($"URL: {url}");
    Console.WriteLine($"Link text: {linkText}");

    // Output:
    // URL: https://example.com
    // Link text: Visit Example
}
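
Numbered groups get harder to follow as patterns grow. .NET also supports named groups with the (?<name>...) syntax. Here is a minimal sketch, assuming an ISO-style date embedded in some text. The group names and the input are illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

string text = "Released on 2023-05-15";
string datePattern = @"(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})";
Match m = Regex.Match(text, datePattern);

// Named groups are read by name rather than by position.
string year = m.Groups["year"].Value;
string month = m.Groups["month"].Value;
string day = m.Groups["day"].Value;

Console.WriteLine($"{day}/{month}/{year}");
// Output: 15/05/2023

// The same names can be used in replacement strings as ${name}.
string reordered = Regex.Replace(text, datePattern, "${day}/${month}/${year}");
Console.WriteLine(reordered);
// Output: Released on 15/05/2023
```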

Regex Options

You can modify matching behavior by passing RegexOptions flags:

csharp
// Case-insensitive matching
bool caseInsensitiveMatch = Regex.IsMatch("Hello World", "hello", RegexOptions.IgnoreCase);
Console.WriteLine($"Case-insensitive match: {caseInsensitiveMatch}");
// Output: Case-insensitive match: True

// Multiline mode - ^ and $ match beginning/end of each line
string multilineText = "Line 1\nLine 2\nLine 3";
MatchCollection lineMatches = Regex.Matches(multilineText, @"^Line \d$", RegexOptions.Multiline);
Console.WriteLine($"Found {lineMatches.Count} matching lines");
// Output: Found 3 matching lines
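
RegexOptions is a flags enum, so several options can be combined with the | operator. The sketch below combines IgnoreCase and Multiline with IgnorePatternWhitespace, which lets a pattern span several lines and carry # comments. The config text and key name are made up for the example.

```csharp
using System;
using System.Text.RegularExpressions;

string config = "name = Alice\nNAME = Bob\nage = 30";

// With IgnorePatternWhitespace, unescaped whitespace in the pattern is ignored
// and # starts a comment that runs to the end of the line.
string pattern = @"
    ^name      # key at the start of a line
    \s*=\s*    # equals sign with optional spaces around it
    (\w+)$     # value running to the end of the line
";

RegexOptions options = RegexOptions.IgnoreCase
                     | RegexOptions.Multiline
                     | RegexOptions.IgnorePatternWhitespace;

MatchCollection found = Regex.Matches(config, pattern, options);
foreach (Match m in found)
{
    Console.WriteLine(m.Groups[1].Value);
}
// Output:
// Alice
// Bob
```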

Practical Examples

1. Validating Email Addresses

csharp
public static bool IsValidEmail(string email)
{
    // Simple email validation pattern. It is deliberately loose, and the
    // {2,4} limit rejects top-level domains longer than four characters.
    string pattern = @"^[\w-\.]+@([\w-]+\.)+[\w-]{2,4}$";
    return Regex.IsMatch(email, pattern);
}

// Usage
string[] emails = {
    "[email protected]",
    "invalid.email",
    "[email protected]"
};

foreach (var email in emails)
{
    Console.WriteLine($"{email}: {IsValidEmail(email)}");
}
// Output:
// [email protected]: True
// invalid.email: False
// [email protected]: True (might be True or False depending on the exact pattern)

2. Parsing CSV Data

csharp
// Requires: using System.Collections.Generic; (for List<string>)
string csvLine = "John,Doe,\"New York, NY\",25,Developer";
// Regex that handles quoted fields containing commas
// (it does not handle escaped quotes inside a quoted field)
Regex csvParser = new Regex(@"(?:^|,)(?:""([^""]*)""|([^,]*))", RegexOptions.Compiled);

List<string> fields = new List<string>();
foreach (Match match in csvParser.Matches(csvLine))
{
    // Group 1 is set for quoted fields, group 2 for unquoted ones
    Group quotedField = match.Groups[1];
    Group unquotedField = match.Groups[2];

    string value = quotedField.Success
        ? quotedField.Value
        : unquotedField.Success ? unquotedField.Value : string.Empty;

    fields.Add(value);
}

Console.WriteLine("Parsed CSV fields:");
foreach (var field in fields)
{
    Console.WriteLine($"- {field}");
}
// Output:
// Parsed CSV fields:
// - John
// - Doe
// - New York, NY
// - 25
// - Developer

3. Extracting Data from Logs

csharp
string logLine = "[2023-05-15 10:15:32] ERROR: Database connection failed (Timeout=30s)";
Match logMatch = Regex.Match(logLine, @"\[(.*?)\]\s+(\w+):\s+(.*?)(?:\s+\((.*)\))?$");

if (logMatch.Success)
{
    string timestamp = logMatch.Groups[1].Value;
    string logLevel = logMatch.Groups[2].Value;
    string message = logMatch.Groups[3].Value;
    string details = logMatch.Groups[4].Success ? logMatch.Groups[4].Value : "No details";

    Console.WriteLine($"Timestamp: {timestamp}");
    Console.WriteLine($"Level: {logLevel}");
    Console.WriteLine($"Message: {message}");
    Console.WriteLine($"Details: {details}");
}
// Output:
// Timestamp: 2023-05-15 10:15:32
// Level: ERROR
// Message: Database connection failed
// Details: Timeout=30s

Performance Considerations

While regular expressions are powerful, there are some performance considerations:

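  1. Compile patterns when reusing them. RegexOptions.Compiled costs more when the Regex is constructed but matches faster afterwards, so it pays off for patterns that run many times: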

    csharp
    // Create a compiled regex object for repeated use
    Regex emailPattern = new Regex(@"^[\w-\.]+@([\w-]+\.)+[\w-]{2,4}$", RegexOptions.Compiled);

    // Then use it multiple times
    bool isValid1 = emailPattern.IsMatch("[email protected]");
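    bool isValid2 = emailPattern.IsMatch("[email protected]");

  2. Avoid catastrophic backtracking: nested or overlapping quantifiers such as (a+)+ can take exponential time on inputs that almost match, so keep patterns specific and set a match timeout when the input is untrusted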

  3. Use the static methods (Regex.IsMatch, Regex.Match, and so on) for one-time use, and keep a compiled Regex object in a field for repeated use
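
One way to guard against the backtracking problem in point 2 is the Regex constructor overload that takes a match timeout. The sketch below uses it. The helper name IsMatchOrTimeout, the 200 ms budget, and the choice to treat a timeout as "no match" are assumptions for illustration rather than a recommended policy. On .NET 7 and later, RegexOptions.NonBacktracking is another option worth evaluating.

```csharp
using System;
using System.Text.RegularExpressions;

// Nested quantifiers over the same character, as in (a+)+, are a classic
// source of catastrophic backtracking on inputs that almost match.
var risky = new Regex(@"^(a+)+$", RegexOptions.None, TimeSpan.FromMilliseconds(200));

// Hypothetical helper: reports a timeout as "no match" instead of hanging.
static bool IsMatchOrTimeout(Regex regex, string input)
{
    try
    {
        return regex.IsMatch(input);
    }
    catch (RegexMatchTimeoutException)
    {
        // The match took longer than the configured timeout.
        return false;
    }
}

Console.WriteLine(IsMatchOrTimeout(risky, "aaaa"));                    // True
Console.WriteLine(IsMatchOrTimeout(risky, new string('a', 40) + "!")); // False (no match or timeout)
```

Whether the second call actually hits the timeout depends on the runtime's regex engine. Either way, the call returns within roughly the configured budget.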

Common Pitfalls and Solutions

1. Escaping Special Characters

Special characters like ., *, ?, +, (, ), [, ], {, }, ^, $, |, and \ have a special meaning in a pattern. To match one of them literally, escape it with a backslash:

csharp
// Matching a literal period/dot
string input = "example.com";
bool hasDot = Regex.IsMatch(input, @"\."); // Use \. to match a literal dot
Console.WriteLine($"Has dot: {hasDot}"); // Output: Has dot: True
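
When the literal text comes from a variable, such as user input, escaping by hand is error-prone. Regex.Escape does it for you. Here is a short sketch with a made-up input string:

```csharp
using System;
using System.Text.RegularExpressions;

string userInput = "1+1=2 (really?)";
string haystack = "Proof: 1+1=2 (really?)";

// Regex.Escape backslash-escapes the metacharacters so the text matches literally.
string literal = Regex.Escape(userInput);
bool escapedMatch = Regex.IsMatch(haystack, literal);

// Unescaped, the +, ( ) and ? are interpreted as regex operators.
bool naiveMatch = Regex.IsMatch(haystack, userInput);

Console.WriteLine($"Escaped: {escapedMatch}, unescaped: {naiveMatch}");
// Output: Escaped: True, unescaped: False
```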

2. Using Verbatim String Literals

Since regex patterns often contain backslashes, it's recommended to use verbatim strings (@"...") to avoid double-escaping:

csharp
// Without verbatim string - requires double backslashes
string pattern1 = "\\d+";

// With verbatim string - more readable
string pattern2 = @"\d+";
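
A quick check confirms that the two literals are the same pattern. It also shows the one escape rule verbatim strings do have: a double quote is written as two quotes. The sample sentence is illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

// Both literals produce the same three-character pattern: \d+
string regular = "\\d+";
string verbatim = @"\d+";
bool same = regular == verbatim;
Console.WriteLine(same);
// Output: True

// Inside a verbatim string, "" stands for one double quote.
// This literal is the pattern "([^"]*)" : text between double quotes.
string quotedPattern = @"""([^""]*)""";
string inner = Regex.Match("say \"hello\" now", quotedPattern).Groups[1].Value;
Console.WriteLine(inner);
// Output: hello
```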

Summary

Regular expressions are a powerful tool in C# for pattern matching and text manipulation. They enable you to:

  • Validate input formats
  • Extract specific parts of text
  • Transform text based on patterns
  • Parse complex text formats

While they have a learning curve, regular expressions can greatly simplify text processing tasks and reduce the amount of code you need to write.

Exercises

  1. Create a regex pattern to validate a North American phone number (formats like: 555-123-4567, (555) 123-4567, 5551234567)
  2. Write a function that extracts all URLs from a given text
  3. Create a pattern to validate a password with these requirements: at least 8 characters, containing at least one uppercase letter, one lowercase letter, one number, and one special character
  4. Write a function that converts a date from "MM/DD/YYYY" format to "YYYY-MM-DD" format using regex
  5. Create a regex to extract text between HTML tags

Additional Resources

Remember that while regular expressions are powerful, they can also become complex and difficult to maintain. For very complex text processing needs, consider whether a dedicated parsing library might be more appropriate.


