.NET HashSets
Introduction
HashSets are one of the most useful collection types in .NET that don't get enough attention from beginners. A HashSet<T>
represents a set of values where each element must be unique. Think of it like a mathematical set - no duplicates are allowed, and the order of elements is not guaranteed.
HashSets are optimized for extremely fast lookups, additions, and removals. While List<T>
might be your go-to collection, there are many scenarios where a HashSet would be a significantly better choice.
HashSet Basics
Creating a HashSet
Let's start by creating a HashSet and performing some basic operations:
using System;
using System.Collections.Generic;
// Creating an empty HashSet of integers
HashSet<int> numbers = new HashSet<int>();
// Creating a HashSet with initial values
HashSet<string> fruits = new HashSet<string>() { "Apple", "Banana", "Orange" };
// Creating from an existing collection
List<char> charList = new List<char>() { 'a', 'b', 'c', 'a', 'b' };
HashSet<char> uniqueChars = new HashSet<char>(charList);
Console.WriteLine($"uniqueChars count: {uniqueChars.Count}"); // Output: uniqueChars count: 3
Notice that even though our charList
contained 5 elements with duplicates, the uniqueChars
HashSet only contains 3 elements because duplicates are automatically excluded.
Adding and Removing Elements
HashSet<string> programmingLanguages = new HashSet<string>();
// Adding elements
programmingLanguages.Add("C#");
programmingLanguages.Add("JavaScript");
programmingLanguages.Add("Python");
// Trying to add a duplicate
bool wasAdded = programmingLanguages.Add("C#");
Console.WriteLine($"Was C# added again? {wasAdded}"); // Output: Was C# added again? False
// Removing elements
bool wasRemoved = programmingLanguages.Remove("JavaScript");
Console.WriteLine($"Was JavaScript removed? {wasRemoved}"); // Output: Was JavaScript removed? True
// Checking the count
Console.WriteLine($"Number of languages: {programmingLanguages.Count}"); // Output: Number of languages: 2
Checking for Elements
One of the main advantages of HashSets is the extremely fast lookup operation. Checking if an element exists in a HashSet is an O(1) operation (constant time), while in a List it would be O(n) (linear time).
HashSet<string> users = new HashSet<string>() { "Alice", "Bob", "Charlie" };
// Checking if an element exists
bool isAlicePresent = users.Contains("Alice");
bool isDavePresent = users.Contains("Dave");
Console.WriteLine($"Is Alice a user? {isAlicePresent}"); // Output: Is Alice a user? True
Console.WriteLine($"Is Dave a user? {isDavePresent}"); // Output: Is Dave a user? False
Set Operations
HashSets really shine when you need to perform mathematical set operations like union, intersection, and difference.
Union, Intersection, and Difference
HashSet<int> set1 = new HashSet<int>() { 1, 2, 3, 4, 5 };
HashSet<int> set2 = new HashSet<int>() { 3, 4, 5, 6, 7 };
// Union: combines all elements from both sets (no duplicates)
set1.UnionWith(set2);
Console.WriteLine($"Union: {string.Join(", ", set1)}");
// Output: Union: 1, 2, 3, 4, 5, 6, 7
// Creating sets again for the next operations
set1 = new HashSet<int>() { 1, 2, 3, 4, 5 };
set2 = new HashSet<int>() { 3, 4, 5, 6, 7 };
// Intersection: keeps only elements that exist in both sets
set1.IntersectWith(set2);
Console.WriteLine($"Intersection: {string.Join(", ", set1)}");
// Output: Intersection: 3, 4, 5
// Creating sets again for the next operations
set1 = new HashSet<int>() { 1, 2, 3, 4, 5 };
set2 = new HashSet<int>() { 3, 4, 5, 6, 7 };
// Difference: removes elements that exist in the second set
set1.ExceptWith(set2);
Console.WriteLine($"Difference (set1 - set2): {string.Join(", ", set1)}");
// Output: Difference (set1 - set2): 1, 2
Checking Relations Between Sets
HashSet provides methods to check relationships between sets:
HashSet<char> vowels = new HashSet<char>() { 'a', 'e', 'i', 'o', 'u' };
HashSet<char> letters1 = new HashSet<char>() { 'a', 'b', 'c' };
HashSet<char> letters2 = new HashSet<char>() { 'a', 'e' };
// IsSubsetOf: checks if all elements in current set are in the other set
bool isSubset = letters2.IsSubsetOf(vowels);
Console.WriteLine($"Is {string.Join(", ", letters2)} a subset of vowels? {isSubset}");
// Output: Is a, e a subset of vowels? True
// IsSupersetOf: checks if current set contains all elements of other set
bool isSuperset = vowels.IsSupersetOf(letters2);
Console.WriteLine($"Is vowels a superset of {string.Join(", ", letters2)}? {isSuperset}");
// Output: Is vowels a superset of a, e? True
// Overlaps: checks if sets share at least one common element
bool hasOverlap = letters1.Overlaps(vowels);
Console.WriteLine($"Do {string.Join(", ", letters1)} and vowels overlap? {hasOverlap}");
// Output: Do a, b, c and vowels overlap? True
// SetEquals: checks if both sets contain exactly the same elements
bool areEqual = letters1.SetEquals(letters2);
Console.WriteLine($"Are the sets equal? {areEqual}");
// Output: Are the sets equal? False
Practical Applications
Finding Unique Elements
One common use case for HashSets is to quickly find unique elements in a collection:
List<string> allUserEmails = new List<string>()
{
"[email protected]",
"[email protected]",
"[email protected]",
"[email protected]",
"[email protected]"
};
HashSet<string> uniqueEmails = new HashSet<string>(allUserEmails);
Console.WriteLine($"Total emails: {allUserEmails.Count}"); // Output: Total emails: 5
Console.WriteLine($"Unique emails: {uniqueEmails.Count}"); // Output: Unique emails: 3
Checking for Duplicates
HashSets make it easy to check if there are any duplicate elements:
List<int> numbers = new List<int>() { 1, 2, 3, 4, 5 };
HashSet<int> uniqueNumbers = new HashSet<int>();
bool hasDuplicates = false;
foreach (var num in numbers)
{
// If Add returns false, it means the element was already in the set
if (!uniqueNumbers.Add(num))
{
hasDuplicates = true;
break;
}
}
Console.WriteLine($"List has duplicates: {hasDuplicates}"); // Output: List has duplicates: False
Implementing a Simple Dictionary Spell Checker
// A simple spell checker using HashSet
HashSet<string> dictionary = new HashSet<string>(StringComparer.OrdinalIgnoreCase)
{
"apple", "banana", "orange", "grape", "kiwi",
"programming", "computer", "algorithm", "data", "structure"
};
string[] wordsToCheck = { "Apple", "bananas", "programming", "datascience" };
foreach (var word in wordsToCheck)
{
bool isCorrect = dictionary.Contains(word);
Console.WriteLine($"'{word}' is {(isCorrect ? "correct" : "misspelled")}");
}
// Output:
// 'Apple' is correct
// 'bananas' is misspelled
// 'programming' is correct
// 'datascience' is misspelled
Finding Common Friends
Let's say you have two users and want to find their common friends:
HashSet<string> aliceFriends = new HashSet<string> { "Bob", "Charlie", "Diana", "Edward" };
HashSet<string> bobFriends = new HashSet<string> { "Alice", "Charlie", "Frank", "Diana" };
HashSet<string> commonFriends = new HashSet<string>(aliceFriends);
commonFriends.IntersectWith(bobFriends);
Console.WriteLine("Common friends:");
foreach (var friend in commonFriends)
{
Console.WriteLine($"- {friend}");
}
// Output:
// Common friends:
// - Charlie
// - Diana
Performance Considerations
HashSets are optimized for extremely fast lookups, additions and removals with average time complexity. This makes them much faster than List<T>
for these operations, especially as the collection grows larger.
Here's a simple comparison between List and HashSet for checking if an element exists:
const int elementCount = 1000000;
List<int> numbersList = new List<int>();
HashSet<int> numbersSet = new HashSet<int>();
// Populate both collections with the same data
for (int i = 0; i < elementCount; i++)
{
numbersList.Add(i);
numbersSet.Add(i);
}
// Test List.Contains performance
var listStartTime = DateTime.Now;
bool containsInList = numbersList.Contains(999999);
var listEndTime = DateTime.Now;
// Test HashSet.Contains performance
var setStartTime = DateTime.Now;
bool containsInSet = numbersSet.Contains(999999);
var setEndTime = DateTime.Now;
Console.WriteLine($"List search time: {(listEndTime - listStartTime).TotalMilliseconds} ms");
Console.WriteLine($"HashSet search time: {(setEndTime - setStartTime).TotalMilliseconds} ms");
// Example output:
// List search time: 4.5783 ms
// HashSet search time: 0.0039 ms
The HashSet is usually more than 1000 times faster for lookups in large collections! However, HashSets do consume more memory than Lists, so there's a trade-off to consider.
Summary
HashSets are powerful collection types in .NET that excel at:
- Storing unique elements with no duplicates
- Providing extremely fast lookups, additions, and removals
- Performing set operations like union, intersection, and difference
- Checking relationships between sets
When to use a HashSet:
- When you need to ensure uniqueness of elements
- When you frequently need to check if an element exists
- When you need to perform set operations
- When the order of elements doesn't matter
HashSets are an essential tool in any .NET developer's toolkit, and understanding when to use them can significantly improve your application's performance.
Exercises
- Create a program that reads a text file and counts the number of unique words.
- Implement a function that finds all common elements in three different arrays.
- Build a simple friends recommendation system: "People you might know" based on friends of friends.
- Create a program that checks if two strings are anagrams of each other using HashSets.
- Implement a custom equality comparer for a HashSet of your own class type.
Additional Resources
HashSet<T>
Class Documentation- Set Operations in C#
IEqualityComparer<T>
Interface for customizing how objects are compared in HashSets
If you spot any mistakes on this website, please let me know at [email protected]. I’d greatly appreciate your feedback! :)