Skip to main content

.NET HashSets

Introduction

HashSets are one of the most useful collection types in .NET that don't get enough attention from beginners. A HashSet<T> represents a set of values where each element must be unique. Think of it like a mathematical set - no duplicates are allowed, and the order of elements is not guaranteed.

HashSets are optimized for extremely fast lookups, additions, and removals. While List<T> might be your go-to collection, there are many scenarios where a HashSet would be a significantly better choice.

HashSet Basics

Creating a HashSet

Let's start by creating a HashSet and performing some basic operations:

csharp
using System;
using System.Collections.Generic;

// Creating an empty HashSet of integers
HashSet<int> numbers = new HashSet<int>();

// Creating a HashSet with initial values
HashSet<string> fruits = new HashSet<string>() { "Apple", "Banana", "Orange" };

// Creating from an existing collection
List<char> charList = new List<char>() { 'a', 'b', 'c', 'a', 'b' };
HashSet<char> uniqueChars = new HashSet<char>(charList);

Console.WriteLine($"uniqueChars count: {uniqueChars.Count}"); // Output: uniqueChars count: 3

Notice that even though our charList contained 5 elements with duplicates, the uniqueChars HashSet only contains 3 elements because duplicates are automatically excluded.

Adding and Removing Elements

csharp
HashSet<string> programmingLanguages = new HashSet<string>();

// Adding elements
programmingLanguages.Add("C#");
programmingLanguages.Add("JavaScript");
programmingLanguages.Add("Python");

// Trying to add a duplicate
bool wasAdded = programmingLanguages.Add("C#");
Console.WriteLine($"Was C# added again? {wasAdded}"); // Output: Was C# added again? False

// Removing elements
bool wasRemoved = programmingLanguages.Remove("JavaScript");
Console.WriteLine($"Was JavaScript removed? {wasRemoved}"); // Output: Was JavaScript removed? True

// Checking the count
Console.WriteLine($"Number of languages: {programmingLanguages.Count}"); // Output: Number of languages: 2

Checking for Elements

One of the main advantages of HashSets is the extremely fast lookup operation. Checking if an element exists in a HashSet is an O(1) operation (constant time), while in a List it would be O(n) (linear time).

csharp
HashSet<string> users = new HashSet<string>() { "Alice", "Bob", "Charlie" };

// Checking if an element exists
bool isAlicePresent = users.Contains("Alice");
bool isDavePresent = users.Contains("Dave");

Console.WriteLine($"Is Alice a user? {isAlicePresent}"); // Output: Is Alice a user? True
Console.WriteLine($"Is Dave a user? {isDavePresent}"); // Output: Is Dave a user? False

Set Operations

HashSets really shine when you need to perform mathematical set operations like union, intersection, and difference.

Union, Intersection, and Difference

csharp
HashSet<int> set1 = new HashSet<int>() { 1, 2, 3, 4, 5 };
HashSet<int> set2 = new HashSet<int>() { 3, 4, 5, 6, 7 };

// Union: combines all elements from both sets (no duplicates)
set1.UnionWith(set2);
Console.WriteLine($"Union: {string.Join(", ", set1)}");
// Output: Union: 1, 2, 3, 4, 5, 6, 7

// Creating sets again for the next operations
set1 = new HashSet<int>() { 1, 2, 3, 4, 5 };
set2 = new HashSet<int>() { 3, 4, 5, 6, 7 };

// Intersection: keeps only elements that exist in both sets
set1.IntersectWith(set2);
Console.WriteLine($"Intersection: {string.Join(", ", set1)}");
// Output: Intersection: 3, 4, 5

// Creating sets again for the next operations
set1 = new HashSet<int>() { 1, 2, 3, 4, 5 };
set2 = new HashSet<int>() { 3, 4, 5, 6, 7 };

// Difference: removes elements that exist in the second set
set1.ExceptWith(set2);
Console.WriteLine($"Difference (set1 - set2): {string.Join(", ", set1)}");
// Output: Difference (set1 - set2): 1, 2

Checking Relations Between Sets

HashSet provides methods to check relationships between sets:

csharp
HashSet<char> vowels = new HashSet<char>() { 'a', 'e', 'i', 'o', 'u' };
HashSet<char> letters1 = new HashSet<char>() { 'a', 'b', 'c' };
HashSet<char> letters2 = new HashSet<char>() { 'a', 'e' };

// IsSubsetOf: checks if all elements in current set are in the other set
bool isSubset = letters2.IsSubsetOf(vowels);
Console.WriteLine($"Is {string.Join(", ", letters2)} a subset of vowels? {isSubset}");
// Output: Is a, e a subset of vowels? True

// IsSupersetOf: checks if current set contains all elements of other set
bool isSuperset = vowels.IsSupersetOf(letters2);
Console.WriteLine($"Is vowels a superset of {string.Join(", ", letters2)}? {isSuperset}");
// Output: Is vowels a superset of a, e? True

// Overlaps: checks if sets share at least one common element
bool hasOverlap = letters1.Overlaps(vowels);
Console.WriteLine($"Do {string.Join(", ", letters1)} and vowels overlap? {hasOverlap}");
// Output: Do a, b, c and vowels overlap? True

// SetEquals: checks if both sets contain exactly the same elements
bool areEqual = letters1.SetEquals(letters2);
Console.WriteLine($"Are the sets equal? {areEqual}");
// Output: Are the sets equal? False

Practical Applications

Finding Unique Elements

One common use case for HashSets is to quickly find unique elements in a collection:

csharp
List<string> allUserEmails = new List<string>() 
{
"[email protected]",
"[email protected]",
"[email protected]",
"[email protected]",
"[email protected]"
};

HashSet<string> uniqueEmails = new HashSet<string>(allUserEmails);

Console.WriteLine($"Total emails: {allUserEmails.Count}"); // Output: Total emails: 5
Console.WriteLine($"Unique emails: {uniqueEmails.Count}"); // Output: Unique emails: 3

Checking for Duplicates

HashSets make it easy to check if there are any duplicate elements:

csharp
List<int> numbers = new List<int>() { 1, 2, 3, 4, 5 };
HashSet<int> uniqueNumbers = new HashSet<int>();

bool hasDuplicates = false;
foreach (var num in numbers)
{
// If Add returns false, it means the element was already in the set
if (!uniqueNumbers.Add(num))
{
hasDuplicates = true;
break;
}
}

Console.WriteLine($"List has duplicates: {hasDuplicates}"); // Output: List has duplicates: False

Implementing a Simple Dictionary Spell Checker

csharp
// A simple spell checker using HashSet
HashSet<string> dictionary = new HashSet<string>(StringComparer.OrdinalIgnoreCase)
{
"apple", "banana", "orange", "grape", "kiwi",
"programming", "computer", "algorithm", "data", "structure"
};

string[] wordsToCheck = { "Apple", "bananas", "programming", "datascience" };

foreach (var word in wordsToCheck)
{
bool isCorrect = dictionary.Contains(word);
Console.WriteLine($"'{word}' is {(isCorrect ? "correct" : "misspelled")}");
}

// Output:
// 'Apple' is correct
// 'bananas' is misspelled
// 'programming' is correct
// 'datascience' is misspelled

Finding Common Friends

Let's say you have two users and want to find their common friends:

csharp
HashSet<string> aliceFriends = new HashSet<string> { "Bob", "Charlie", "Diana", "Edward" };
HashSet<string> bobFriends = new HashSet<string> { "Alice", "Charlie", "Frank", "Diana" };

HashSet<string> commonFriends = new HashSet<string>(aliceFriends);
commonFriends.IntersectWith(bobFriends);

Console.WriteLine("Common friends:");
foreach (var friend in commonFriends)
{
Console.WriteLine($"- {friend}");
}

// Output:
// Common friends:
// - Charlie
// - Diana

Performance Considerations

HashSets are optimized for extremely fast lookups, additions and removals with average O(1)O(1) time complexity. This makes them much faster than List<T> for these operations, especially as the collection grows larger.

Here's a simple comparison between List and HashSet for checking if an element exists:

csharp
const int elementCount = 1000000;
List<int> numbersList = new List<int>();
HashSet<int> numbersSet = new HashSet<int>();

// Populate both collections with the same data
for (int i = 0; i < elementCount; i++)
{
numbersList.Add(i);
numbersSet.Add(i);
}

// Test List.Contains performance
var listStartTime = DateTime.Now;
bool containsInList = numbersList.Contains(999999);
var listEndTime = DateTime.Now;

// Test HashSet.Contains performance
var setStartTime = DateTime.Now;
bool containsInSet = numbersSet.Contains(999999);
var setEndTime = DateTime.Now;

Console.WriteLine($"List search time: {(listEndTime - listStartTime).TotalMilliseconds} ms");
Console.WriteLine($"HashSet search time: {(setEndTime - setStartTime).TotalMilliseconds} ms");

// Example output:
// List search time: 4.5783 ms
// HashSet search time: 0.0039 ms

The HashSet is usually more than 1000 times faster for lookups in large collections! However, HashSets do consume more memory than Lists, so there's a trade-off to consider.

Summary

HashSets are powerful collection types in .NET that excel at:

  • Storing unique elements with no duplicates
  • Providing extremely fast lookups, additions, and removals
  • Performing set operations like union, intersection, and difference
  • Checking relationships between sets

When to use a HashSet:

  • When you need to ensure uniqueness of elements
  • When you frequently need to check if an element exists
  • When you need to perform set operations
  • When the order of elements doesn't matter

HashSets are an essential tool in any .NET developer's toolkit, and understanding when to use them can significantly improve your application's performance.

Exercises

  1. Create a program that reads a text file and counts the number of unique words.
  2. Implement a function that finds all common elements in three different arrays.
  3. Build a simple friends recommendation system: "People you might know" based on friends of friends.
  4. Create a program that checks if two strings are anagrams of each other using HashSets.
  5. Implement a custom equality comparer for a HashSet of your own class type.

Additional Resources



If you spot any mistakes on this website, please let me know at [email protected]. I’d greatly appreciate your feedback! :)