Python Sets
Introduction
Sets are one of Python's built-in data structures that represent an unordered collection of unique elements. If you're familiar with mathematical set theory, Python sets work in much the same way, supporting operations like union, intersection, and difference.
Sets are particularly useful when you need to:
- Store a collection of items where duplicates are not allowed
- Check if an item exists in a collection quickly
- Remove duplicates from a sequence
- Perform mathematical set operations
Unlike lists and tuples, sets are unordered (items don't have a specific position) and cannot be indexed. Like lists, they are mutable (you can add or remove items), but the elements themselves must be hashable, which in practice means immutable values such as strings, numbers, or tuples containing only immutable elements.
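The restriction on elements comes down to whether Python can hash the value. The sketch below uses a hypothetical helper, `can_be_set_element`, which is not part of Python; it simply tries to build a one-element set and reports whether that worked:

```python
def can_be_set_element(value):
    """Return True if value is hashable and can therefore be stored in a set."""
    try:
        {value}  # building a one-element set forces Python to hash the value
    except TypeError:
        return False
    return True

print(can_be_set_element("apple"))      # True: strings are hashable
print(can_be_set_element((1, 2)))       # True: a tuple of hashable items
print(can_be_set_element([1, 2]))       # False: lists are mutable and unhashable
print(can_be_set_element((1, [2, 3])))  # False: a tuple that contains a list
```

Note the last case: a tuple is only usable as a set element if everything inside it is hashable too.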
Creating Sets
Basic Creation
You can create a set using curly braces {} or the set() constructor:
# Using curly braces
fruits = {'apple', 'banana', 'cherry'}
print(fruits)
# Using the set() constructor
colors = set(['red', 'green', 'blue'])
print(colors)
Output:
{'cherry', 'banana', 'apple'}
{'blue', 'green', 'red'}
Notice that the order of elements might be different from how they were defined, and it can change from one run of the program to the next. This is because sets are unordered.
Creating an Empty Set
To create an empty set, you must use the set() constructor. Using empty curly braces {} creates an empty dictionary instead:
# Correct way to create an empty set
empty_set = set()
print(type(empty_set))
# This creates an empty dictionary, not a set!
not_a_set = {}
print(type(not_a_set))
Output:
<class 'set'>
<class 'dict'>
Set with Mixed Data Types
Sets can contain different types of immutable elements:
mixed_set = {42, 'python', (1, 2, 3), True}
print(mixed_set)
Output:
{42, 'python', True, (1, 2, 3)}
Duplicates Are Automatically Removed
One of the key features of sets is that they automatically eliminate duplicates:
# Duplicates are automatically removed
numbers = {1, 2, 2, 3, 4, 4, 5}
print(numbers)
Output:
{1, 2, 3, 4, 5}
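What counts as a duplicate is decided by equality and hashing, not by type or appearance. A small sketch of two cases that can surprise newcomers (the variable names are illustrative only):

```python
# 1, 1.0 and True compare equal and hash the same, so only one of them is kept
values = {1, 1.0, True}
print(len(values))  # 1

# Strings that differ only in case are different elements
names = {'Ada', 'ada', 'ADA'}
print(len(names))   # 3
```

If you need case-insensitive uniqueness, normalize the strings (for example with lower()) before adding them.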
Basic Set Operations
Adding Elements
You can add elements to a set using the add() method or update the set with multiple elements using the update() method:
# Adding a single element
fruits = {'apple', 'banana', 'cherry'}
fruits.add('orange')
print(fruits)
# Adding multiple elements
fruits.update(['mango', 'grapes'])
print(fruits)
# You can also update with another set, list, tuple, etc.
fruits.update({'pineapple', 'kiwi'}, ['watermelon'])
print(fruits)
Output:
{'cherry', 'apple', 'banana', 'orange'}
{'mango', 'cherry', 'apple', 'banana', 'grapes', 'orange'}
{'watermelon', 'cherry', 'kiwi', 'apple', 'banana', 'pineapple', 'mango', 'grapes', 'orange'}
Removing Elements
There are several methods to remove elements:
fruits = {'apple', 'banana', 'cherry', 'orange', 'kiwi'}
# remove() - raises KeyError if element doesn't exist
fruits.remove('banana')
print(fruits)
# discard() - doesn't raise error if element doesn't exist
fruits.discard('mango') # No error even though 'mango' isn't in the set
print(fruits)
# pop() - removes and returns an arbitrary element
# Since sets are unordered, you can't control which element gets removed
item = fruits.pop()
print(f"Popped item: {item}")
print(fruits)
# clear() - removes all elements
fruits.clear()
print(fruits)
Output:
{'cherry', 'apple', 'orange', 'kiwi'}
{'cherry', 'apple', 'orange', 'kiwi'}
Popped item: cherry
{'apple', 'orange', 'kiwi'}
set()
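The example above never triggers the error cases. As a rough sketch of what they look like, here is remove() on a missing element and pop() on an empty set, both of which raise KeyError:

```python
fruits = {'apple', 'banana'}

# remove() on a missing element raises KeyError, so guard it if absence is possible
try:
    fruits.remove('mango')
except KeyError as error:
    print(f"remove() failed: {error!r}")

# discard() does nothing when the element is missing
fruits.discard('mango')

# pop() on an empty set also raises KeyError
empty = set()
try:
    empty.pop()
except KeyError:
    print("pop() failed: the set is empty")
```

A reasonable rule of thumb: use remove() when a missing element indicates a bug you want to hear about, and discard() when absence is acceptable.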
Set Methods and Operations
Sets support mathematical operations that correspond to set theory operations.
Membership Testing
Checking whether an item exists in a set takes O(1) time on average, regardless of how large the set is:
fruits = {'apple', 'banana', 'cherry'}
print('banana' in fruits) # True
print('mango' in fruits) # False
Output:
True
False
Common Set Methods
set_a = {1, 2, 3, 4, 5}
set_b = {4, 5, 6, 7, 8}
# union() - returns a new set with elements from both sets
union_set = set_a.union(set_b)
print(f"Union: {union_set}")
# intersection() - returns a new set with elements common to both sets
intersection_set = set_a.intersection(set_b)
print(f"Intersection: {intersection_set}")
# difference() - returns a new set with elements in set_a but not in set_b
difference_set = set_a.difference(set_b)
print(f"Difference (set_a - set_b): {difference_set}")
# symmetric_difference() - returns a new set with elements in either set but not in both
symmetric_difference_set = set_a.symmetric_difference(set_b)
print(f"Symmetric Difference: {symmetric_difference_set}")
Output:
Union: {1, 2, 3, 4, 5, 6, 7, 8}
Intersection: {4, 5}
Difference (set_a - set_b): {1, 2, 3}
Symmetric Difference: {1, 2, 3, 6, 7, 8}
Set Operators
Python provides operators that correspond to mathematical set operations:
set_a = {1, 2, 3, 4, 5}
set_b = {4, 5, 6, 7, 8}
# Union using |
print(f"Union using |: {set_a | set_b}")
# Intersection using &
print(f"Intersection using &: {set_a & set_b}")
# Difference using -
print(f"Difference using -: {set_a - set_b}")
# Symmetric difference using ^
print(f"Symmetric difference using ^: {set_a ^ set_b}")
Output:
Union using |: {1, 2, 3, 4, 5, 6, 7, 8}
Intersection using &: {4, 5}
Difference using -: {1, 2, 3}
Symmetric difference using ^: {1, 2, 3, 6, 7, 8}
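One practical difference between the operators and the equivalent methods: the methods accept any iterable, while the operators require a set (or frozenset) on both sides. A short sketch, with `other` as an illustrative list:

```python
set_a = {1, 2, 3}
other = [3, 4, 5]  # a list, not a set

# Methods accept any iterable
print(sorted(set_a.union(other)))  # [1, 2, 3, 4, 5]

# Operators raise TypeError when the other operand is not a set
try:
    set_a | other
except TypeError as error:
    print(f"TypeError: {error}")

# Convert explicitly when you want the operator form
print(sorted(set_a | set(other)))  # [1, 2, 3, 4, 5]
```

The results are passed through sorted() here only to make the printed order predictable.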
Update Methods
These methods modify the original set:
set_a = {1, 2, 3, 4, 5}
set_b = {4, 5, 6, 7, 8}
# update() - adds elements from another set (similar to |=)
set_a_copy = set_a.copy()
set_a_copy.update(set_b)
print(f"After update(): {set_a_copy}")
# intersection_update() - keeps only elements found in both sets (similar to &=)
set_a_copy = set_a.copy()
set_a_copy.intersection_update(set_b)
print(f"After intersection_update(): {set_a_copy}")
# difference_update() - removes elements found in another set (similar to -=)
set_a_copy = set_a.copy()
set_a_copy.difference_update(set_b)
print(f"After difference_update(): {set_a_copy}")
# symmetric_difference_update() - keeps elements in either set but not in both (similar to ^=)
set_a_copy = set_a.copy()
set_a_copy.symmetric_difference_update(set_b)
print(f"After symmetric_difference_update(): {set_a_copy}")
Output:
After update(): {1, 2, 3, 4, 5, 6, 7, 8}
After intersection_update(): {4, 5}
After difference_update(): {1, 2, 3}
After symmetric_difference_update(): {1, 2, 3, 6, 7, 8}
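The augmented operators mentioned in the comments above (|=, &=, -=, ^=) also modify the set in place, which matters when more than one name refers to the same set. A minimal sketch, with `working` and `alias` as illustrative names:

```python
set_b = {4, 5, 6, 7, 8}

working = {1, 2, 3, 4, 5}
alias = working            # a second name for the same set object

working |= set_b           # in-place union, like update()
print(sorted(alias))       # [1, 2, 3, 4, 5, 6, 7, 8]: alias sees the change

working = working - {1, 2} # the plain operator builds a new set and rebinds the name
print(alias is working)    # False: alias still refers to the original object
```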
Set Comparison Methods
Sets can be compared to check their relationships:
set1 = {1, 2, 3}
set2 = {1, 2, 3, 4, 5}
set3 = {6, 7}
# issubset() - returns True if a set is a subset of another
print(f"{set1} is subset of {set2}? {set1.issubset(set2)}")
# issuperset() - returns True if a set contains another set
print(f"{set2} is superset of {set1}? {set2.issuperset(set1)}")
# isdisjoint() - returns True if sets have no elements in common
print(f"{set1} is disjoint with {set3}? {set1.isdisjoint(set3)}")
print(f"{set1} is disjoint with {set2}? {set1.isdisjoint(set2)}")
Output:
{1, 2, 3} is subset of {1, 2, 3, 4, 5}? True
{1, 2, 3, 4, 5} is superset of {1, 2, 3}? True
{1, 2, 3} is disjoint with {6, 7}? True
{1, 2, 3} is disjoint with {1, 2, 3, 4, 5}? False
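The subset and superset checks also have operator forms, and the strict versions (< and >) distinguish proper subsets from equal sets. A brief sketch using the same sets as above:

```python
set1 = {1, 2, 3}
set2 = {1, 2, 3, 4, 5}

print(set1 <= set2)  # True: subset, same as set1.issubset(set2)
print(set1 < set2)   # True: proper subset (subset and not equal)
print(set1 <= set1)  # True: every set is a subset of itself
print(set1 < set1)   # False: but never a proper subset of itself
print(set2 >= set1)  # True: superset, same as set2.issuperset(set1)
```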
Practical Applications of Sets
Removing Duplicates from a List
One of the most common uses of sets is to remove duplicates from a list:
# Original list with duplicates
numbers = [1, 2, 2, 3, 4, 4, 5, 5, 5]
# Convert to a set to remove duplicates, then back to a list
unique_numbers = list(set(numbers))
print(unique_numbers)
# Note: This method doesn't preserve the original order
# If order matters, use a different approach
Output:
[1, 2, 3, 4, 5]
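When the original order does matter, one common approach is to keep a set of values already seen while building the result list. The helper name `unique_preserving_order` below is hypothetical; `dict.fromkeys` gives the same result in one line because dictionaries keep insertion order (Python 3.7+):

```python
def unique_preserving_order(items):
    """Return items with duplicates dropped, keeping first occurrences in order."""
    seen = set()   # fast membership test for values already emitted
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result

numbers = [3, 1, 3, 2, 1, 5, 2]
print(unique_preserving_order(numbers))  # [3, 1, 2, 5]
print(list(dict.fromkeys(numbers)))      # [3, 1, 2, 5]
```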
Finding Unique Elements
Sets make it easy to identify unique elements across collections:
list1 = ['apple', 'banana', 'cherry', 'date']
list2 = ['banana', 'date', 'elderberry', 'fig']
# Find items that are in either list (union)
all_fruits = set(list1).union(set(list2))
print(f"All fruits: {all_fruits}")
# Find items that appear in both lists (intersection)
common_fruits = set(list1).intersection(set(list2))
print(f"Common fruits: {common_fruits}")
# Find items that are unique to list1 (difference)
unique_to_list1 = set(list1).difference(set(list2))
print(f"Unique to list1: {unique_to_list1}")
# Find items that are unique to list2 (difference)
unique_to_list2 = set(list2).difference(set(list1))
print(f"Unique to list2: {unique_to_list2}")
# Find items that are in one list but not both (symmetric difference)
in_one_list_only = set(list1).symmetric_difference(set(list2))
print(f"In one list only: {in_one_list_only}")
Output:
All fruits: {'date', 'apple', 'elderberry', 'banana', 'fig', 'cherry'}
Common fruits: {'date', 'banana'}
Unique to list1: {'apple', 'cherry'}
Unique to list2: {'fig', 'elderberry'}
In one list only: {'apple', 'elderberry', 'fig', 'cherry'}
Checking for Subsets
Sets are useful for determining if one collection is entirely contained within another:
required_skills = {'Python', 'SQL', 'Git'}
candidate_a_skills = {'Python', 'JavaScript', 'HTML', 'SQL', 'Git', 'CSS'}
candidate_b_skills = {'Python', 'JavaScript', 'HTML'}
# Check if a candidate has all required skills
candidate_a_qualifies = required_skills.issubset(candidate_a_skills)
candidate_b_qualifies = required_skills.issubset(candidate_b_skills)
print(f"Candidate A has all required skills: {candidate_a_qualifies}")
print(f"Candidate B has all required skills: {candidate_b_qualifies}")
# What skills is candidate B missing?
missing_skills = required_skills - candidate_b_skills
print(f"Candidate B is missing: {missing_skills}")
Output:
Candidate A has all required skills: True
Candidate B has all required skills: False
Candidate B is missing: {'SQL', 'Git'}
Frequency Analysis
Sets can help identify unique elements for frequency analysis:
text = "Mississippi is a river and a state in the United States"
words = text.lower().split()
# Find unique words
unique_words = set(words)
print(f"Total words: {len(words)}")
print(f"Unique words: {len(unique_words)}")
print(f"Unique word set: {unique_words}")
# Count frequency of each word
word_freq = {}
for word in unique_words:
    word_freq[word] = words.count(word)
print("\nWord frequencies:")
for word, count in word_freq.items():
    print(f"'{word}': {count}")
Output:
Total words: 11
Unique words: 10
Unique word set: {'and', 'united', 'is', 'states', 'the', 'river', 'in', 'a', 'state', 'mississippi'}
Word frequencies:
'and': 1
'united': 1
'is': 1
'states': 1
'the': 1
'river': 1
'in': 1
'a': 2
'state': 1
'mississippi': 1
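Calling words.count() once per unique word rescans the whole list each time, which is fine for a short sentence but slow for long texts. As an alternative sketch (not the approach used above), the standard library's collections.Counter tallies every word in a single pass:

```python
from collections import Counter

text = "Mississippi is a river and a state in the United States"
words = text.lower().split()

word_freq = Counter(words)       # one pass over the list
print(len(word_freq))            # 10 unique words
print(word_freq['a'])            # 2
print(word_freq.most_common(1))  # [('a', 2)]
```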
Set Comprehensions
Python supports set comprehensions, which, like list comprehensions, provide a concise way to create sets:
# Creating a set of squares of numbers from 0 to 9
squares = {x**2 for x in range(10)}
print(squares)
# Creating a set of even numbers from a list
numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
even_numbers = {x for x in numbers if x % 2 == 0}
print(even_numbers)
# Converting all words in a sentence to uppercase ('The' and 'the' collapse into one element)
sentence = "The quick brown fox jumps over the lazy dog"
unique_words = {word.upper() for word in sentence.split()}
print(unique_words)
Output:
{0, 1, 4, 9, 16, 25, 36, 49, 64, 81}
{2, 4, 6, 8, 10}
{'THE', 'QUICK', 'BROWN', 'FOX', 'JUMPS', 'OVER', 'LAZY', 'DOG'}
Frozen Sets
Python also provides a variant of sets called "frozenset" which is immutable (cannot be changed after creation):
# Creating a frozen set
frozen = frozenset([1, 2, 3, 4])
print(frozen)
# Trying to modify a frozen set will cause an error
try:
    frozen.add(5)  # This will cause an AttributeError
except AttributeError as e:
    print(f"Error: {e}")
# Frozen sets can be used as dictionary keys or elements of another set
# Regular sets cannot be used this way
s = {frozenset([1, 2]), frozenset([3, 4])}
print(s)
Output:
frozenset({1, 2, 3, 4})
Error: 'frozenset' object has no attribute 'add'
{frozenset({1, 2}), frozenset({3, 4})}
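The dictionary-key use case mentioned in the comments can be sketched as follows. The data is made up for illustration: `pair_counts` maps an unordered pair of names to a count, so the lookup works whichever order the pair is written in:

```python
# Hypothetical data: a count associated with an unordered pair of names
pair_counts = {frozenset({'alice', 'bob'}): 3}

# Element order is irrelevant, so this finds the same key
print(pair_counts[frozenset({'bob', 'alice'})])  # 3

# A regular set is unhashable and cannot be used as a key
try:
    pair_counts[{'carol', 'dave'}] = 1
except TypeError as error:
    print(f"TypeError: {error}")
```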
Performance Considerations
Sets offer very efficient membership testing with O(1) average time complexity. This makes them ideal for checking if an item exists in a collection:
import time
# Comparing membership testing between list and set
large_list = list(range(1000000))
large_set = set(large_list)
search_item = 999999
# Test with list (perf_counter() is the high-resolution clock intended for timing)
start = time.perf_counter()
item_in_list = search_item in large_list
list_time = time.perf_counter() - start
# Test with set
start = time.perf_counter()
item_in_set = search_item in large_set
set_time = time.perf_counter() - start
print(f"Time to search in list: {list_time:.6f} seconds")
print(f"Time to search in set: {set_time:.6f} seconds")
print(f"Set is approximately {list_time/set_time:.0f} times faster")
Output (values may vary):
Time to search in list: 0.032541 seconds
Time to search in set: 0.000001 seconds
Set is approximately 32541 times faster
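A single timed lookup is noisy, and the set lookup is so fast that it can be close to the clock's resolution. As an alternative sketch, the standard library's timeit module repeats each lookup many times; the sizes and repeat count below are arbitrary choices, and the timings will differ between machines:

```python
import timeit

large_list = list(range(100_000))
large_set = set(large_list)
search_item = 99_999  # worst case for the list: the last element

# Run each membership test 100 times and report the total time
list_time = timeit.timeit(lambda: search_item in large_list, number=100)
set_time = timeit.timeit(lambda: search_item in large_set, number=100)

print(f"List: {list_time:.6f} seconds for 100 lookups")
print(f"Set:  {set_time:.6f} seconds for 100 lookups")
```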
Summary
Python sets are powerful data structures that offer unique features:
- Unordered collections of unique, hashable elements
- Efficient membership testing with O(1) average time complexity
- Mathematical set operations like union, intersection, and difference
- Automatic duplicate elimination when converting from other data structures
- Mutable by default, with an immutable variant called frozenset
Key use cases for sets include:
- Removing duplicates from sequences
- Fast membership testing
- Finding common or unique elements across collections
- Set operations like union, intersection, and difference
- Efficient frequency analysis and filtering
Practice Exercises
- Write a function that takes two lists and returns a list of elements that are common to both lists, without using sets. Then write the same function using sets and compare their performance.
- Given a list of student names from multiple classes, find the students who are taking all classes.
- Implement a spell checker that checks if words from input text are present in a dictionary (a large set of valid words).
- Use sets to solve the classic "Two Sum" problem: given an array of numbers and a target sum, determine if any two numbers in the array add up to the target.
- Create a function that finds all anagrams in a list of words using sets.
Now that you understand Python sets and their operations, you're equipped to solve many problems more efficiently and elegantly than with other data structures!