
Python Sets

Introduction

Sets are one of Python's built-in data structures that represent an unordered collection of unique elements. If you're familiar with mathematical set theory, Python sets work in much the same way, supporting operations like union, intersection, and difference.

Sets are particularly useful when you need to:

  • Store a collection of items where duplicates are not allowed
  • Check if an item exists in a collection quickly
  • Remove duplicates from a sequence
  • Perform mathematical set operations

Unlike lists and tuples, sets are unordered: items have no specific position, so they cannot be accessed by index. Sets are also mutable (you can add or remove items), but the elements themselves must be hashable, which in practice means immutable values such as strings, numbers, or tuples containing only immutable elements.
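The immutability requirement (strictly speaking, elements must be hashable) is easy to see by simply trying different values. The helper can_be_set_element below is a hypothetical name used only for illustration. It attempts to build a one-element set and catches the TypeError that Python raises for unhashable values.

```python
def can_be_set_element(value):
    """Return True if value can be stored in a set (that is, if it is hashable)."""
    try:
        {value}  # building a one-element set forces Python to hash the value
    except TypeError:
        return False
    return True

print(can_be_set_element('text'))            # True: strings are immutable
print(can_be_set_element(42))                # True: numbers are immutable
print(can_be_set_element((1, 2)))            # True: a tuple of immutable values
print(can_be_set_element([1, 2]))            # False: lists are mutable
print(can_be_set_element((1, [2, 3])))       # False: the tuple contains a list
print(can_be_set_element({'key': 'value'}))  # False: dictionaries are mutable
```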

Creating Sets

Basic Creation

You can create a set using curly braces {} or the set() constructor:

python
# Using curly braces
fruits = {'apple', 'banana', 'cherry'}
print(fruits)

# Using the set() constructor
colors = set(['red', 'green', 'blue'])
print(colors)

Output:

{'cherry', 'banana', 'apple'}
{'blue', 'green', 'red'}

Notice that the order of elements might differ from the order in which they were written, and for strings it can even change from one run of the program to the next. This is because sets are unordered.

Creating an Empty Set

To create an empty set, you must use the set() constructor. Using empty curly braces {} creates an empty dictionary instead:

python
# Correct way to create an empty set
empty_set = set()
print(type(empty_set))

# This creates an empty dictionary, not a set!
not_a_set = {}
print(type(not_a_set))

Output:

<class 'set'>
<class 'dict'>

Set with Mixed Data Types

Sets can contain different types of immutable elements:

python
mixed_set = {42, 'python', (1, 2, 3), True}
print(mixed_set)

Output:

{42, 'python', True, (1, 2, 3)}
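One subtlety when mixing types: Python treats 1, True, and 1.0 as equal, and equal numbers share the same hash, so a set can hold only one of them. This small sketch shows that the first value added is the one that is kept (the same goes for 0, False, and 0.0).

```python
# 1, True, and 1.0 compare equal and hash the same,
# so the set keeps only the first one it sees
values = {1, True, 1.0}
print(values)       # {1}
print(len(values))  # 1

# The same applies to False, 0, and 0.0
flags = {False, 0, 0.0}
print(flags)        # {False}
```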

Duplicates Are Automatically Removed

One of the key features of sets is that they automatically eliminate duplicates:

python
# Duplicates are automatically removed
numbers = {1, 2, 2, 3, 4, 4, 5}
print(numbers)

Output:

{1, 2, 3, 4, 5}

Basic Set Operations

Adding Elements

You can add elements to a set using the add() method or update the set with multiple elements using the update() method:

python
# Adding a single element
fruits = {'apple', 'banana', 'cherry'}
fruits.add('orange')
print(fruits)

# Adding multiple elements
fruits.update(['mango', 'grapes'])
print(fruits)

# update() accepts any number of iterables at once: sets, lists, tuples, etc.
fruits.update({'pineapple', 'kiwi'}, ['watermelon'])
print(fruits)

Output:

{'cherry', 'apple', 'banana', 'orange'}
{'mango', 'cherry', 'apple', 'banana', 'grapes', 'orange'}
{'watermelon', 'cherry', 'kiwi', 'apple', 'banana', 'pineapple', 'mango', 'grapes', 'orange'}
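A common pitfall is passing a single string to update(). Because update() iterates over its argument, and iterating over a string yields its characters, the string is split up. The sketch below contrasts this with add() and sorts the results only so the printed output is predictable.

```python
# add() treats its argument as one element
with_add = {'apple'}
with_add.add('kiwi')
print(sorted(with_add))           # ['apple', 'kiwi']

# update() iterates over its argument, so a string is split into characters
with_update = {'apple'}
with_update.update('kiwi')
print(sorted(with_update))        # ['apple', 'i', 'k', 'w']

# Wrap the string in a list to add it as a single element via update()
with_update_fixed = {'apple'}
with_update_fixed.update(['kiwi'])
print(sorted(with_update_fixed))  # ['apple', 'kiwi']
```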

Removing Elements

There are several methods to remove elements:

python
fruits = {'apple', 'banana', 'cherry', 'orange', 'kiwi'}

# remove() - raises KeyError if element doesn't exist
fruits.remove('banana')
print(fruits)

# discard() - doesn't raise error if element doesn't exist
fruits.discard('mango') # No error even though 'mango' isn't in the set
print(fruits)

# pop() - removes and returns an arbitrary element (raises KeyError if the set is empty)
# Since sets are unordered, you can't control which element gets removed
item = fruits.pop()
print(f"Popped item: {item}")
print(fruits)

# clear() - removes all elements
fruits.clear()
print(fruits)

Output:

{'cherry', 'apple', 'orange', 'kiwi'}
{'cherry', 'apple', 'orange', 'kiwi'}
Popped item: cherry
{'apple', 'orange', 'kiwi'}
set()
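The error behaviour of these methods is worth seeing directly. This sketch collects the KeyError messages from remove() on a missing element and from pop() on an empty set, while discard() silently does nothing.

```python
fruits = {'apple'}
errors = []

# remove() raises KeyError when the element is missing
try:
    fruits.remove('mango')
except KeyError as e:
    errors.append(f"remove(): {e}")

# discard() silently ignores a missing element
fruits.discard('mango')

# pop() removes the only element; a second pop() on the now-empty set raises KeyError
last = fruits.pop()
try:
    fruits.pop()
except KeyError as e:
    errors.append(f"pop(): {e}")

print(last)    # apple
print(errors)  # ["remove(): 'mango'", "pop(): 'pop from an empty set'"]
```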

Set Methods and Operations

Sets support the standard operations of mathematical set theory, available both as methods and as operators.

Membership Testing

Checking whether an item exists in a set is fast: the in operator takes O(1) time on average, no matter how large the set is:

python
fruits = {'apple', 'banana', 'cherry'}

print('banana' in fruits) # True
print('mango' in fruits) # False

Output:

True
False

Common Set Methods

python
set_a = {1, 2, 3, 4, 5}
set_b = {4, 5, 6, 7, 8}

# union() - returns a new set with elements from both sets
union_set = set_a.union(set_b)
print(f"Union: {union_set}")

# intersection() - returns a new set with elements common to both sets
intersection_set = set_a.intersection(set_b)
print(f"Intersection: {intersection_set}")

# difference() - returns a new set with elements in set_a but not in set_b
difference_set = set_a.difference(set_b)
print(f"Difference (set_a - set_b): {difference_set}")

# symmetric_difference() - returns a new set with elements in either set but not in both
symmetric_difference_set = set_a.symmetric_difference(set_b)
print(f"Symmetric Difference: {symmetric_difference_set}")

Output:

Union: {1, 2, 3, 4, 5, 6, 7, 8}
Intersection: {4, 5}
Difference (set_a - set_b): {1, 2, 3}
Symmetric Difference: {1, 2, 3, 6, 7, 8}

Set Operators

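Python also provides operators for the same operations. Note that the operators require both operands to be sets (or frozensets), while the equivalent methods accept any iterable, such as a list or tuple: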

python
set_a = {1, 2, 3, 4, 5}
set_b = {4, 5, 6, 7, 8}

# Union using |
print(f"Union using |: {set_a | set_b}")

# Intersection using &
print(f"Intersection using &: {set_a & set_b}")

# Difference using -
print(f"Difference using -: {set_a - set_b}")

# Symmetric difference using ^
print(f"Symmetric difference using ^: {set_a ^ set_b}")

Output:

Union using |: {1, 2, 3, 4, 5, 6, 7, 8}
Intersection using &: {4, 5}
Difference using -: {1, 2, 3}
Symmetric difference using ^: {1, 2, 3, 6, 7, 8}
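The practical difference between the two forms shows up when the other operand is not a set. In this sketch, union() happily takes a list and a tuple together, while the | operator raises a TypeError until the list is converted explicitly. Results are sorted only to make the printed output predictable.

```python
set_a = {1, 2, 3, 4, 5}
extra = [5, 6, 7]  # a list, not a set

# Methods accept any iterable, and union() even accepts several at once
combined = set_a.union(extra, (8, 9))
print(sorted(combined))  # [1, 2, 3, 4, 5, 6, 7, 8, 9]

# Operators require both operands to be sets (or frozensets)
try:
    set_a | extra
except TypeError as e:
    operator_error = str(e)
    print(f"TypeError: {e}")

# Convert the iterable explicitly when you prefer the operator form
print(sorted(set_a | set(extra)))  # [1, 2, 3, 4, 5, 6, 7]
```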

Update Methods

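These methods modify the set they are called on in place and return None. To keep set_a unchanged, each example below works on a fresh copy: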

python
set_a = {1, 2, 3, 4, 5}
set_b = {4, 5, 6, 7, 8}

# update() - adds elements from another set (similar to |=)
set_a_copy = set_a.copy()
set_a_copy.update(set_b)
print(f"After update(): {set_a_copy}")

# intersection_update() - keeps only elements found in both sets (similar to &=)
set_a_copy = set_a.copy()
set_a_copy.intersection_update(set_b)
print(f"After intersection_update(): {set_a_copy}")

# difference_update() - removes elements found in another set (similar to -=)
set_a_copy = set_a.copy()
set_a_copy.difference_update(set_b)
print(f"After difference_update(): {set_a_copy}")

# symmetric_difference_update() - keeps elements in either set but not in both (similar to ^=)
set_a_copy = set_a.copy()
set_a_copy.symmetric_difference_update(set_b)
print(f"After symmetric_difference_update(): {set_a_copy}")

Output:

After update(): {1, 2, 3, 4, 5, 6, 7, 8}
After intersection_update(): {4, 5}
After difference_update(): {1, 2, 3}
After symmetric_difference_update(): {1, 2, 3, 6, 7, 8}
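Each update method has an augmented-assignment operator counterpart (|=, &=, -=, ^=). This sketch applies them one after another, checks that the same set object is modified rather than replaced, and shows why the result of an update method should not be assigned.

```python
set_a = {1, 2, 3, 4, 5}
set_b = {4, 5, 6, 7, 8}
original_id = id(set_a)

set_a |= set_b            # in-place union:       {1, 2, 3, 4, 5, 6, 7, 8}
set_a -= {1, 2}           # in-place difference:  {3, 4, 5, 6, 7, 8}
set_a &= {3, 4, 5, 6, 9}  # in-place intersection: {3, 4, 5, 6}
set_a ^= {5, 10}          # in-place symmetric difference: {3, 4, 6, 10}

print(sorted(set_a))             # [3, 4, 6, 10]
print(id(set_a) == original_id)  # True: the same object was modified

# The update methods return None, so assigning their result is a bug
result = {1, 2}.update({3})
print(result)                    # None
```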

Set Comparison Methods

Sets can be compared to check their relationships:

python
set1 = {1, 2, 3}
set2 = {1, 2, 3, 4, 5}
set3 = {6, 7}

# issubset() - returns True if a set is a subset of another
print(f"{set1} is subset of {set2}? {set1.issubset(set2)}")

# issuperset() - returns True if a set contains another set
print(f"{set2} is superset of {set1}? {set2.issuperset(set1)}")

# isdisjoint() - returns True if sets have no elements in common
print(f"{set1} is disjoint with {set3}? {set1.isdisjoint(set3)}")
print(f"{set1} is disjoint with {set2}? {set1.isdisjoint(set2)}")

Output:

{1, 2, 3} is subset of {1, 2, 3, 4, 5}? True
{1, 2, 3, 4, 5} is superset of {1, 2, 3}? True
{1, 2, 3} is disjoint with {6, 7}? True
{1, 2, 3} is disjoint with {1, 2, 3, 4, 5}? False
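The subset and superset checks also have operator forms. The comparison operators add one distinction the methods do not offer directly: < and > test for a proper subset or superset, which excludes the case where both sets are equal.

```python
set1 = {1, 2, 3}
set2 = {1, 2, 3, 4, 5}

print(set1 <= set2)       # True: subset, same as set1.issubset(set2)
print(set1 < set2)        # True: proper subset (subset and not equal)
print(set2 >= set1)       # True: superset, same as set2.issuperset(set1)
print(set1 <= set1)       # True: every set is a subset of itself
print(set1 < set1)        # False: a set is not a proper subset of itself
print(set1 == {3, 2, 1})  # True: order does not matter for equality
```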

Practical Applications of Sets

Removing Duplicates from a List

One of the most common uses of sets is to remove duplicates from a list:

python
# Original list with duplicates
numbers = [1, 2, 2, 3, 4, 4, 5, 5, 5]

# Convert to a set to remove duplicates, then back to a list
unique_numbers = list(set(numbers))
print(unique_numbers)

# Note: This method doesn't preserve the original order
# If order matters, use list(dict.fromkeys(numbers)) instead

Output:

[1, 2, 3, 4, 5]
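When the original order matters, a set can still do the heavy lifting by remembering which items have already been seen, while a list records the output order. This is a minimal sketch; dedupe_preserving_order is a hypothetical helper name, and its result matches the dict.fromkeys idiom, which relies on dictionaries keeping insertion order (guaranteed since Python 3.7).

```python
def dedupe_preserving_order(items):
    """Return the unique items of an iterable in first-seen order."""
    seen = set()   # fast O(1) average membership checks
    result = []    # keeps the original order
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result

numbers = [3, 1, 3, 2, 1, 5, 2]
print(dedupe_preserving_order(numbers))  # [3, 1, 2, 5]
print(list(dict.fromkeys(numbers)))      # [3, 1, 2, 5], the same result
```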

Finding Unique Elements

Sets make it easy to identify unique elements across collections:

python
list1 = ['apple', 'banana', 'cherry', 'date']
list2 = ['banana', 'date', 'elderberry', 'fig']

# Find items that are in either list (union)
all_fruits = set(list1).union(set(list2))
print(f"All fruits: {all_fruits}")

# Find items that appear in both lists (intersection)
common_fruits = set(list1).intersection(set(list2))
print(f"Common fruits: {common_fruits}")

# Find items that are unique to list1 (difference)
unique_to_list1 = set(list1).difference(set(list2))
print(f"Unique to list1: {unique_to_list1}")

# Find items that are unique to list2 (difference)
unique_to_list2 = set(list2).difference(set(list1))
print(f"Unique to list2: {unique_to_list2}")

# Find items that are in one list but not both (symmetric difference)
in_one_list_only = set(list1).symmetric_difference(set(list2))
print(f"In one list only: {in_one_list_only}")

Output:

All fruits: {'date', 'apple', 'elderberry', 'banana', 'fig', 'cherry'}
Common fruits: {'date', 'banana'}
Unique to list1: {'apple', 'cherry'}
Unique to list2: {'fig', 'elderberry'}
In one list only: {'apple', 'elderberry', 'fig', 'cherry'}

Checking for Subsets

Sets are useful for determining if one collection is entirely contained within another:

python
required_skills = {'Python', 'SQL', 'Git'}
candidate_a_skills = {'Python', 'JavaScript', 'HTML', 'SQL', 'Git', 'CSS'}
candidate_b_skills = {'Python', 'JavaScript', 'HTML'}

# Check if a candidate has all required skills
candidate_a_qualifies = required_skills.issubset(candidate_a_skills)
candidate_b_qualifies = required_skills.issubset(candidate_b_skills)

print(f"Candidate A has all required skills: {candidate_a_qualifies}")
print(f"Candidate B has all required skills: {candidate_b_qualifies}")

# What skills is candidate B missing?
missing_skills = required_skills - candidate_b_skills
print(f"Candidate B is missing: {missing_skills}")

Output:

Candidate A has all required skills: True
Candidate B has all required skills: False
Candidate B is missing: {'SQL', 'Git'}

Frequency Analysis

Sets can help identify unique elements for frequency analysis:

python
text = "Mississippi is a river and a state in the United States"
words = text.lower().split()

# Find unique words
unique_words = set(words)
print(f"Total words: {len(words)}")
print(f"Unique words: {len(unique_words)}")
print(f"Unique word set: {unique_words}")

# Count frequency of each word
word_freq = {}
for word in unique_words:
    word_freq[word] = words.count(word)

print("\nWord frequencies:")
for word, count in word_freq.items():
    print(f"'{word}': {count}")

Output:

Total words: 11
Unique words: 10
Unique word set: {'and', 'united', 'is', 'states', 'the', 'river', 'in', 'a', 'state', 'mississippi'}

Word frequencies:
'and': 1
'united': 1
'is': 1
'states': 1
'the': 1
'river': 1
'in': 1
'a': 2
'state': 1
'mississippi': 1
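The loop above calls words.count() once for every unique word, and each call rescans the whole list, so the work grows quickly with larger texts. As an alternative (a swapped-in standard-library tool, not a set operation), collections.Counter tallies every word in a single pass. Its keys are the same unique words that set(words) would give you.

```python
from collections import Counter

text = "Mississippi is a river and a state in the United States"
words = text.lower().split()

# Counter counts all words in one pass over the list
word_freq = Counter(words)

print(word_freq['a'])            # 2
print(word_freq['mississippi'])  # 1
print(len(word_freq))            # 10, the same as len(set(words))
print(word_freq.most_common(1))  # [('a', 2)]
```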

Set Comprehensions

Similar to list comprehensions, Python supports set comprehensions which provide a concise way to create sets:

python
# Creating a set of squares of numbers from 0 to 9
squares = {x**2 for x in range(10)}
print(squares)

# Creating a set of even numbers from a list
numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
even_numbers = {x for x in numbers if x % 2 == 0}
print(even_numbers)

# Collecting the unique words of a sentence in uppercase ('The' and 'the' both become 'THE')
sentence = "The quick brown fox jumps over the lazy dog"
unique_words = {word.upper() for word in sentence.split()}
print(unique_words)

Output:

{0, 1, 4, 9, 16, 25, 36, 49, 64, 81}
{2, 4, 6, 8, 10}
{'THE', 'QUICK', 'BROWN', 'FOX', 'JUMPS', 'OVER', 'LAZY', 'DOG'}

Frozen Sets

Python also provides a variant of sets called "frozenset" which is immutable (cannot be changed after creation):

python
# Creating a frozen set
frozen = frozenset([1, 2, 3, 4])
print(frozen)

# Trying to modify a frozen set will cause an error
try:
    frozen.add(5)  # This will cause an AttributeError
except AttributeError as e:
    print(f"Error: {e}")

# Frozen sets can be used as dictionary keys or elements of another set
# Regular sets cannot be used this way
s = {frozenset([1, 2]), frozenset([3, 4])}
print(s)

Output:

frozenset({1, 2, 3, 4})
Error: 'frozenset' object has no attribute 'add'
{frozenset({1, 2}), frozenset({3, 4})}
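Using a frozenset as a dictionary key is handy when a key should be an unordered group, such as an undirected edge in a graph. In this sketch the graph, its node names, and its weights are hypothetical example data, and edge_weight is an illustrative helper. Because frozenset({u, v}) equals frozenset({v, u}), the lookup works in either direction, while a regular set is rejected as a key.

```python
# Hypothetical edge weights in an undirected graph
weights = {
    frozenset({'A', 'B'}): 4,
    frozenset({'B', 'C'}): 7,
}

def edge_weight(u, v):
    # frozenset({u, v}) == frozenset({v, u}), so argument order does not matter
    return weights[frozenset({u, v})]

print(edge_weight('A', 'B'))  # 4
print(edge_weight('B', 'A'))  # 4

# A regular set is mutable (unhashable), so it cannot be a dictionary key
try:
    weights[{'A', 'C'}] = 9
except TypeError as e:
    key_error = str(e)
    print(f"TypeError: {e}")
```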

Performance Considerations

Sets offer very efficient membership testing, with O(1) average time complexity, whereas a list has to scan its elements one by one (O(n)). This makes sets ideal for checking whether an item exists in a large collection:

python
import time

# Comparing membership testing between a list and a set
large_list = list(range(1000000))
large_set = set(large_list)
search_item = 999999  # the last element: the worst case for a list scan

# Test with list: the search checks elements one by one
# (perf_counter() is a high-resolution clock intended for short measurements)
start = time.perf_counter()
item_in_list = search_item in large_list
list_time = time.perf_counter() - start

# Test with set: the search uses the element's hash to find it directly
start = time.perf_counter()
item_in_set = search_item in large_set
set_time = time.perf_counter() - start

print(f"Time to search in list: {list_time:.6f} seconds")
print(f"Time to search in set: {set_time:.6f} seconds")
print(f"Set is approximately {list_time / set_time:.0f} times faster")

Output (values may vary):

Time to search in list: 0.032541 seconds
Time to search in set: 0.000001 seconds
Set is approximately 32541 times faster

Summary

Python sets are powerful data structures that offer unique features:

  • Unordered collections of unique, hashable (immutable) elements
  • Efficient membership testing with O(1) average time complexity
  • Mathematical set operations like union, intersection, and difference
  • Automatic duplicate elimination when converting from other data structures
  • Mutable by default, with an immutable variant called frozenset

Key use cases for sets include:

  • Removing duplicates from sequences
  • Fast membership testing
  • Finding common or unique elements across collections
  • Set operations like union, intersection, and difference
  • Efficient frequency analysis and filtering

Practice Exercises

  1. Write a function that takes two lists and returns a list of elements that are common to both lists, without using sets. Then write the same function using sets and compare their performance.

  2. Given a list of student names from multiple classes, find the students who are taking all classes.

  3. Implement a spell checker that checks if words from input text are present in a dictionary (a large set of valid words).

  4. Use sets to solve the classic "Two Sum" problem: given an array of numbers and a target sum, determine if any two numbers in the array add up to the target.

  5. Create a function that finds all anagrams in a list of words using sets.

Additional Resources

Now that you understand Python sets and their operations, you're equipped to solve many problems more efficiently and elegantly than with other data structures!
