Python Set Comprehension
Introduction
Set comprehension is a concise Python syntax for creating sets. It follows the same pattern as a list comprehension but produces a set instead of a list. Set comprehensions were introduced in Python 3.0 and backported to Python 2.7. They are usually more readable, and often somewhat faster, than building a set with an explicit loop.
Set comprehensions are particularly useful when you need to:
- Create a set from an existing iterable
- Filter elements from an iterable
- Transform elements while adding them to a set
- Ensure uniqueness in your resulting collection
In this tutorial, we'll explore how set comprehensions work, their syntax, and practical applications.
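As a quick preview, the uses listed above might look like the sketch below. The sample data and variable names (readings, positives, and so on) are made up for illustration, and sorted() is used only to make the printed order predictable.

```python
# Hypothetical sample data: a list with repeated values.
readings = [3, -1, 3, 7, -1, 0, 7]

# Create a set from an existing iterable (duplicates collapse).
distinct = {r for r in readings}

# Filter: keep only the positive readings.
positives = {r for r in readings if r > 0}

# Transform: store the absolute value of each reading.
magnitudes = {abs(r) for r in readings}

print(sorted(distinct))    # [-1, 0, 3, 7]
print(sorted(positives))   # [3, 7]
print(sorted(magnitudes))  # [0, 1, 3, 7]
```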
Basic Syntax
Here's the basic syntax of a set comprehension:
{expression for item in iterable}
This creates a set where each element is the result of the expression for each item in the iterable.
Let's start with a simple example:
# Creating a set of squares from 0 to 9
squares = {x**2 for x in range(10)}
print(squares)
Output:
{0, 1, 4, 9, 16, 25, 36, 49, 64, 81}
Notice that the output is enclosed in curly braces {}, indicating it's a set. Also, the order might appear different when you run the code, since sets are unordered collections.
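If you need a predictable order for display or comparison, one option is to pass the set to sorted(), which returns a new list. A minimal sketch:

```python
squares = {x**2 for x in range(10)}

# sorted() returns a list, so the set itself stays unordered.
ordered = sorted(squares)
print(ordered)  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

# Equality between sets ignores order entirely.
print(squares == {81, 64, 49, 36, 25, 16, 9, 4, 1, 0})  # True
```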
Filtering with Conditions
You can add conditions to filter which items are included in the set:
# Creating a set of even squares from 0 to 9
even_squares = {x**2 for x in range(10) if x % 2 == 0}
print(even_squares)
Output:
{0, 4, 16, 36, 64}
Comparing with Traditional Methods
Let's compare set comprehension with traditional methods:
# Traditional approach using a for loop
squares_traditional = set()
for x in range(10):
    squares_traditional.add(x**2)
print(squares_traditional)
# Set comprehension approach
squares_comprehension = {x**2 for x in range(10)}
print(squares_comprehension)
Both approaches yield the same result, but the set comprehension is more concise and often more readable.
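A third option, not covered in the comparison above, is to pass a generator expression to the set() constructor. The sketch below checks that all three forms build equal sets; the name squares_constructor is introduced here for illustration.

```python
# Loop-based construction.
squares_traditional = set()
for x in range(10):
    squares_traditional.add(x**2)

# Set comprehension.
squares_comprehension = {x**2 for x in range(10)}

# set() called on a generator expression.
squares_constructor = set(x**2 for x in range(10))

print(squares_traditional == squares_comprehension == squares_constructor)  # True
```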
Multiple Conditions and Nested Loops
Set comprehensions can include multiple conditions and nested loops:
# Multiple conditions
numbers = {x for x in range(100) if x % 2 == 0 if x % 5 == 0}
print(numbers)
# Nested loops
coordinate_pairs = {(x, y) for x in range(3) for y in range(3)}
print(coordinate_pairs)
Output:
{0, 10, 20, 30, 40, 50, 60, 70, 80, 90}
{(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2), (2, 0), (2, 1), (2, 2)}
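To make the semantics concrete, the sketch below checks two equivalences: chained if clauses act like one condition joined with "and", and multiple for clauses nest from left to right like explicit loops. The names chained, combined, and pairs_loop are illustrative.

```python
# Chained if clauses behave like a single condition joined with "and".
chained = {x for x in range(100) if x % 2 == 0 if x % 5 == 0}
combined = {x for x in range(100) if x % 2 == 0 and x % 5 == 0}
print(chained == combined)  # True

# The for clauses nest left to right, like this explicit loop.
pairs_loop = set()
for x in range(3):
    for y in range(3):
        pairs_loop.add((x, y))

pairs_comprehension = {(x, y) for x in range(3) for y in range(3)}
print(pairs_loop == pairs_comprehension)  # True
print(len(pairs_comprehension))           # 9
```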
If-Else in Set Comprehensions
You can use if-else constructs within the expression part of a set comprehension:
# If-else in the expression
parity = {"even" if x % 2 == 0 else "odd" for x in range(5)}
print(parity)
Output:
{'odd', 'even'}
Notice that the set only contains two elements: 'odd' and 'even'. This is because sets only store unique values. If we had used a list comprehension, we would have seen all five values.
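For comparison, here is a small sketch of the same expression used in a list comprehension and in a set comprehension:

```python
# List comprehension: one result per input, duplicates kept, order preserved.
parity_list = ["even" if x % 2 == 0 else "odd" for x in range(5)]
print(parity_list)  # ['even', 'odd', 'even', 'odd', 'even']

# Set comprehension: the same expression, but duplicates collapse.
parity_set = {"even" if x % 2 == 0 else "odd" for x in range(5)}
print(len(parity_set))  # 2
```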
Practical Examples
Example 1: Extracting Unique Words from a Text
text = "Python is powerful and Python is also easy to learn"
unique_words = {word.lower() for word in text.split()}
print(unique_words)
Output:
{'powerful', 'to', 'python', 'and', 'also', 'is', 'learn', 'easy'}
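Real text usually contains punctuation, which split() leaves attached to the words. One possible variation, assuming that stripping leading and trailing punctuation is good enough for your data, uses string.punctuation from the standard library. The sample sentence is made up for illustration.

```python
import string

text = "Python is powerful, and python is also easy to learn!"

unique_words = {
    word.strip(string.punctuation).lower()
    for word in text.split()
    if word.strip(string.punctuation)  # skip tokens that are only punctuation
}
print(sorted(unique_words))
# ['also', 'and', 'easy', 'is', 'learn', 'powerful', 'python', 'to']
```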
Example 2: Converting Case of Characters
mixed_case = "PyThoN"
character_variants = {c.lower() for c in mixed_case} | {c.upper() for c in mixed_case}
print(character_variants)
Output:
{'P', 'Y', 'T', 'H', 'O', 'N', 'p', 'y', 't', 'h', 'o', 'n'}
Example 3: Finding Unique Vowels in a String
sentence = "The quick brown fox jumps over the lazy dog"
vowels = {char for char in sentence.lower() if char in 'aeiou'}
print(vowels)
Output:
{'a', 'e', 'i', 'o', 'u'}
Example 4: Generating a Set of Prime Numbers
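def is_prime(n):
    # 0, 1, and negative numbers are not prime
    if n <= 1:
        return False
    # 2 and 3 are prime
    if n <= 3:
        return True
    if n % 2 == 0 or n % 3 == 0:
        return False
    # Remaining candidates have the form 6k - 1 or 6k + 1
    i = 5
    while i * i <= n:
        if n % i == 0 or n % (i + 2) == 0:
            return False
        i += 6
    return True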
primes = {x for x in range(100) if is_prime(x)}
print(primes)
Output:
{2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97}
Performance Considerations
Set comprehensions are not just more concise; in CPython they are also often somewhat faster than an equivalent loop, mainly because the loop has to look up and call the add method on every iteration. Here's a simple benchmark:
import time
# perf_counter() is a high-resolution clock intended for timing short intervals
# Using for loop
start = time.perf_counter()
result_loop = set()
for i in range(1000000):
    result_loop.add(i * 2)
loop_time = time.perf_counter() - start
# Using set comprehension
start = time.perf_counter()
result_comp = {i * 2 for i in range(1000000)}
comp_time = time.perf_counter() - start
print(f"Loop time: {loop_time:.5f} seconds")
print(f"Comprehension time: {comp_time:.5f} seconds")
print(f"Set comprehension is {loop_time/comp_time:.2f}x faster")
Sample output (illustrative values; actual timings vary by machine and Python version):
Loop time: 0.12345 seconds
Comprehension time: 0.09876 seconds
Set comprehension is 1.25x faster
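A single timed run can be noisy. As an alternative sketch, the standard-library timeit module can repeat each measurement and report the best run. The helper names build_with_loop and build_with_comprehension, and the sizes used, are choices made for this illustration.

```python
import timeit

def build_with_loop(n):
    result = set()
    for i in range(n):
        result.add(i * 2)
    return result

def build_with_comprehension(n):
    return {i * 2 for i in range(n)}

n = 100_000
# repeat() returns one total time per run; the minimum is the least noisy.
loop_time = min(timeit.repeat(lambda: build_with_loop(n), number=10, repeat=3))
comp_time = min(timeit.repeat(lambda: build_with_comprehension(n), number=10, repeat=3))

print(f"Loop: {loop_time:.4f} s for 10 calls")
print(f"Comprehension: {comp_time:.4f} s for 10 calls")
```

No expected output is shown because the numbers depend on your hardware and Python version.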
Common Pitfalls
1. Creating Empty Sets
Remember that empty curly braces {} create an empty dictionary, not an empty set. To create an empty set, you should use set().
empty_dict = {}
empty_set = set()
print(type(empty_dict))
print(type(empty_set))
Output:
<class 'dict'>
<class 'set'>
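A related detail: a set comprehension always produces a set, even when nothing matches its filter, and an empty set prints as set() rather than {}. A short sketch (the name no_matches is illustrative):

```python
# A comprehension whose filter matches nothing still produces a set.
no_matches = {x for x in range(5) if x > 10}
print(type(no_matches))  # <class 'set'>
print(no_matches)        # set()

# Braces with elements are a set literal; only the empty {} is a dict.
print(type({1, 2}))  # <class 'set'>
print(type({}))      # <class 'dict'>
```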
2. Mutable Elements
Set elements must be hashable. In practice this rules out mutable built-in containers such as lists, dictionaries, and other sets, while immutable values such as numbers, strings, and tuples of hashable items are allowed.
# This would raise a TypeError, because lists are not hashable
# error_set = {[1, 2], [3, 4]}
# This is fine
valid_set = {(1, 2), (3, 4)}
print(valid_set)
Output:
{(1, 2), (3, 4)}
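The same restriction applies inside a set comprehension. The sketch below, using made-up data in rows, shows the error being raised and two common workarounds: converting each item to a tuple, or to a frozenset when the order within each item does not matter.

```python
rows = [[1, 2], [3, 4], [1, 2]]

# Lists are unhashable, so adding them to a set raises TypeError.
try:
    bad = {row for row in rows}
except TypeError as exc:
    print(f"TypeError: {exc}")

# Converting each row to a tuple makes it hashable.
unique_rows = {tuple(row) for row in rows}
print(sorted(unique_rows))  # [(1, 2), (3, 4)]

# frozenset is the hashable counterpart of set, so it can be nested.
groups = {frozenset(row) for row in rows}
print(len(groups))  # 2
```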
Summary
Set comprehensions offer a concise and efficient way to create sets in Python. They follow a similar syntax to list comprehensions but produce sets, which are unordered collections of unique elements. Key points to remember:
- Basic syntax: {expression for item in iterable}
- You can add conditions with if clauses
- Multiple nested loops are supported
- Set comprehensions are often more readable and efficient than traditional loops
- Sets only store unique values, which can be useful or a limitation depending on your needs
By mastering set comprehensions, you'll be able to write more Pythonic code that is both concise and efficient.
Exercises
- Create a set comprehension that generates the set of all perfect squares up to 100.
- Write a set comprehension that extracts all characters that appear in both string A and string B.
- Create a set of all possible combinations of two dice rolls that sum to 7.
- Write a set comprehension that finds all unique file extensions in a list of filenames.
- Create a set of tuples (x, y) where x ranges from 1 to 5, y ranges from 1 to 5, and x ≤ y.