Skip to main content

Python Collections Module

Introduction

The Python Collections module is a built-in module that implements specialized container data types providing alternatives to Python's general-purpose built-in containers such as lists, dictionaries, and tuples. If you've been working with Python's standard data structures, the Collections module offers more specialized, efficient tools that can make your code more readable and performant.

In this tutorial, we'll explore the most commonly used data structures from the Collections module:

  • Counter: counts occurrences of elements
  • defaultdict: dictionary with default values
  • namedtuple: tuple with named fields
  • deque: double-ended queue
  • OrderedDict: dictionary that remembers insertion order

Let's dive in and see how these specialized data structures can simplify your code and make it more efficient.

Getting Started with Collections

To use any of the data structures from the Collections module, you first need to import them:

python
from collections import Counter, defaultdict, namedtuple, deque, OrderedDict

You can also import the entire module and access the classes using dot notation:

python
import collections

my_counter = collections.Counter()

Counter

What is Counter?

Counter is a dict subclass designed for counting hashable objects. It's essentially a dictionary where elements are stored as keys and their counts as values.

Basic Usage

Let's see how to create and use a Counter:

python
from collections import Counter

# Create a Counter from a list
fruits = ['apple', 'banana', 'apple', 'orange', 'banana', 'apple']
fruit_counter = Counter(fruits)

print(fruit_counter)

Output:

Counter({'apple': 3, 'banana': 2, 'orange': 1})

Common Methods

Counter objects have several useful methods:

python
# Most common elements
print(fruit_counter.most_common(2)) # Top 2 most common

# Total of all counts
print(sum(fruit_counter.values()))

# Convert to a list of elements
print(list(fruit_counter.elements()))

# Update with new elements
fruit_counter.update(['apple', 'mango', 'mango'])
print(fruit_counter)

# Subtract occurrences
fruit_counter.subtract(['apple', 'apple'])
print(fruit_counter)

Output:

[('apple', 3), ('banana', 2)]
6
['apple', 'apple', 'apple', 'banana', 'banana', 'orange']
Counter({'apple': 4, 'banana': 2, 'orange': 1, 'mango': 2})
Counter({'apple': 2, 'banana': 2, 'orange': 1, 'mango': 2})

Practical Example: Word Frequency Analysis

Let's count the most frequent words in a text:

python
from collections import Counter
import re

def count_word_frequency(text):
# Convert to lowercase and split into words
words = re.findall(r'\w+', text.lower())

# Count the words
word_counts = Counter(words)

# Return the 5 most common words
return word_counts.most_common(5)

sample_text = """
Python is an interpreted high-level general-purpose programming language.
Python's design philosophy emphasizes code readability with its notable use of significant indentation.
Python is dynamically-typed and garbage-collected.
"""

print(count_word_frequency(sample_text))

Output:

[('python', 3), ('is', 3), ('high', 1), ('level', 1), ('general', 1)]

defaultdict

What is defaultdict?

defaultdict is a dictionary subclass that calls a factory function to supply missing values. This is extremely useful when working with data that might not have all possible keys defined.

Basic Usage

python
from collections import defaultdict

# Create a defaultdict with int as the default factory
fruit_count = defaultdict(int)

# Add items to the dictionary
fruits = ['apple', 'banana', 'apple', 'orange', 'banana', 'apple']
for fruit in fruits:
fruit_count[fruit] += 1

print(dict(fruit_count)) # Convert to regular dict for display

Output:

{'apple': 3, 'banana': 2, 'orange': 1}

Notice how we didn't need to check if the key existed before incrementing its value. The defaultdict took care of that for us!

Default Factory Functions

You can use any callable (function that returns a value) as a default factory:

python
# defaultdict with list as default factory
grouped_items = defaultdict(list)

# Group items by their first letter
items = ["apple", "banana", "cherry", "avocado", "blueberry"]
for item in items:
grouped_items[item[0]].append(item)

print(dict(grouped_items))

Output:

{'a': ['apple', 'avocado'], 'b': ['banana', 'blueberry'], 'c': ['cherry']}

Practical Example: Grouping Data

Let's see how defaultdict can simplify grouping data by a category:

python
from collections import defaultdict

def group_students_by_grade(student_data):
# Create a defaultdict that will hold lists
graded_students = defaultdict(list)

for name, grade in student_data:
graded_students[grade].append(name)

return dict(graded_students) # Convert to regular dict for return

students = [
("Alice", "A"),
("Bob", "B"),
("Charlie", "A"),
("David", "C"),
("Eva", "B")
]

print(group_students_by_grade(students))

Output:

{'A': ['Alice', 'Charlie'], 'B': ['Bob', 'Eva'], 'C': ['David']}

namedtuple

What is namedtuple?

namedtuple creates tuple subclasses with named fields. They're immutable like regular tuples, but you can access fields by name instead of just by index.

Basic Usage

python
from collections import namedtuple

# Create a namedtuple type called 'Point'
Point = namedtuple('Point', ['x', 'y'])

# Create a Point instance
p = Point(10, 20)

print(p.x, p.y) # Access by name
print(p[0], p[1]) # Access by index
print(p) # Print the tuple

Output:

10 20
10 20
Point(x=10, y=20)

Creating namedtuples

There are multiple ways to define the fields:

python
# Different ways to specify field names
Person1 = namedtuple('Person', ['name', 'age', 'city'])
Person2 = namedtuple('Person', 'name age city') # Space-separated string
Person3 = namedtuple('Person', 'name,age,city') # Comma-separated string

# Creating instances
john = Person1('John', 30, 'New York')
alice = Person2('Alice', 25, 'Boston')

print(john)
print(alice)

Output:

Person(name='John', age=30, city='New York')
Person(name='Alice', age=25, city='Boston')

Practical Example: Representing Data Records

namedtuple is perfect for representing fixed data records:

python
from collections import namedtuple
from datetime import date

# Define a Book type
Book = namedtuple('Book', 'title author isbn publication_date price')

# Create book instances
books = [
Book('Python Crash Course', 'Eric Matthes', '1593279280',
date(2019, 5, 3), 39.99),
Book('Fluent Python', 'Luciano Ramalho', '1491946008',
date(2015, 8, 20), 49.99),
Book('Learning Python', 'Mark Lutz', '1449355730',
date(2013, 7, 6), 59.99)
]

# Using the named fields
for book in books:
print(f"{book.title} by {book.author}, published on {book.publication_date}")

# Find the most expensive book
most_expensive = max(books, key=lambda book: book.price)
print(f"Most expensive book: {most_expensive.title} (${most_expensive.price})")

Output:

Python Crash Course by Eric Matthes, published on 2019-05-03
Fluent Python by Luciano Ramalho, published on 2015-08-20
Learning Python by Mark Lutz, published on 2013-07-06
Most expensive book: Learning Python ($59.99)

deque

What is deque?

deque (pronounced "deck") stands for "double-ended queue" and is designed to have fast appends and pops from both ends. Unlike lists, which are optimized for fast operations at the end, deques are optimized for operations at both the beginning and end.

Basic Usage

python
from collections import deque

# Create a deque
queue = deque(['a', 'b', 'c'])
print(queue)

# Add to right side
queue.append('d')
print(queue)

# Add to left side
queue.appendleft('e')
print(queue)

# Remove from right
right_item = queue.pop()
print(f"Popped from right: {right_item}")
print(queue)

# Remove from left
left_item = queue.popleft()
print(f"Popped from left: {left_item}")
print(queue)

Output:

deque(['a', 'b', 'c'])
deque(['a', 'b', 'c', 'd'])
deque(['e', 'a', 'b', 'c', 'd'])
Popped from right: d
deque(['e', 'a', 'b', 'c'])
Popped from left: e
deque(['a', 'b', 'c'])

Additional Methods

deque provides many other useful methods:

python
# Create a new deque
d = deque([1, 2, 3, 4, 5])

# Rotate the deque (positive is right, negative is left)
d.rotate(1) # Rotate right by 1
print(f"After rotating right: {d}")

d.rotate(-2) # Rotate left by 2
print(f"After rotating left: {d}")

# Add multiple elements at once
d.extend([6, 7, 8]) # Add to right
print(f"After extend: {d}")

d.extendleft([0, -1, -2]) # Add to left (note the order gets reversed)
print(f"After extendleft: {d}")

# Limit the size of the deque
limited_deque = deque(maxlen=3)
for i in range(5):
limited_deque.append(i)
print(f"Added {i}: {limited_deque}")

Output:

After rotating right: deque([5, 1, 2, 3, 4])
After rotating left: deque([2, 3, 4, 5, 1])
After extend: deque([2, 3, 4, 5, 1, 6, 7, 8])
After extendleft: deque([-2, -1, 0, 2, 3, 4, 5, 1, 6, 7, 8])
Added 0: deque([0])
Added 1: deque([0, 1])
Added 2: deque([0, 1, 2])
Added 3: deque([1, 2, 3])
Added 4: deque([2, 3, 4])

Practical Example: Recent Items History

A common use case for deque is maintaining a fixed-size list of "recent" items:

python
from collections import deque

class BrowsingHistory:
def __init__(self, max_history=5):
self.history = deque(maxlen=max_history)

def visit_page(self, page):
self.history.append(page)

def get_history(self):
return list(self.history)

def go_back(self):
if len(self.history) > 1:
# Remove the current page
self.history.pop()
# Return the previous page (now the current one)
return self.history[-1]
else:
return None

# Create a browsing history with max 3 items
browser = BrowsingHistory(max_history=3)

# Visit some pages
pages = ["Home", "Products", "About", "Contact"]
for page in pages:
browser.visit_page(page)
print(f"Visited: {page}, History: {browser.get_history()}")

# Go back in history
previous_page = browser.go_back()
print(f"Went back to: {previous_page}, History: {browser.get_history()}")

Output:

Visited: Home, History: ['Home']
Visited: Products, History: ['Home', 'Products']
Visited: About, History: ['Home', 'Products', 'About']
Visited: Contact, History: ['Products', 'About', 'Contact']
Went back to: About, History: ['Products', 'About']

OrderedDict

What is OrderedDict?

OrderedDict is a dictionary subclass that remembers the order in which keys were inserted. While regular dictionaries in Python 3.7+ also maintain insertion order, OrderedDict provides additional behaviors like the move_to_end method.

Basic Usage

python
from collections import OrderedDict

# Create an OrderedDict
colors = OrderedDict()
colors['red'] = '#FF0000'
colors['green'] = '#00FF00'
colors['blue'] = '#0000FF'

print(colors)

# Regular dictionary also maintains order in Python 3.7+
regular_dict = {}
regular_dict['red'] = '#FF0000'
regular_dict['green'] = '#00FF00'
regular_dict['blue'] = '#0000FF'

print(regular_dict)

Output:

OrderedDict([('red', '#FF0000'), ('green', '#00FF00'), ('blue', '#0000FF')])
{'red': '#FF0000', 'green': '#00FF00', 'blue': '#0000FF'}

Special Methods

OrderedDict provides some methods not available in regular dictionaries:

python
# Create an OrderedDict
colors = OrderedDict([
('red', '#FF0000'),
('green', '#00FF00'),
('blue', '#0000FF')
])

print("Original order:", list(colors.items()))

# Move 'red' to the end
colors.move_to_end('red')
print("After move_to_end('red'):", list(colors.items()))

# Move 'blue' to the beginning (last=False)
colors.move_to_end('blue', last=False)
print("After move_to_end('blue', last=False):", list(colors.items()))

# popitem (default is last=True which removes the last item)
last_item = colors.popitem()
print(f"Removed last item: {last_item}")
print("After popitem():", list(colors.items()))

# popitem with last=False (removes the first item)
first_item = colors.popitem(last=False)
print(f"Removed first item: {first_item}")
print("After popitem(last=False):", list(colors.items()))

Output:

Original order: [('red', '#FF0000'), ('green', '#00FF00'), ('blue', '#0000FF')]
After move_to_end('red'): [('green', '#00FF00'), ('blue', '#0000FF'), ('red', '#FF0000')]
After move_to_end('blue', last=False): [('blue', '#0000FF'), ('green', '#00FF00'), ('red', '#FF0000')]
Removed last item: ('red', '#FF0000')
After popitem(): [('blue', '#0000FF'), ('green', '#00FF00')]
Removed first item: ('blue', '#0000FF')
After popitem(last=False): [('green', '#00FF00')]

Practical Example: LRU Cache

One common use of OrderedDict is implementing a Least Recently Used (LRU) cache:

python
from collections import OrderedDict

class LRUCache:
def __init__(self, capacity):
self.capacity = capacity
self.cache = OrderedDict()

def get(self, key):
if key not in self.cache:
return -1

# Move the accessed key to the end (most recently used)
self.cache.move_to_end(key)
return self.cache[key]

def put(self, key, value):
# If key exists, update it and move to the end
if key in self.cache:
self.cache[key] = value
self.cache.move_to_end(key)
return

# If cache is full, remove the least recently used item
if len(self.cache) >= self.capacity:
self.cache.popitem(last=False)

# Add the new key-value pair
self.cache[key] = value

def display(self):
return dict(self.cache)

# Create an LRU cache with capacity 3
cache = LRUCache(3)

# Add some values
cache.put("A", 1)
cache.put("B", 2)
cache.put("C", 3)
print(f"Cache after initial adds: {cache.display()}")

# Access a value (moves it to "most recently used")
print(f"Value for key 'A': {cache.get('A')}")
print(f"Cache after accessing 'A': {cache.display()}")

# Add a new value that will evict the least recently used (now 'B')
cache.put("D", 4)
print(f"Cache after adding 'D': {cache.display()}")

# Check that 'B' was evicted
print(f"Value for key 'B': {cache.get('B')}")

Output:

Cache after initial adds: {'A': 1, 'B': 2, 'C': 3}
Value for key 'A': 1
Cache after accessing 'A': {'B': 2, 'C': 3, 'A': 1}
Cache after adding 'D': {'C': 3, 'A': 1, 'D': 4}
Value for key 'B': -1

Summary

The Python Collections module provides specialized data structures that extend the functionality of Python's built-in containers:

  • Counter: A dict subclass for counting hashable objects
  • defaultdict: A dict subclass that calls a factory function to supply missing values
  • namedtuple: Factory function for creating tuple subclasses with named fields
  • deque: List-like container with fast appends and pops on both ends
  • OrderedDict: Dict subclass that remembers the order entries were added

These specialized data structures can significantly simplify your code and improve its efficiency in specific use cases. By choosing the right container for your task, you can write cleaner, more intuitive, and more efficient Python code.

Additional Resources

To deepen your understanding of the Collections module:

  1. Python Official Documentation on Collections
  2. Real Python's Guide to the Collections Module

Practice Exercises

  1. Use Counter to find the most common words in a text file.
  2. Implement a multi-level dictionary using defaultdict.
  3. Create a namedtuple to represent a student record with fields for name, ID, and grades.
  4. Use a deque to implement a simple task scheduler that rotates through tasks.
  5. Build a simple "most recently used" tracker using OrderedDict.

Happy coding with Python's Collections module!



If you spot any mistakes on this website, please let me know at [email protected]. I’d greatly appreciate your feedback! :)