Python Yield Statement

Introduction

When working with large datasets or sequences in Python, memory efficiency becomes crucial. The yield statement offers an elegant solution by enabling you to create generators - special iterators that produce values on the fly instead of storing them all in memory at once.

In this tutorial, you'll learn how the yield statement works, how it differs from regular return statements, and how to leverage generators to write more memory-efficient and cleaner code.

What is the Yield Statement?

The yield statement is used within a function to turn it into a generator function. Unlike a regular function that returns a value and terminates, a generator function:

  • Returns a generator object when called, instead of running its body immediately
  • Pauses execution each time it reaches a yield statement
  • Saves its state (local variables and its position in the code)
  • Resumes from where it left off when the next value is requested

This "pause and resume" behavior makes generators perfect for working with large sequences or infinite streams of data.

Basic Syntax of Yield

Here's the basic syntax of a generator function using yield:

python
def generator_function():
    # Some code
    yield value1
    # More code
    yield value2
    # And so on
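
To make the template concrete, here's a minimal, runnable sketch; the function name count_up_to and its limit are invented for this illustration:

python
def count_up_to(limit):
    # Execution pauses at each yield and resumes on the next request
    n = 1
    while n <= limit:
        yield n
        n += 1

# A for loop drives the generator, calling next() behind the scenes
for number in count_up_to(3):
    print(number)  # Prints 1, 2, 3 on separate lines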

Yield vs Return: Understanding the Difference

To understand what makes yield special, let's compare it with the regular return statement:

python
# Function with return
def return_numbers():
    numbers = []
    for i in range(1, 6):
        numbers.append(i)
    return numbers

# Function with yield
def yield_numbers():
    for i in range(1, 6):
        yield i

# Using return function
print("Using return:")
result = return_numbers()
print(result)  # All numbers at once

# Using yield function
print("\nUsing yield:")
gen = yield_numbers()
print(next(gen))  # One number at a time
print(next(gen))
print(next(gen))

Output:

Using return:
[1, 2, 3, 4, 5]

Using yield:
1
2
3

Key differences:

  1. The return function builds the entire list in memory before returning
  2. The yield function generates each value on demand
  3. The generator object maintains its state between calls to next()

How Generators Work Behind the Scenes

When you call a generator function, it doesn't execute the function body immediately. Instead, it returns a generator object that implements the iterator protocol:

python
def simple_generator():
    print("First yield")
    yield 1
    print("Second yield")
    yield 2
    print("Third yield")
    yield 3

# Create generator object
gen = simple_generator()
print(type(gen))

# Nothing is printed until we start iterating
print("\nStarting iteration:")
print(next(gen))  # Executes until first yield
print(next(gen))  # Continues from previous position
print(next(gen))  # Continues again

Output:

<class 'generator'>

Starting iteration:
First yield
1
Second yield
2
Third yield
3

If you call next(gen) one more time, you'll get a StopIteration exception, which is how Python signals the end of an iterator.
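
To see this behavior directly, here's a small sketch that continues the example above, catching the StopIteration from the exhausted generator and then letting a for loop handle it automatically:

python
# gen has already produced all three values at this point
try:
    next(gen)
except StopIteration:
    print("The generator is exhausted")

# A for loop catches StopIteration for you, so a fresh generator
# can be iterated without any try/except
for value in simple_generator():
    print(value)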

Common Use Cases for Yield

1. Processing Large Files

Generators are perfect for reading large files line by line without loading the entire file into memory:

python
def read_large_file(file_path):
    with open(file_path, 'r') as file:
        for line in file:
            yield line.strip()

# Usage example
def count_lines(file_path):
    count = 0
    for line in read_large_file(file_path):
        count += 1
    return count

# This processes a large file efficiently without loading it all into memory

2. Infinite Sequences

You can create infinite sequences that would be impossible to store in memory:

python
def fibonacci():
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

# Get first 10 fibonacci numbers
fib = fibonacci()
for _ in range(10):
    print(next(fib), end=" ")

Output:

0 1 1 2 3 5 8 13 21 34 

3. Pipelining Data Processing

Generators can be used to create data processing pipelines:

python
def read_data(file_path):
    with open(file_path, 'r') as f:
        for line in f:
            yield line.strip()

def parse_data(lines):
    for line in lines:
        # Assume comma-separated values
        yield line.split(',')

def filter_data(records):
    for record in records:
        if len(record) >= 2 and record[1].isdigit() and int(record[1]) > 30:
            yield record

# Usage example (assuming you have a data.csv file)
# pipeline = filter_data(parse_data(read_data('data.csv')))
# for record in pipeline:
#     print(record)

Generator Expressions

Python also provides a concise way to create generators, called generator expressions. They look like list comprehensions but use parentheses instead of square brackets:

python
# List comprehension (creates entire list in memory)
numbers_list = [x*x for x in range(1000000)]

# Generator expression (creates generator object)
numbers_gen = (x*x for x in range(1000000))

# Compare memory usage
import sys
print(f"List size: {sys.getsizeof(numbers_list)} bytes")
print(f"Generator size: {sys.getsizeof(numbers_gen)} bytes")

# We can iterate over the generator
for i, num in enumerate(numbers_gen):
    if i < 5:
        print(num, end=" ")
    else:
        break

Output (sizes may vary):

List size: 8448728 bytes
Generator size: 112 bytes
0 1 4 9 16

The memory difference is significant!

Advanced Generator Features

Sending Values to Generators with .send()

Generators can receive values from outside using the .send() method:

python
def echo_generator():
    while True:
        received = yield
        print(f"Received: {received}")

g = echo_generator()
next(g)  # Prime the generator
g.send("Hello")
g.send("World")

Output:

Received: Hello
Received: World

Two-way Communication

You can both receive and send values with generators:

python
def compute_average():
    count = 0
    total = 0
    average = 0

    while True:
        # Yield current average, then receive next value
        value = yield average

        if value is not None:
            count += 1
            total += value
            average = total / count

# Use the generator
avg_gen = compute_average()
next(avg_gen)  # Start the generator (returns 0)

print(avg_gen.send(10))  # Send 10, get average
print(avg_gen.send(20))  # Send 20, get new average
print(avg_gen.send(30))  # Send 30, get new average

Output:

10.0
15.0
20.0

Real-World Example: Batch Processing

Here's a practical example of using generators for batch processing:

python
def data_source(items):
    """Simulate fetching data from a source"""
    for item in items:
        yield item

def process_batches(data, batch_size=3):
    """Process data in batches"""
    batch = []

    for item in data:
        batch.append(item)

        if len(batch) >= batch_size:
            yield batch
            batch = []

    # Don't forget the last incomplete batch
    if batch:
        yield batch

# Sample data
all_data = range(1, 11)  # Numbers 1-10

# Create processing pipeline
source = data_source(all_data)
batches = process_batches(source, batch_size=3)

# Process each batch
for i, batch in enumerate(batches, 1):
    print(f"Processing batch {i}: {batch}")
    # Do something with the batch

Output:

Processing batch 1: [1, 2, 3]
Processing batch 2: [4, 5, 6]
Processing batch 3: [7, 8, 9]
Processing batch 4: [10]

This approach is memory-efficient even with very large datasets, as it only keeps one batch in memory at a time.

Best Practices for Using Yield

  1. Use generators for large sequences: When dealing with large amounts of data, generators help reduce memory usage
  2. Use generators for calculated sequences: When each item requires calculation, generators compute values on demand
  3. Use generators for infinite sequences: For potentially infinite streams of data (like monitoring systems)
  4. Keep generator functions focused: Each generator should have a single responsibility
  5. Consider using generator expressions for simple cases where a full function isn't needed (see the sketch after this list)
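
For the last point, a generator expression passed directly to a consuming function is often all you need; here's a small sketch with made-up values:

python
# A generator expression used as the sole argument needs no extra parentheses
total = sum(x * x for x in range(1, 6))
print(total)  # 55

# any() stops pulling values from the generator as soon as a match is found
has_long_word = any(len(word) > 5 for word in ["yield", "generator", "python"])
print(has_long_word)  # True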

Common Pitfalls to Avoid

  1. Trying to reuse generators: Once a generator is exhausted, you need to recreate it to use it again (see the sketch after this list)
  2. Accessing generator values by index: Generators don't support indexing - you have to iterate through them
  3. Forgetting that generators are single-use: You can't reset or rewind a generator to the beginning
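
The first two pitfalls are easy to demonstrate; here's a short sketch showing that an exhausted generator produces nothing on a second pass and that indexing raises a TypeError:

python
squares = (x * x for x in range(3))

print(list(squares))  # [0, 1, 4] - this fully consumes the generator
print(list(squares))  # [] - the exhausted generator has nothing left

# Generators don't support indexing
fresh = (x * x for x in range(3))
try:
    fresh[0]
except TypeError:
    print("Generators can't be indexed; iterate or convert to a list instead")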

Summary

The yield statement is a powerful tool in Python that enables you to:

  • Create generator functions that produce values on-demand
  • Process large datasets efficiently with minimal memory usage
  • Build data pipelines that process information incrementally
  • Create infinite sequences that would be impossible with regular collections

Generators represent a fundamental shift in how data is processed: from "compute everything at once" to "compute only what you need, when you need it."

Exercises

  1. Write a generator function that produces the first n prime numbers
  2. Create a generator that reads a CSV file and yields each row as a dictionary
  3. Build a data processing pipeline using multiple generators that:
    • Reads numbers from a file
    • Filters out non-numeric values
    • Converts them to integers
    • Yields only the even numbers
  4. Implement a windowing generator that, given a list and window size, yields overlapping sublists of the specified window size

Closing Thoughts

Understanding generators and the yield statement opens up new possibilities for efficient data processing and can dramatically improve the performance and readability of your code when working with sequences and streams.


