Python Threading Basics

Introduction

Programs often need to make progress on several tasks at once: downloading several files while processing data, for example, or keeping a user interface responsive during a long operation. This is where threading comes into play.

Threading is a technique that allows a program to execute multiple operations concurrently within a single process. In Python, the threading module provides a way to create and manage threads, enabling you to write programs that can do several things at once.

In this tutorial, you'll learn:

What threads are and why they're useful
How to create and manage threads in Python
Common threading patterns and best practices
How to avoid common pitfalls with threading

Understanding Threads

What is a Thread?

A thread is the smallest unit of execution within a process. When you run a Python program, you're starting a process that, by default, contains a single thread (the main thread). This thread executes your code sequentially, one statement at a time.

By creating additional threads, you can have multiple sequences of instructions executing concurrently, allowing your program to perform multiple tasks seemingly at the same time.
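As a small illustration of the main thread and an additional thread, the sketch below records the name of whichever thread runs a function. The function name record_thread_name and the thread name "worker-1" are made up for this example.

```python
import threading

seen_names = []  # names of the threads that ran record_thread_name()

def record_thread_name():
    # current_thread() returns the Thread object for the calling thread
    seen_names.append(threading.current_thread().name)

# Run the function once in the thread that executes this script...
record_thread_name()

# ...and once in a separate, explicitly named thread
worker = threading.Thread(target=record_thread_name, name="worker-1")
worker.start()
worker.join()

print(seen_names)
```

When this is run as a plain script, the first entry is normally MainThread, the name Python gives to the main thread, and the second is worker-1.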

Why Use Threads?

Threads are particularly useful for:

I/O-bound tasks: Operations that spend time waiting for input/output (like reading files, network requests)
Responsive user interfaces: Keeping UI responsive while performing background tasks
Parallel processing: Spreading computation across multiple CPU cores (in standard CPython the Global Interpreter Lock largely prevents this for pure-Python code, as we'll discuss)
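To make the I/O-bound case concrete, here is a minimal sketch that uses time.sleep() as a stand-in for a blocking I/O call. The function name fake_io_task and the 0.2-second delays are assumptions chosen for illustration, not measurements of any real workload.

```python
import threading
import time

def fake_io_task(delay):
    # time.sleep() stands in for a blocking call such as a network request
    time.sleep(delay)

delays = [0.2, 0.2, 0.2]

# Sequential: each wait starts only after the previous one has ended
start = time.perf_counter()
for delay in delays:
    fake_io_task(delay)
sequential_seconds = time.perf_counter() - start

# Threaded: the three waits overlap
start = time.perf_counter()
threads = [threading.Thread(target=fake_io_task, args=(d,)) for d in delays]
for t in threads:
    t.start()
for t in threads:
    t.join()
threaded_seconds = time.perf_counter() - start

print(f"Sequential: {sequential_seconds:.2f}s, threaded: {threaded_seconds:.2f}s")
```

The sequential version takes roughly the sum of the delays, while the threaded version takes roughly the longest single delay, because the waits overlap.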

Getting Started with Python Threading

To use threading in Python, you first need to import the threading module:

python
import threading

Creating a Simple Thread

The simplest way to create a thread is to instantiate a Thread object with a target function:

python
import threading
import time

def print_numbers():
    for i in range(1, 6):
        time.sleep(1)
        print(f"Number {i}")

# Create a thread
thread = threading.Thread(target=print_numbers)

# Start the thread
thread.start()

print("This will print immediately!")
print("Main thread continues execution...")

# Wait for the thread to finish
thread.join()

print("Thread has finished execution!")

Output:

This will print immediately!
Main thread continues execution...
Number 1
Number 2
Number 3
Number 4
Number 5
Thread has finished execution!

Let's understand what's happening:

We define a function print_numbers() that prints the numbers 1-5, sleeping for 1 second before each one
We create a new thread that will execute this function
We start the thread with thread.start()
The main thread continues executing the next lines immediately, while the new thread runs concurrently with it
We call thread.join() to make the main thread wait until the new thread has finished
Only then does the main thread print its final message
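The relationship between start(), join(), and a thread's lifetime can be checked with is_alive(). The following is a small sketch; slow_task and the 0.5-second sleep are placeholders, and it also shows that join() accepts an optional timeout.

```python
import threading
import time

def slow_task():
    time.sleep(0.5)  # placeholder for real work

thread = threading.Thread(target=slow_task)
alive_before_start = thread.is_alive()      # False: the thread has not been started

thread.start()
alive_after_start = thread.is_alive()       # True: slow_task() is still sleeping

thread.join(timeout=0.05)                   # stop waiting after 0.05 seconds
alive_after_short_join = thread.is_alive()  # True: the join timed out, the thread did not

thread.join()                               # wait until the thread really finishes
alive_after_join = thread.is_alive()        # False

print(alive_before_start, alive_after_start, alive_after_short_join, alive_after_join)
```

Note that join() with a timeout returns normally when the timeout expires, so is_alive() is the way to tell whether the thread actually finished.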

Creating Multiple Threads

We can create multiple threads to perform different tasks concurrently:

python
import threading
import time

def print_numbers():
    for i in range(1, 6):
        time.sleep(1)
        print(f"Number {i}")

def print_letters():
    for letter in 'abcde':
        time.sleep(1.5)
        print(f"Letter {letter}")

# Create threads
thread1 = threading.Thread(target=print_numbers)
thread2 = threading.Thread(target=print_letters)

# Start threads
thread1.start()
thread2.start()

print("Both threads are running...")

# Wait for both threads to finish
thread1.join()
thread2.join()

print("Both threads have finished execution!")

Output:

Both threads are running...
Number 1
Letter a
Number 2
Number 3
Letter b
Number 4
Letter c
Number 5
Letter d
Letter e
Both threads have finished execution!

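Notice how the numbers and letters interleave in the output because the two threads are running concurrently. The exact order can vary between runs when two prints fall due at nearly the same moment, as "Number 3" and "Letter b" do here at the 3-second mark.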

Passing Arguments to Threads

You can pass arguments to the target function using the args parameter (for positional arguments) or kwargs parameter (for keyword arguments):

python
import threading
import time

def greet(name, delay):
    time.sleep(delay)
    print(f"Hello, {name}!")

# Create threads with arguments
thread1 = threading.Thread(target=greet, args=("Alice", 1))
thread2 = threading.Thread(target=greet, args=("Bob", 2))
thread3 = threading.Thread(target=greet, kwargs={"name": "Charlie", "delay": 3})

# Start all threads
thread1.start()
thread2.start()
thread3.start()

print("All greetings scheduled...")

# Wait for all threads to complete
thread1.join()
thread2.join()
thread3.join()

print("All threads have finished!")

Output:

All greetings scheduled...
Hello, Alice!
Hello, Bob!
Hello, Charlie!
All threads have finished!
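Arguments are also a convenient way to get data back out of a thread, since the return value of a target function is discarded. One possible approach, sketched below, is to pass in a shared list and have each thread write to its own slot. The function square and its results and index parameters are invented for this example.

```python
import threading

def square(number, results, index):
    # Each thread writes only to its own slot, so no lock is needed here
    results[index] = number * number

numbers = [2, 3, 4]
results = [None] * len(numbers)

threads = [
    threading.Thread(target=square, args=(n, results, i))
    for i, n in enumerate(numbers)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results)  # [4, 9, 16]
```

Because every thread has a dedicated index, the order of the results does not depend on which thread finishes first.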

Creating Thread Subclasses

For more complex threading scenarios, you can subclass the Thread class and override its run() method:

python
import threading
import time

class CountdownThread(threading.Thread):
    def __init__(self, name, count):
        super().__init__()
        self.name = name
        self.count = count
    
    def run(self):
        print(f"Starting {self.name}")
        for i in range(self.count, 0, -1):
            print(f"{self.name}: {i}")
            time.sleep(1)
        print(f"{self.name} finished!")

# Create thread instances
thread1 = CountdownThread("Thread 1", 5)
thread2 = CountdownThread("Thread 2", 3)

# Start threads
thread1.start()
thread2.start()

# Wait for threads to complete
thread1.join()
thread2.join()

print("All countdown threads have finished!")

Output:

Starting Thread 1
Thread 1: 5
Starting Thread 2
Thread 2: 3
Thread 1: 4
Thread 2: 2
Thread 1: 3
Thread 2: 1
Thread 2 finished!
Thread 1: 2
Thread 1: 1
Thread 1 finished!
All countdown threads have finished!

Thread Synchronization

When multiple threads access shared resources, problems like race conditions can occur. To prevent these issues, Python provides several synchronization primitives.

Using Locks

The simplest synchronization tool is the Lock:

python
import threading
import time

# Shared resource
counter = 0
counter_lock = threading.Lock()

def increment_counter(count):
    global counter
    
    for _ in range(count):
        # Acquire the lock before accessing the shared resource
        counter_lock.acquire()
        try:
            # Critical section - only one thread can execute this at a time
            current_value = counter
            time.sleep(0.001)  # Simulate some processing time
            counter = current_value + 1
        finally:
            # Always release the lock, even if an exception occurs
            counter_lock.release()

# Create and start threads
threads = []
for _ in range(5):
    thread = threading.Thread(target=increment_counter, args=(10,))
    threads.append(thread)
    thread.start()

# Wait for all threads to finish
for thread in threads:
    thread.join()

print(f"Final counter value: {counter}")

Output:

Final counter value: 50

Without the lock, the final counter value would likely be less than 50 due to race conditions.
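To see the race condition for yourself, the sketch below repeats the same read, sleep, and write steps without any lock. It keeps the counter in a dict named state (an arbitrary choice for this sketch) rather than a global variable. The result differs between runs, so no expected output is shown.

```python
import threading
import time

state = {"counter": 0}  # shared resource, deliberately left unprotected

def unsafe_increment(count):
    for _ in range(count):
        current_value = state["counter"]      # read
        time.sleep(0.001)                     # other threads run during this pause
        state["counter"] = current_value + 1  # write back a possibly stale value

threads = [threading.Thread(target=unsafe_increment, args=(10,)) for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(f"Final counter value without a lock: {state['counter']}")
```

Several threads typically read the same value before any of them writes, so their increments overwrite each other and updates are lost.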

Using the `with` Statement for Locks

Python's with statement provides a cleaner way to use locks:

python
import threading
import time

counter = 0
counter_lock = threading.Lock()

def increment_counter(count):
    global counter
    
    for _ in range(count):
        # Using with statement - automatically acquires and releases the lock
        with counter_lock:
            current_value = counter
            time.sleep(0.001)  # Simulate some processing time
            counter = current_value + 1

# Create and start threads
threads = []
for _ in range(5):
    thread = threading.Thread(target=increment_counter, args=(10,))
    threads.append(thread)
    thread.start()

# Wait for all threads to finish
for thread in threads:
    thread.join()

print(f"Final counter value: {counter}")

Output:

Final counter value: 50
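One practical benefit of the with form is that the lock is released even when the protected code raises an exception. A minimal sketch, where risky_update is a hypothetical function that always fails:

```python
import threading

lock = threading.Lock()

def risky_update():
    with lock:
        # The exception leaves the with block, which releases the lock
        raise ValueError("something went wrong inside the critical section")

try:
    risky_update()
except ValueError as error:
    print(f"Caught: {error}")

lock_still_held = lock.locked()
print(f"Lock still held? {lock_still_held}")  # Lock still held? False
```

This is the same guarantee the try/finally version gives, with less code to get wrong.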

Other Synchronization Primitives

Python's threading module provides several other synchronization primitives:

RLock (Reentrant Lock): A lock that can be acquired multiple times by the same thread
Semaphore: Limits access to a shared resource to a specified number of threads
Event: Allows one thread to signal an event and other threads to wait for it
Condition: Provides a way to notify waiting threads when a condition changes
Barrier: Ensures multiple threads wait for each other at a certain point
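Two of the primitives listed above, Semaphore and Event, can be sketched together as follows. The worker function, the limit of 2, and the stats dict are assumptions made for this example: an Event releases all workers at once, and a Semaphore lets at most two of them into the guarded section at a time.

```python
import threading
import time

MAX_CONCURRENT = 2
slots = threading.Semaphore(MAX_CONCURRENT)  # allows 2 holders at a time
start_signal = threading.Event()             # workers wait for this to be set
stats_lock = threading.Lock()                # protects the stats dict
stats = {"active": 0, "peak": 0}

def worker():
    start_signal.wait()      # block until the main thread calls set()
    with slots:              # at most MAX_CONCURRENT threads get past this line
        with stats_lock:
            stats["active"] += 1
            stats["peak"] = max(stats["peak"], stats["active"])
        time.sleep(0.05)     # simulated use of the limited resource
        with stats_lock:
            stats["active"] -= 1

threads = [threading.Thread(target=worker) for _ in range(6)]
for t in threads:
    t.start()

start_signal.set()           # release all waiting workers
for t in threads:
    t.join()

print(f"Peak concurrent workers: {stats['peak']}")
```

The reported peak never exceeds MAX_CONCURRENT, even though six threads were started.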

Real-World Example: Multi-threaded Web Scraper

Let's create a simple web scraper that downloads multiple web pages concurrently:

python
import threading
import requests
import time
from urllib.parse import urlparse

class WebsiteDownloader(threading.Thread):
    def __init__(self, url):
        super().__init__()
        self.url = url
        self.result = None
        
    def run(self):
        print(f"Downloading {self.url}")
        try:
            response = requests.get(self.url, timeout=10)
            domain = urlparse(self.url).netloc
            self.result = (domain, len(response.text), response.status_code)
            print(f"Finished downloading {self.url}")
        except Exception as e:
            print(f"Error downloading {self.url}: {e}")
            self.result = (self.url, 0, 0)

# List of websites to download
websites = [
    "https://www.python.org",
    "https://www.google.com",
    "https://www.github.com",
    "https://www.stackoverflow.com",
    "https://www.wikipedia.org"
]

# Record start time
start_time = time.time()

# Create and start threads
downloaders = []
for url in websites:
    downloader = WebsiteDownloader(url)
    downloaders.append(downloader)
    downloader.start()

# Wait for all downloads to complete
for downloader in downloaders:
    downloader.join()

# Collect and display results
results = [downloader.result for downloader in downloaders]
results.sort(key=lambda x: x[1], reverse=True)  # Sort by content length

print("\nResults:")
print("=" * 50)
print(f"{'Domain':<25} {'Content Length':<15} {'Status':<10}")
print("-" * 50)
for domain, length, status in results:
    print(f"{domain:<25} {length:<15,} {status:<10}")

# Calculate and display total time
total_time = time.time() - start_time
print("\nTotal execution time: {:.2f} seconds".format(total_time))

# For comparison, time the same downloads performed one after another
print("\nSimulating sequential download time...")
seq_start_time = time.time()

for url in websites:
    try:
        print(f"Downloading {url}")
        response = requests.get(url, timeout=10)
    except Exception as e:
        print(f"Error downloading {url}: {e}")

seq_total_time = time.time() - seq_start_time
print("Sequential execution time: {:.2f} seconds".format(seq_total_time))
print(f"Threading speedup: {seq_total_time/total_time:.2f}x faster")

Output (actual timings and content lengths will vary):

Downloading https://www.python.org
Downloading https://www.google.com
Downloading https://www.github.com
Downloading https://www.stackoverflow.com
Downloading https://www.wikipedia.org
Finished downloading https://www.google.com
Finished downloading https://www.wikipedia.org
Finished downloading https://www.python.org
Finished downloading https://www.github.com
Finished downloading https://www.stackoverflow.com

Results:
==================================================
Domain                    Content Length    Status    
--------------------------------------------------
www.stackoverflow.com     1,265,398         200       
www.github.com            458,972           200       
www.python.org            127,593           200       
www.wikipedia.org         75,463            200       
www.google.com            17,677            200       

Total execution time: 1.78 seconds

Simulating sequential download time...
Downloading https://www.python.org
Downloading https://www.google.com
Downloading https://www.github.com
Downloading https://www.stackoverflow.com
Downloading https://www.wikipedia.org
Sequential execution time: 4.56 seconds
Threading speedup: 2.56x faster

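In this run, the threaded version finished about 2.5 times faster than the sequential one, because the threads overlap the time each request spends waiting on the network. The exact speedup depends on network conditions and will differ from run to run.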

Common Threading Issues and Best Practices

The Global Interpreter Lock (GIL)

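In CPython, the standard Python interpreter, the Global Interpreter Lock (GIL) is a mutex that protects access to Python objects, so only one thread executes Python bytecode at a time. The GIL is released while a thread waits on blocking I/O, which is why threads still help in that case. This means: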

Threading in Python is most effective for I/O-bound tasks (like network or file operations)
CPU-bound tasks (heavy computations) may not see performance improvements with threading
For CPU-bound tasks, consider using the multiprocessing module instead
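A rough way to observe this is to time a CPU-bound function run twice in a row and then in two threads. This is only a sketch: count_down and the iteration count are arbitrary, and the timings depend on your machine and Python build, so no expected output is shown.

```python
import threading
import time

def count_down(n):
    # Pure-Python loop: CPU-bound, with no I/O during which the GIL is released
    while n > 0:
        n -= 1

N = 1_000_000

# Run the work twice, one call after the other
start = time.perf_counter()
count_down(N)
count_down(N)
sequential_seconds = time.perf_counter() - start

# Run the same work in two threads
start = time.perf_counter()
t1 = threading.Thread(target=count_down, args=(N,))
t2 = threading.Thread(target=count_down, args=(N,))
t1.start()
t2.start()
t1.join()
t2.join()
threaded_seconds = time.perf_counter() - start

print(f"Sequential: {sequential_seconds:.3f}s, two threads: {threaded_seconds:.3f}s")
```

On a standard CPython build with the GIL, the two timings are usually similar, since the threads take turns running bytecode rather than running it at the same time.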

Deadlocks

Deadlocks occur when two or more threads are waiting for each other, causing the program to hang indefinitely:

python
import threading
import time

# Create two locks
lock1 = threading.Lock()
lock2 = threading.Lock()

def thread_1_function():
    with lock1:
        print("Thread 1 acquired lock1")
        # Simulate some work
        time.sleep(0.1)
        print("Thread 1 waiting for lock2")
        with lock2:
            print("Thread 1 acquired both locks")

def thread_2_function():
    with lock2:
        print("Thread 2 acquired lock2")
        # Simulate some work
        time.sleep(0.1)
        print("Thread 2 waiting for lock1")
        with lock1:
            print("Thread 2 acquired both locks")

# Create and start threads
thread1 = threading.Thread(target=thread_1_function)
thread2 = threading.Thread(target=thread_2_function)

thread1.start()
thread2.start()

# Each thread now holds one lock and waits for the other, so these joins
# will most likely block forever and the final print is never reached
thread1.join()
thread2.join()
print("Both threads completed successfully")

To avoid deadlocks:

Always acquire locks in the same order in all threads
Use timeouts when acquiring locks
Consider higher-level synchronization mechanisms
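The first two suggestions can be sketched as follows. The function names transfer and try_with_timeout are hypothetical. In the first part, both threads take lock1 before lock2, so no circular wait can form. In the second part, acquire() is given a timeout and returns False instead of blocking forever.

```python
import threading

lock1 = threading.Lock()
lock2 = threading.Lock()
finished = []

def transfer(label):
    # Every thread takes lock1 first and lock2 second: a consistent order
    with lock1:
        with lock2:
            finished.append(label)

threads = [threading.Thread(target=transfer, args=(f"thread-{i}",)) for i in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(finished))  # ['thread-0', 'thread-1']

def try_with_timeout(lock, outcome):
    # acquire() returns True on success, False if the timeout expires first
    got_it = lock.acquire(timeout=0.1)
    outcome.append(got_it)
    if got_it:
        lock.release()

outcome = []
with lock1:  # the main thread holds lock1 while the worker tries to take it
    waiter = threading.Thread(target=try_with_timeout, args=(lock1, outcome))
    waiter.start()
    waiter.join()
print(outcome)  # [False]
```

A thread that fails to get a lock within the timeout can release what it already holds and retry, rather than hanging.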

Best Practices for Threading in Python

Use threading for I/O-bound tasks: Network requests, file operations, etc.
Keep critical sections small: Minimize the code protected by locks
Avoid complex lock hierarchies: Simple locking schemes are less error-prone
Consider thread pools: Use concurrent.futures.ThreadPoolExecutor for managing worker threads
Plan for thread safety: Make your data structures thread-safe or use proper synchronization
Prefer higher-level abstractions: When possible, use higher-level modules like concurrent.futures or queue
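As a brief sketch of the thread pool suggestion, the example below uses concurrent.futures.ThreadPoolExecutor to run a function over several inputs. The function fetch_length, the sample strings, and the 0.1-second sleep standing in for an I/O wait are all assumptions made for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fetch_length(text):
    time.sleep(0.1)  # stand-in for an I/O wait such as a network request
    return len(text)

items = ["alpha", "beta", "gamma", "delta"]

# The with block waits for all submitted work and then shuts the pool down
with ThreadPoolExecutor(max_workers=4) as executor:
    # map() returns results in the same order as the inputs
    lengths = list(executor.map(fetch_length, items))

print(lengths)  # [5, 4, 5, 5]
```

The pool handles starting, reusing, and joining the worker threads, and map() hands back return values directly, so there is no need to store results on thread objects.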

Summary

In this tutorial, you've learned:

What threads are and how they enable concurrent execution in Python
How to create and manage threads using Python's threading module
Different ways to pass arguments to threads and create custom thread subclasses
How to synchronize threads using locks and other primitives
A real-world example of using threads for concurrent web scraping
Common threading issues and best practices

Threading is a powerful tool for improving the performance of I/O-bound applications in Python. While the GIL limits the effectiveness of threading for CPU-bound tasks, properly designed multi-threaded programs can significantly improve responsiveness and throughput for many real-world applications.

Additional Resources and Exercises

Exercises

Basic Thread Exercise: Create a program that uses threading to count up and down simultaneously, printing the numbers to the console.
Producer-Consumer Pattern: Implement a producer-consumer pattern using the queue.Queue class, where one thread produces items and another consumes them.
Image Processing: Write a program that loads a folder of images and applies filters to each image using multiple threads.
Thread Pool Exercise: Modify the web scraper example to use concurrent.futures.ThreadPoolExecutor instead of managing threads manually.
Threading vs. Multiprocessing Benchmark: Create a benchmark that compares the performance of threading vs. multiprocessing for both I/O-bound and CPU-bound tasks.

Remember that practice is key to mastering threading concepts. Start with simple examples and gradually move to more complex applications as you become more comfortable with the concepts.
