# Python Locks and Semaphores

## Introduction
When working with concurrent programming in Python, you'll often encounter situations where multiple threads or processes need to access shared resources. Without proper coordination, this can lead to race conditions, data corruption, and unpredictable behavior.
Python provides synchronization primitives such as locks and semaphores to help manage these concurrent access scenarios. These tools are essential for writing reliable multithreaded applications.
In this tutorial, we'll explore:
- What locks and semaphores are
- How to use them in Python
- Common patterns and best practices
- Real-world applications
## Understanding the Problem: Race Conditions
Before diving into locks and semaphores, let's understand why we need them. Consider this simple example where two threads try to increment a counter:
```python
import threading

counter = 0

def increment_counter():
    global counter
    for _ in range(100000):
        counter += 1  # This is not an atomic operation!

# Create two threads
thread1 = threading.Thread(target=increment_counter)
thread2 = threading.Thread(target=increment_counter)

# Start the threads
thread1.start()
thread2.start()

# Wait for both threads to complete
thread1.join()
thread2.join()

print(f"Final counter value: {counter}")
```

Expected output: `200000`
Actual output: something less than `200000` (varies each run)
This happens because `counter += 1` is not atomic: it involves reading the value, incrementing it, and writing it back. If two threads perform this operation simultaneously, they might both read the same initial value, increment it separately, and then write back the same incremented value, effectively losing one of the increments.
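You can see these separate steps directly with the standard library's `dis` module. A minimal sketch (the exact opcode names vary between CPython versions, but the read/add/write split is always visible):

```python
import dis

counter = 0

def increment():
    global counter
    counter += 1

# Prints the bytecode: a load of `counter`, an add, and a store —
# a thread can be preempted between any two of these instructions.
dis.dis(increment)
```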
This is a classic race condition, and this is where locks come in!
## Locks in Python

A lock (or mutex) is a synchronization primitive that allows only one thread to execute a particular section of code at a time.

### Basic Lock Usage

Python's `threading` module provides a `Lock` class:
```python
import threading

counter = 0
counter_lock = threading.Lock()

def increment_counter():
    global counter
    for _ in range(100000):
        counter_lock.acquire()  # Lock the resource
        try:
            counter += 1
        finally:
            counter_lock.release()  # Always release the lock

# Create two threads
thread1 = threading.Thread(target=increment_counter)
thread2 = threading.Thread(target=increment_counter)

# Start the threads
thread1.start()
thread2.start()

# Wait for both threads to complete
thread1.join()
thread2.join()

print(f"Final counter value: {counter}")
```

Output: `200000` (consistently)
With the lock in place, only one thread can increment the counter at a time, preventing the race condition.
### Using Locks as Context Managers

A cleaner way to use locks is the `with` statement, which ensures the lock is released even if an exception occurs:
```python
def increment_counter():
    global counter
    for _ in range(100000):
        with counter_lock:  # Automatically acquires and releases the lock
            counter += 1
```
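When you can't risk blocking forever, `Lock.acquire()` also accepts a `timeout` (or `blocking=False` for an immediate, non-blocking attempt). A minimal sketch, using a hypothetical worker that briefly holds the lock:

```python
import threading
import time

lock = threading.Lock()

def hold_lock_briefly():
    with lock:
        time.sleep(0.5)  # hold the lock for half a second

t = threading.Thread(target=hold_lock_briefly)
t.start()
time.sleep(0.1)  # let the worker grab the lock first

# Try to acquire with a timeout instead of blocking indefinitely
got_it = lock.acquire(timeout=0.1)
if got_it:
    try:
        print("Acquired the lock")
    finally:
        lock.release()
else:
    print("Could not acquire the lock within 0.1 seconds")

t.join()
```

`acquire()` returns `True` or `False`, so you always know whether you actually hold the lock before touching the shared resource.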
### RLock (Reentrant Lock)

Sometimes a thread may need to acquire the same lock multiple times. A regular `Lock` would cause a deadlock in this scenario, but `RLock` (reentrant lock) allows a thread to acquire the same lock multiple times:
```python
rlock = threading.RLock()

def reentrant_function():
    with rlock:
        print("First lock acquired")
        # Do something
        with rlock:  # This would cause a deadlock with a regular Lock
            print("Second lock acquired")
        print("Inner lock released")
    print("Outer lock released")
```
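Reentrancy is especially handy for recursive code or for methods that call each other while holding the same lock. A minimal sketch with a hypothetical recursive `countdown` function:

```python
import threading

rlock = threading.RLock()

def countdown(n):
    with rlock:  # re-acquired by the same thread on every recursive call
        if n == 0:
            return 0
        return 1 + countdown(n - 1)

# A regular Lock would deadlock on the second acquire;
# RLock tracks the owning thread and a recursion count instead.
print(countdown(5))
```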
## Semaphores in Python

While a lock allows only one thread to access a resource, a semaphore can allow a specified number of threads to access a resource simultaneously.

### Basic Semaphore Usage
```python
import threading
import time

# Create a semaphore that allows up to 3 threads at once
semaphore = threading.Semaphore(3)

def worker(id):
    print(f"Worker {id} is trying to access the resource")
    with semaphore:
        print(f"Worker {id} has accessed the resource")
        time.sleep(1)  # Simulate some work
    print(f"Worker {id} has released the resource")

# Create 5 worker threads
threads = []
for i in range(5):
    thread = threading.Thread(target=worker, args=(i,))
    threads.append(thread)
    thread.start()

# Wait for all threads to complete
for thread in threads:
    thread.join()
```
Output (one possible ordering):

```text
Worker 0 is trying to access the resource
Worker 0 has accessed the resource
Worker 1 is trying to access the resource
Worker 1 has accessed the resource
Worker 2 is trying to access the resource
Worker 2 has accessed the resource
Worker 3 is trying to access the resource
Worker 4 is trying to access the resource
Worker 0 has released the resource
Worker 3 has accessed the resource
Worker 1 has released the resource
Worker 4 has accessed the resource
Worker 2 has released the resource
Worker 3 has released the resource
Worker 4 has released the resource
```
Notice that only 3 workers can access the resource simultaneously. Once one worker finishes, another can access the resource.
### BoundedSemaphore

A `BoundedSemaphore` is like a regular semaphore, but it raises a `ValueError` if the semaphore is released more times than it was acquired:

```python
bounded_semaphore = threading.BoundedSemaphore(3)
```
This is useful for catching programming errors where you might accidentally release a semaphore too many times.
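A quick demonstration of that safety check: the matched release succeeds, while the extra one raises a `ValueError`:

```python
import threading

bounded = threading.BoundedSemaphore(2)

bounded.acquire()
bounded.release()  # fine: matches the acquire

try:
    bounded.release()  # one release too many
except ValueError as exc:
    print(f"Caught the bug: {exc}")
```

A plain `Semaphore` would silently let its counter grow past the initial value, so the bounded variant turns a subtle logic error into a loud one.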
## Real-World Applications

### Example 1: Connection Pool

Imagine you're writing a program that needs to connect to a database, but you want to limit the number of simultaneous connections:
```python
import threading
import time
import random

class DatabaseConnectionPool:
    def __init__(self, max_connections):
        self.semaphore = threading.BoundedSemaphore(max_connections)
        self.connections = []
        self.lock = threading.Lock()

    def get_connection(self):
        # Acquire without a `with` block: the permit must stay held
        # until release_connection() is called, not just until the
        # end of this method.
        self.semaphore.acquire()
        print(f"Thread {threading.current_thread().name} acquired a connection")
        with self.lock:
            # Simulate creating or getting an existing connection
            connection = f"Connection-{random.randint(1, 1000)}"
            self.connections.append(connection)
        return connection

    def release_connection(self, connection):
        with self.lock:
            self.connections.remove(connection)
            print(f"Thread {threading.current_thread().name} released connection {connection}")
        self.semaphore.release()

def worker(pool):
    try:
        connection = pool.get_connection()
        # Simulate using the connection
        time.sleep(random.random() * 3)
        pool.release_connection(connection)
    except Exception as e:
        print(f"Error: {e}")

# Create a connection pool with 3 max connections
pool = DatabaseConnectionPool(3)

# Create 10 worker threads
threads = []
for i in range(10):
    thread = threading.Thread(target=worker, args=(pool,), name=f"Worker-{i}")
    threads.append(thread)
    thread.start()

# Wait for all threads to complete
for thread in threads:
    thread.join()
```

Note that `get_connection` acquires the semaphore explicitly rather than with a `with` block: a `with` block would release the permit as soon as the method returned, and the later call to `release_connection` would then release it a second time, which a `BoundedSemaphore` rejects with a `ValueError`.
### Example 2: Producer-Consumer Pattern

The producer-consumer pattern is a classic problem in concurrent programming, where producers generate data that consumers process. Semaphores can be used to coordinate them:
```python
import threading
import queue
import time
import random

# Create a thread-safe queue with a maximum size
buffer = queue.Queue(maxsize=5)

# Create semaphores
items = threading.Semaphore(0)   # Counts items in buffer
spaces = threading.Semaphore(5)  # Counts spaces in buffer
buffer_lock = threading.Lock()   # Ensures exclusive access to buffer

def producer():
    for i in range(10):
        item = f"Item-{i}"
        spaces.acquire()  # Wait for space to be available
        with buffer_lock:
            print(f"Producer producing {item}")
            buffer.put(item)
        items.release()  # Signal that an item is available
        time.sleep(random.random())

def consumer():
    for i in range(10):
        items.acquire()  # Wait for an item to be available
        with buffer_lock:
            item = buffer.get()
            print(f"Consumer consuming {item}")
        spaces.release()  # Signal that a space is available
        time.sleep(random.random() * 3)

# Create producer and consumer threads
producer_thread = threading.Thread(target=producer)
consumer_thread = threading.Thread(target=consumer)

# Start the threads
producer_thread.start()
consumer_thread.start()

# Wait for both threads to complete
producer_thread.join()
consumer_thread.join()
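It's worth noting that `queue.Queue` is already thread-safe, and its `put()` and `get()` block on a full or empty queue, so the explicit semaphores and lock above mainly serve to illustrate the classic textbook pattern. A minimal sketch relying only on the queue's built-in blocking, with a `None` sentinel (an assumed convention, not part of the queue API) to tell the consumer to stop:

```python
import threading
import queue

buffer = queue.Queue(maxsize=5)  # put() blocks when full, get() blocks when empty

def producer():
    for i in range(10):
        buffer.put(f"Item-{i}")  # blocks if the buffer is full
    buffer.put(None)             # sentinel: tells the consumer to stop

def consumer():
    while True:
        item = buffer.get()      # blocks if the buffer is empty
        if item is None:
            break
        print(f"Consumed {item}")

p = threading.Thread(target=producer)
c = threading.Thread(target=consumer)
p.start(); c.start()
p.join(); c.join()
```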
## Common Pitfalls and Best Practices

- **Deadlocks**: occur when two or more threads wait forever for a resource held by the other. To avoid them:
  - Always acquire locks in the same order
  - Use timeouts when acquiring locks
  - Use context managers (the `with` statement) to ensure locks are released
- **Lock starvation**: some threads might never get access to a lock if others keep acquiring it. Consider using queue-based mechanisms for fairness.
- **Over-synchronization**: too many locks can reduce concurrency and performance. Only protect the critical sections.
- **Under-synchronization**: missing locks can lead to race conditions. Identify all shared resources.
- **Not releasing locks**: always ensure locks are released, preferably using the `with` statement.
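The lock-ordering rule can be sketched with two hypothetical functions that both need the same pair of locks:

```python
import threading

lock_a = threading.Lock()
lock_b = threading.Lock()

# Both functions acquire lock_a first, then lock_b. If one of them
# acquired the locks in the opposite order, two threads could each
# hold one lock while waiting forever for the other: a deadlock.
def transfer_one():
    with lock_a:
        with lock_b:
            print("transfer_one holds both locks")

def transfer_two():
    with lock_a:  # same order as transfer_one, so no deadlock
        with lock_b:
            print("transfer_two holds both locks")

t1 = threading.Thread(target=transfer_one)
t2 = threading.Thread(target=transfer_two)
t1.start(); t2.start()
t1.join(); t2.join()
```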
## Summary

In this tutorial, we've covered:

- The basics of locks and semaphores in Python
- How to use the `threading.Lock` and `threading.Semaphore` classes
- Reentrant locks with `RLock`
- Real-world applications and patterns
- Common pitfalls and best practices
Locks and semaphores are essential tools for concurrent programming in Python. They help you control access to shared resources, prevent race conditions, and build reliable multithreaded applications.
## Additional Resources
- Python's official documentation on threading
- "Python Concurrency with asyncio" by Matthew Fowler - for deeper understanding of modern concurrency
- "Python Cookbook" by David Beazley and Brian K. Jones - has excellent recipes for threading and concurrency
## Exercises
- Implement a thread-safe counter class that uses locks.
- Create a resource pool (like a database connection pool) using semaphores.
- Solve the dining philosophers problem using locks to prevent deadlocks.
- Implement a readers-writers lock that allows multiple simultaneous readers but only one writer.
- Modify the producer-consumer example to handle multiple producers and consumers.
If you spot any mistakes on this website, please let me know at [email protected]. I’d greatly appreciate your feedback! :)