# Python Locks and Semaphores

## Introduction
When working with concurrent programming in Python, you'll often encounter situations where multiple threads or processes need to access shared resources. Without proper coordination, this can lead to race conditions, data corruption, and unpredictable behavior.
Python provides synchronization primitives such as locks and semaphores to help manage these concurrent access scenarios. These tools are essential for writing reliable multithreaded applications.
In this tutorial, we'll explore:
- What locks and semaphores are
- How to use them in Python
- Common patterns and best practices
- Real-world applications
## Understanding the Problem: Race Conditions
Before diving into locks and semaphores, let's understand why we need them. Consider this simple example where two threads try to increment a counter:
```python
import threading

counter = 0

def increment_counter():
    global counter
    for _ in range(100000):
        counter += 1  # This is not an atomic operation!

# Create two threads
thread1 = threading.Thread(target=increment_counter)
thread2 = threading.Thread(target=increment_counter)

# Start the threads
thread1.start()
thread2.start()

# Wait for both threads to complete
thread1.join()
thread2.join()

print(f"Final counter value: {counter}")
```

Expected output: `200000`
Actual output: something less than `200000` (varies each run)
This happens because `counter += 1` is not atomic: it involves reading the value, incrementing it, and writing it back. If two threads perform this operation simultaneously, they might both read the same initial value, increment it separately, and then write back the same incremented value, effectively losing one of the increments.
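You can see these separate steps directly with the standard library's `dis` module. A minimal sketch (the exact opcode names vary between CPython versions, but the read/add/write split is always visible):

```python
import dis

counter = 0

def increment():
    global counter
    counter += 1

# Prints the bytecode: a load of `counter`, an add, and a store —
# a thread can be preempted between any two of these instructions.
dis.dis(increment)
```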
This is a classic race condition, and this is where locks come in!
## Locks in Python

A lock (or mutex) is a synchronization primitive that allows only one thread to execute a particular section of code at a time.

### Basic Lock Usage

Python's `threading` module provides a `Lock` class:
```python
import threading

counter = 0
counter_lock = threading.Lock()

def increment_counter():
    global counter
    for _ in range(100000):
        counter_lock.acquire()  # Lock the resource
        try:
            counter += 1
        finally:
            counter_lock.release()  # Always release the lock

# Create two threads
thread1 = threading.Thread(target=increment_counter)
thread2 = threading.Thread(target=increment_counter)

# Start the threads
thread1.start()
thread2.start()

# Wait for both threads to complete
thread1.join()
thread2.join()

print(f"Final counter value: {counter}")
```

Output: `200000` (consistently)
With the lock in place, only one thread can increment the counter at a time, preventing the race condition.
### Using Locks as Context Managers

A cleaner way to use locks is the `with` statement, which ensures the lock is released even if an exception occurs:
```python
def increment_counter():
    global counter
    for _ in range(100000):
        with counter_lock:  # Automatically acquires and releases the lock
            counter += 1
```
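When you can't risk blocking forever, `Lock.acquire()` also accepts a `timeout` (or `blocking=False` for an immediate, non-blocking attempt). A minimal sketch, using a hypothetical worker that briefly holds the lock:

```python
import threading
import time

lock = threading.Lock()

def hold_lock_briefly():
    with lock:
        time.sleep(0.5)  # hold the lock for half a second

t = threading.Thread(target=hold_lock_briefly)
t.start()
time.sleep(0.1)  # let the worker grab the lock first

# Try to acquire with a timeout instead of blocking indefinitely
got_it = lock.acquire(timeout=0.1)
if got_it:
    try:
        print("Acquired the lock")
    finally:
        lock.release()
else:
    print("Could not acquire the lock within 0.1 seconds")

t.join()
```

`acquire()` returns `True` or `False`, so you always know whether you actually hold the lock before touching the shared resource.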
### RLock (Reentrant Lock)

Sometimes a thread may need to acquire the same lock multiple times. A regular `Lock` would cause a deadlock in this scenario, but `RLock` (reentrant lock) allows a thread to acquire the same lock multiple times:
```python
rlock = threading.RLock()

def reentrant_function():
    with rlock:
        print("First lock acquired")
        # Do something
        with rlock:  # This would cause a deadlock with a regular Lock
            print("Second lock acquired")
        print("Inner lock released")
    print("Outer lock released")
```
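Reentrancy is especially handy for recursive code or for methods that call each other while holding the same lock. A minimal sketch with a hypothetical recursive `countdown` function:

```python
import threading

rlock = threading.RLock()

def countdown(n):
    with rlock:  # re-acquired by the same thread on every recursive call
        if n == 0:
            return 0
        return 1 + countdown(n - 1)

# A regular Lock would deadlock on the second acquire;
# RLock tracks the owning thread and a recursion count instead.
print(countdown(5))
```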
## Semaphores in Python

While a lock allows only one thread to access a resource, a semaphore can allow a specified number of threads to access a resource simultaneously.

### Basic Semaphore Usage
```python
import threading
import time

# Create a semaphore that allows up to 3 threads at once
semaphore = threading.Semaphore(3)

def worker(id):
    print(f"Worker {id} is trying to access the resource")
    with semaphore:
        print(f"Worker {id} has accessed the resource")
        time.sleep(1)  # Simulate some work
    print(f"Worker {id} has released the resource")

# Create 5 worker threads
threads = []
for i in range(5):
    thread = threading.Thread(target=worker, args=(i,))
    threads.append(thread)
    thread.start()

# Wait for all threads to complete
for thread in threads:
    thread.join()
```
Output (one possible ordering):

```text
Worker 0 is trying to access the resource
Worker 0 has accessed the resource
Worker 1 is trying to access the resource
Worker 1 has accessed the resource
Worker 2 is trying to access the resource
Worker 2 has accessed the resource
Worker 3 is trying to access the resource
Worker 4 is trying to access the resource
Worker 0 has released the resource
Worker 3 has accessed the resource
Worker 1 has released the resource
Worker 4 has accessed the resource
Worker 2 has released the resource
Worker 3 has released the resource
Worker 4 has released the resource
```
Notice that only 3 workers can access the resource simultaneously. Once one worker finishes, another can access the resource.
### BoundedSemaphore

A `BoundedSemaphore` is like a regular semaphore, but it raises a `ValueError` if the semaphore is released more times than it was acquired:

```python
bounded_semaphore = threading.BoundedSemaphore(3)
```
This is useful for catching programming errors where you might accidentally release a semaphore too many times.
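A quick demonstration of that safety check: the matched release succeeds, while the extra one raises a `ValueError`:

```python
import threading

bounded = threading.BoundedSemaphore(2)

bounded.acquire()
bounded.release()  # fine: matches the acquire

try:
    bounded.release()  # one release too many
except ValueError as exc:
    print(f"Caught the bug: {exc}")
```

A plain `Semaphore` would silently let its counter grow past the initial value, so the bounded variant turns a subtle logic error into a loud one.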
## Real-World Applications

### Example 1: Connection Pool

Imagine you're writing a program that needs to connect to a database, but you want to limit the number of simultaneous connections:
```python
import threading
import time
import random

class DatabaseConnectionPool:
    def __init__(self, max_connections):
        self.semaphore = threading.BoundedSemaphore(max_connections)
        self.connections = []
        self.lock = threading.Lock()

    def get_connection(self):
        # Acquire without a `with` block: the permit must stay held
        # until release_connection() is called, not just until the
        # end of this method.
        self.semaphore.acquire()
        print(f"Thread {threading.current_thread().name} acquired a connection")
        with self.lock:
            # Simulate creating or getting an existing connection
            connection = f"Connection-{random.randint(1, 1000)}"
            self.connections.append(connection)
        return connection

    def release_connection(self, connection):
        with self.lock:
            self.connections.remove(connection)
            print(f"Thread {threading.current_thread().name} released connection {connection}")
        self.semaphore.release()

def worker(pool):
    try:
        connection = pool.get_connection()
        # Simulate using the connection
        time.sleep(random.random() * 3)
        pool.release_connection(connection)
    except Exception as e:
        print(f"Error: {e}")

# Create a connection pool with 3 max connections
pool = DatabaseConnectionPool(3)

# Create 10 worker threads
threads = []
for i in range(10):
    thread = threading.Thread(target=worker, args=(pool,), name=f"Worker-{i}")
    threads.append(thread)
    thread.start()

# Wait for all threads to complete
for thread in threads:
    thread.join()
```

Note that `get_connection` acquires the semaphore explicitly rather than with a `with` block: a `with` block would release the permit as soon as the method returned, and the later call to `release_connection` would then release it a second time, which a `BoundedSemaphore` rejects with a `ValueError`.
### Example 2: Producer-Consumer Pattern

The producer-consumer pattern is a classic problem in concurrent programming, where producers generate data that consumers process. Semaphores can be used to coordinate them:
```python
import threading
import queue
import time
import random

# Create a thread-safe queue with a maximum size
buffer = queue.Queue(maxsize=5)

# Create semaphores
items = threading.Semaphore(0)   # Counts items in buffer
spaces = threading.Semaphore(5)  # Counts spaces in buffer
buffer_lock = threading.Lock()   # Ensures exclusive access to buffer

def producer():
    for i in range(10):
        item = f"Item-{i}"
        spaces.acquire()  # Wait for space to be available
        with buffer_lock:
            print(f"Producer producing {item}")
            buffer.put(item)
        items.release()  # Signal that an item is available
        time.sleep(random.random())

def consumer():
    for i in range(10):
        items.acquire()  # Wait for an item to be available
        with buffer_lock:
            item = buffer.get()
            print(f"Consumer consuming {item}")
        spaces.release()  # Signal that a space is available
        time.sleep(random.random() * 3)

# Create producer and consumer threads
producer_thread = threading.Thread(target=producer)
consumer_thread = threading.Thread(target=consumer)

# Start the threads
producer_thread.start()
consumer_thread.start()

# Wait for both threads to complete
producer_thread.join()
consumer_thread.join()
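It's worth noting that `queue.Queue` is already thread-safe, and its `put()` and `get()` block on a full or empty queue, so the explicit semaphores and lock above mainly serve to illustrate the classic textbook pattern. A minimal sketch relying only on the queue's built-in blocking, with a `None` sentinel (an assumed convention, not part of the queue API) to tell the consumer to stop:

```python
import threading
import queue

buffer = queue.Queue(maxsize=5)  # put() blocks when full, get() blocks when empty

def producer():
    for i in range(10):
        buffer.put(f"Item-{i}")  # blocks if the buffer is full
    buffer.put(None)             # sentinel: tells the consumer to stop

def consumer():
    while True:
        item = buffer.get()      # blocks if the buffer is empty
        if item is None:
            break
        print(f"Consumed {item}")

p = threading.Thread(target=producer)
c = threading.Thread(target=consumer)
p.start(); c.start()
p.join(); c.join()
```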
## Common Pitfalls and Best Practices

- **Deadlocks**: occur when two or more threads wait forever for a resource held by the other. To avoid them:
  - Always acquire locks in the same order
  - Use timeouts when acquiring locks
  - Use context managers (the `with` statement) to ensure locks are released
- **Lock starvation**: some threads might never get access to a lock if others keep acquiring it. Consider using queue-based mechanisms for fairness.
- **Over-synchronization**: too many locks can reduce concurrency and performance. Only protect the critical sections.
- **Under-synchronization**: missing locks can lead to race conditions. Identify all shared resources.
- **Not releasing locks**: always ensure locks are released, preferably using the `with` statement.
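The lock-ordering rule can be sketched with two hypothetical functions that both need the same pair of locks:

```python
import threading

lock_a = threading.Lock()
lock_b = threading.Lock()

# Both functions acquire lock_a first, then lock_b. If one of them
# acquired the locks in the opposite order, two threads could each
# hold one lock while waiting forever for the other: a deadlock.
def transfer_one():
    with lock_a:
        with lock_b:
            print("transfer_one holds both locks")

def transfer_two():
    with lock_a:  # same order as transfer_one, so no deadlock
        with lock_b:
            print("transfer_two holds both locks")

t1 = threading.Thread(target=transfer_one)
t2 = threading.Thread(target=transfer_two)
t1.start(); t2.start()
t1.join(); t2.join()
```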
## Summary

In this tutorial, we've covered:

- The basics of locks and semaphores in Python
- How to use the `threading.Lock` and `threading.Semaphore` classes
- Reentrant locks with `RLock`
- Real-world applications and patterns
- Common pitfalls and best practices
Locks and semaphores are essential tools for concurrent programming in Python. They help you control access to shared resources, prevent race conditions, and build reliable multithreaded applications.
## Additional Resources
- Python's official documentation on threading
- "Python Concurrency with asyncio" by Matthew Fowler - for deeper understanding of modern concurrency
- "Python Cookbook" by David Beazley and Brian K. Jones - has excellent recipes for threading and concurrency
## Exercises
- Implement a thread-safe counter class that uses locks.
- Create a resource pool (like a database connection pool) using semaphores.
- Solve the dining philosophers problem using locks to prevent deadlocks.
- Implement a readers-writers lock that allows multiple simultaneous readers but only one writer.
- Modify the producer-consumer example to handle multiple producers and consumers.
If you spot any mistakes on this website, please let me know at [email protected]. I’d greatly appreciate your feedback! :)