C++ Atomic Operations
Introduction
When working with multiple threads in C++, one of the key challenges is managing shared data safely. Regular operations on shared variables can lead to race conditions, where the behavior of your program becomes unpredictable due to the timing of thread execution.
Atomic operations provide a solution to this problem. An atomic operation is one that completes in a single step relative to other threads, with no possibility of interruption. C++ provides atomic types and operations through the <atomic> header, introduced in C++11, which allows you to perform thread-safe operations without explicit locks.
In this tutorial, you'll learn:
- What atomic operations are and why they're important
- How to use C++'s atomic types
- Common atomic operations and their use cases
- How atomic operations compare to other synchronization methods
Understanding Atomic Operations
What Are Atomic Operations?
An atomic operation appears to happen instantaneously from the perspective of other threads. It cannot be divided into smaller operations that could be interleaved with operations from other threads.
For example, consider this seemingly simple operation:
counter++; // Not atomic!
While this looks like a single operation, it actually involves multiple steps:
- Read the current value of counter
- Increment the value
- Write the new value back to counter
If two threads execute this operation concurrently on the same variable, they might both read the same initial value, increment it independently, and then both write back the same incremented value. As a result, the counter would only increase by 1 instead of 2!
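To see this lost-update problem in practice, here is a minimal sketch (not from the original example) that increments a plain int from two threads. This is a data race and formally undefined behavior; on most machines the printed total comes out below the expected 200000.
#include <iostream>
#include <thread>

// Illustrative sketch of the lost-update problem: incrementing a plain int
// from two threads is a data race (undefined behavior), and the final value
// is usually less than the expected 200000.
int main() {
    int counter = 0;
    auto work = [&counter]() {
        for (int i = 0; i < 100000; ++i) {
            counter++;  // non-atomic read-modify-write
        }
    };
    std::thread t1(work);
    std::thread t2(work);
    t1.join();
    t2.join();
    std::cout << "counter = " << counter << std::endl;
    return 0;
}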
Why Atomic Operations Matter
Atomic operations provide guarantees about the visibility and ordering of operations across threads. They help prevent:
- Race conditions: Where results depend on the timing of thread execution
- Torn reads/writes: Where other threads see partially updated values
- Instruction reordering issues: Where compiler or processor optimizations change the order of operations
Basic Atomic Types in C++
C++ provides the std::atomic template class, which can wrap various primitive types to make operations on them atomic.
#include <atomic>
#include <iostream>
#include <thread>
#include <vector>
int main() {
std::atomic<int> counter(0); // Atomic integer initialized to 0
auto increment = [&counter]() {
for (int i = 0; i < 1000; ++i) {
counter++; // Atomic increment operation
}
};
std::vector<std::thread> threads;
for (int i = 0; i < 10; ++i) {
threads.push_back(std::thread(increment));
}
for (auto& thread : threads) {
thread.join();
}
std::cout << "Final counter value: " << counter << std::endl;
return 0;
}
Output:
Final counter value: 10000
In this example, we created 10 threads, each incrementing a shared counter 1000 times. Because we used an atomic counter, the final result is always 10,000. If we had used a regular int instead, the result would likely be less due to race conditions.
Common Atomic Operations
Store and Load
The most basic atomic operations are storing and loading values:
std::atomic<int> x(0);
// Store operation
x.store(10, std::memory_order_relaxed);
// Load operation
int value = x.load(std::memory_order_relaxed);
Exchange
The exchange operation atomically replaces the value and returns the previous value:
std::atomic<int> x(5);
int previous = x.exchange(10); // x becomes 10, previous gets 5
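One common use of exchange is a test-and-set flag: each thread atomically writes true and inspects the previous value to learn whether it was the one that flipped the flag from false to true. Here is a minimal sketch of a spin lock built this way (the class name is illustrative, not a standard facility):
#include <atomic>

// Illustrative sketch: a test-and-set spin lock built on exchange.
class SpinLock {
    std::atomic<bool> locked{false};
public:
    void lock() {
        // exchange returns the previous value; keep trying until it was false,
        // meaning this thread is the one that flipped the flag to true.
        while (locked.exchange(true, std::memory_order_acquire)) {
            // busy-wait (spin) until the lock is released
        }
    }
    void unlock() {
        locked.store(false, std::memory_order_release);
    }
};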
Compare-and-Exchange
Compare-and-exchange (CAS) is a fundamental atomic operation that compares the current value with an expected value and, if they match, replaces it with a desired value:
#include <atomic>
#include <iostream>
int main() {
std::atomic<int> value(5);
int expected = 5;
bool success = value.compare_exchange_strong(expected, 10);
std::cout << "CAS success: " << std::boolalpha << success << std::endl;
std::cout << "Value: " << value << std::endl;
std::cout << "Expected: " << expected << std::endl;
// Try again with wrong expected value
expected = 5; // But value is now 10
success = value.compare_exchange_strong(expected, 15);
std::cout << "CAS success: " << success << std::endl;
std::cout << "Value: " << value << std::endl;
std::cout << "Expected: " << expected << " (updated to actual value)" << std::endl;
return 0;
}
Output:
CAS success: true
Value: 10
Expected: 5
CAS success: false
Value: 10
Expected: 10 (updated to actual value)
There are two versions of compare-and-exchange:
- compare_exchange_strong: Guaranteed to succeed if the current value equals the expected value
- compare_exchange_weak: May fail spuriously (return false even when the values match), but may be more efficient on some platforms, which is why it is typically used inside a retry loop, as in the sketch below
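The usual pattern with compare_exchange_weak is a retry loop: load the current value, compute the desired value, and keep attempting the exchange until it succeeds. As a hedged sketch, here is an "atomic maximum" helper built on that pattern (the function name is illustrative):
#include <atomic>

// Illustrative sketch: atomically update 'target' to the maximum of its
// current value and 'candidate' using a compare_exchange_weak retry loop.
void atomic_fetch_max(std::atomic<int>& target, int candidate) {
    int current = target.load();
    // If another thread changes 'target' between our load and the CAS,
    // compare_exchange_weak fails and refreshes 'current', and we retry.
    while (current < candidate &&
           !target.compare_exchange_weak(current, candidate)) {
        // 'current' now holds the latest value; the loop re-checks the condition.
    }
}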
Atomic Arithmetic Operations
For numeric atomic types, C++ provides atomic versions of common arithmetic operations:
std::atomic<int> x(10);
// Increment and decrement
x++; // Same as x.fetch_add(1): yields the old value
++x; // Same as x.fetch_add(1) + 1: yields the new value
x--; // Same as x.fetch_sub(1): yields the old value
--x; // Same as x.fetch_sub(1) - 1: yields the new value
// Addition and subtraction
x += 5; // Same as x.fetch_add(5) + 5
x -= 3; // Same as x.fetch_sub(3) - 3
// Get value before operation
int prev_add = x.fetch_add(2); // Add 2 to x, return previous value
int prev_sub = x.fetch_sub(1); // Subtract 1 from x, return previous value
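Because fetch_add returns the value the variable held before the addition, it is a convenient way to hand out unique sequence numbers. A small illustrative sketch (the names are not from the original text):
#include <atomic>

// Illustrative sketch: every caller receives a distinct, increasing ID,
// because fetch_add performs the read and the increment as one atomic step.
std::atomic<unsigned long> next_id(0);

unsigned long get_unique_id() {
    return next_id.fetch_add(1);  // returns the old value, then adds 1
}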
Memory Ordering
When working with atomic operations, you can specify memory ordering constraints that determine how operations are ordered between threads:
std::atomic<int> x(0);
// Different memory ordering options
x.store(1, std::memory_order_relaxed); // No synchronization
x.store(2, std::memory_order_release); // Release operation
int val = x.load(std::memory_order_acquire); // Acquire operation
int val2 = x.load(std::memory_order_seq_cst); // Sequentially consistent (default)
The common memory ordering options are:
- memory_order_relaxed: No synchronization or ordering constraints, only atomicity
- memory_order_consume: A load operation with consume ordering (rarely used directly)
- memory_order_acquire: A load operation with acquire semantics
- memory_order_release: A store operation with release semantics
- memory_order_acq_rel: Both acquire and release semantics
- memory_order_seq_cst: Sequential consistency (the default if not specified)
For beginners, using the default memory_order_seq_cst is usually the safest option until you understand the memory model more deeply.
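As a hedged illustration of the acquire/release pair, the following sketch passes a plain (non-atomic) value from one thread to another. The producer's release store on the flag synchronizes with the consumer's acquire load, so once the consumer sees the flag set, it is also guaranteed to see the data written before it (all names are illustrative):
#include <atomic>
#include <iostream>
#include <thread>

int data = 0;                    // plain, non-atomic payload
std::atomic<bool> ready(false);  // synchronization flag

void producer() {
    data = 42;                                     // write the payload first
    ready.store(true, std::memory_order_release);  // then publish it
}

void consumer() {
    while (!ready.load(std::memory_order_acquire)) {
        // spin until the producer has published
    }
    std::cout << "data = " << data << std::endl;   // guaranteed to print 42
}

int main() {
    std::thread t1(producer);
    std::thread t2(consumer);
    t1.join();
    t2.join();
    return 0;
}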
Practical Example: Thread-Safe Counter
Let's implement a thread-safe counter class using atomic operations:
#include <atomic>
#include <iostream>
#include <thread>
#include <vector>
class AtomicCounter {
private:
std::atomic<unsigned long> value;
public:
AtomicCounter() : value(0) {}
void increment() {
value++;
}
void add(unsigned long val) {
value.fetch_add(val);
}
unsigned long get() const {
return value.load();
}
};
int main() {
AtomicCounter counter;
// Create 4 threads, each incrementing the counter 1,000,000 times
const int NUM_THREADS = 4;
const int ITERATIONS = 1000000;
std::vector<std::thread> threads;
for (int i = 0; i < NUM_THREADS; ++i) {
threads.push_back(std::thread([&counter, ITERATIONS]() {
for (int j = 0; j < ITERATIONS; ++j) {
counter.increment();
}
}));
}
// Wait for all threads to finish
for (auto& thread : threads) {
thread.join();
}
std::cout << "Expected count: " << (NUM_THREADS * ITERATIONS) << std::endl;
std::cout << "Actual count: " << counter.get() << std::endl;
return 0;
}
Output:
Expected count: 4000000
Actual count: 4000000
This counter is thread-safe without using any explicit locks, thanks to atomic operations.
Real-World Application: Lock-Free Stack
Here's a more complex example of a lock-free stack implementation using atomic operations:
#include <atomic>
#include <memory>
#include <iostream>
template<typename T>
class LockFreeStack {
private:
struct Node {
T data;
Node* next;
Node(const T& data) : data(data), next(nullptr) {}
};
std::atomic<Node*> head;
public:
LockFreeStack() : head(nullptr) {}
~LockFreeStack() {
// Clean up remaining nodes
Node* current = head.load();
while (current) {
Node* next = current->next;
delete current;
current = next;
}
}
void push(const T& data) {
Node* new_node = new Node(data);
new_node->next = head.load();
// Keep trying until we manage to update the head
while (!head.compare_exchange_weak(new_node->next, new_node)) {
// If it fails, new_node->next will be updated to the current head
}
}
bool pop(T& result) {
Node* old_head = head.load();
// If the stack is empty
if (!old_head) {
return false;
}
// Try to update head to old_head->next
while (old_head && !head.compare_exchange_weak(old_head, old_head->next)) {
// If it fails, old_head will be updated to the current head
}
// If we've got a node, return its value and delete it
if (old_head) {
result = old_head->data;
delete old_head;
return true;
}
return false;
}
bool empty() const {
return head.load() == nullptr;
}
};
int main() {
LockFreeStack<int> stack;
// Push some values
stack.push(1);
stack.push(2);
stack.push(3);
// Pop and print values
int value;
while (stack.pop(value)) {
std::cout << "Popped: " << value << std::endl;
}
return 0;
}
Output:
Popped: 3
Popped: 2
Popped: 1
This implementation uses compare-and-exchange operations to update the stack head atomically. It's lock-free because threads never wait on a lock; if a compare-and-exchange fails, they simply retry. Note, however, that this simplified version is not production-ready: deleting nodes in pop while other threads may still be reading them can cause use-after-free, and the design is vulnerable to the ABA problem. Real lock-free stacks address this with techniques such as hazard pointers or reference counting.
Atomic vs. Mutexes
When should you use atomic operations versus mutexes?
#include <atomic>
#include <mutex>
#include <iostream>
#include <thread>
#include <vector>
#include <chrono>
// Using an atomic counter
void atomic_example() {
std::atomic<int> counter(0);
auto start = std::chrono::high_resolution_clock::now();
std::vector<std::thread> threads;
for (int i = 0; i < 10; ++i) {
threads.push_back(std::thread([&counter]() {
for (int j = 0; j < 1000000; ++j) {
counter++;
}
}));
}
for (auto& thread : threads) {
thread.join();
}
auto end = std::chrono::high_resolution_clock::now();
std::chrono::duration<double, std::milli> duration = end - start;
std::cout << "Atomic counter: " << counter << " (took " << duration.count() << " ms)" << std::endl;
}
// Using a mutex
void mutex_example() {
int counter = 0;
std::mutex mutex;
auto start = std::chrono::high_resolution_clock::now();
std::vector<std::thread> threads;
for (int i = 0; i < 10; ++i) {
threads.push_back(std::thread([&counter, &mutex]() {
for (int j = 0; j < 1000000; ++j) {
std::lock_guard<std::mutex> lock(mutex);
counter++;
}
}));
}
for (auto& thread : threads) {
thread.join();
}
auto end = std::chrono::high_resolution_clock::now();
std::chrono::duration<double, std::milli> duration = end - start;
std::cout << "Mutex counter: " << counter << " (took " << duration.count() << " ms)" << std::endl;
}
int main() {
atomic_example();
mutex_example();
return 0;
}
Sample output (timing will vary by system):
Atomic counter: 10000000 (took 158.274 ms)
Mutex counter: 10000000 (took 2783.419 ms)
Atomic operations are generally faster for simple operations like incrementing a counter because they don't require the overhead of acquiring and releasing a lock. However, mutexes are better when:
- You need to protect a complex operation involving multiple steps
- You're working with non-atomic data structures
- You need to ensure that a group of operations happens atomically
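For example, if two pieces of state must always change together, making each one individually atomic is not enough; a mutex keeps the compound update consistent. A minimal sketch (the struct and field names are illustrative):
#include <mutex>

// Illustrative sketch: both fields must change together, so a mutex guards the
// whole update; two separate atomics could be observed in a half-updated state.
struct Account {
    std::mutex m;
    long balance = 0;
    long transaction_count = 0;

    void deposit(long amount) {
        std::lock_guard<std::mutex> lock(m);
        balance += amount;    // step 1
        ++transaction_count;  // step 2, must stay consistent with step 1
    }
};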
Common Pitfalls
- Not all operations are atomic: Even when using atomic types, compound operations are not automatically atomic.
std::atomic<int> x(0);
std::atomic<int> y(0);
// This is NOT atomic as a whole
x.store(1);
y.store(2);
- Cache line sharing: Frequent updates to atomic variables can cause "false sharing" if they're in the same cache line (a common mitigation is sketched after this list).
- Overlooking memory ordering: Using the wrong memory ordering can lead to subtle bugs.
- Assuming atomicity guarantees ordering: Atomicity does not automatically guarantee ordering of operations between threads without proper memory ordering constraints.
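Regarding the false-sharing pitfall above, one common mitigation is to align each heavily updated atomic to its own cache line. A hedged sketch, assuming a C++17 standard library that provides std::hardware_destructive_interference_size (older toolchains may not, in which case a hard-coded 64 is a typical fallback):
#include <atomic>
#include <new>  // std::hardware_destructive_interference_size (C++17)

// Illustrative sketch: two counters updated by different threads are aligned to
// separate cache lines, so updates to one don't invalidate the other's line.
struct Counters {
    alignas(std::hardware_destructive_interference_size) std::atomic<long> a{0};
    alignas(std::hardware_destructive_interference_size) std::atomic<long> b{0};
};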
Summary
Atomic operations are a powerful tool in C++ multithreading that allow you to perform thread-safe operations without locks. They're especially useful for simple operations on primitive types and can lead to more efficient code than using mutexes.
Key points to remember:
- Atomic operations complete in a single, uninterruptible step
- C++ provides atomic types via the <atomic> header
- Common operations include load, store, exchange, and compare-exchange
- Memory ordering specifies how operations are ordered between threads
- Atomic operations are typically faster than mutex locks but are limited to simpler operations
By using atomic operations appropriately, you can write more efficient concurrent code with fewer synchronization issues.
Exercises
- Implement a thread-safe, lock-free boolean flag using atomic operations.
- Create a simple spin lock using atomic compare-and-exchange.
- Implement a thread-safe singleton pattern using atomic operations.
- Build a multi-producer, multi-consumer queue using atomic operations.
- Compare the performance of an atomic counter versus a mutex-protected counter with different numbers of threads.
Additional Resources
- C++ Reference: Atomic Operations Library
- C++ Concurrency in Action by Anthony Williams
- CppCon Talks on YouTube about atomics and lock-free programming
- The Art of Multiprocessor Programming for a deeper understanding of concurrent algorithms
Happy coding and threading!