C++ Atomic Operations
Introduction
When working with multiple threads in C++, one of the key challenges is managing shared data safely. Regular operations on shared variables can lead to race conditions, where the behavior of your program becomes unpredictable due to the timing of thread execution.
Atomic operations provide a solution to this problem. An atomic operation is one that completes in a single step relative to other threads, with no possibility of interruption. C++ provides atomic types and operations through the <atomic> header, introduced in C++11, which allows you to perform thread-safe operations without explicit locks.
In this tutorial, you'll learn:
- What atomic operations are and why they're important
- How to use C++'s atomic types
- Common atomic operations and their use cases
- How atomic operations compare to other synchronization methods
Understanding Atomic Operations
What Are Atomic Operations?
An atomic operation appears to happen instantaneously from the perspective of other threads. It cannot be divided into smaller operations that could be interleaved with operations from other threads.
For example, consider this seemingly simple operation:
counter++; // Not atomic!
While this looks like a single operation, it actually involves multiple steps:
- Read the current value of counter
- Increment the value
- Write the new value back to counter
If two threads execute this operation concurrently on the same variable, they might both read the same initial value, increment it independently, and then both write back the same incremented value. As a result, the counter would only increase by 1 instead of 2!
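To see this lost-update problem in practice, here is a minimal sketch (not from the original example) that increments a plain int from two threads. This is a data race and formally undefined behavior; on most machines the printed total comes out below the expected 200000.
#include <iostream>
#include <thread>

// Illustrative sketch of the lost-update problem: incrementing a plain int
// from two threads is a data race (undefined behavior), and the final value
// is usually less than the expected 200000.
int main() {
    int counter = 0;
    auto work = [&counter]() {
        for (int i = 0; i < 100000; ++i) {
            counter++;  // non-atomic read-modify-write
        }
    };
    std::thread t1(work);
    std::thread t2(work);
    t1.join();
    t2.join();
    std::cout << "counter = " << counter << std::endl;
    return 0;
}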
Why Atomic Operations Matter
Atomic operations provide guarantees about the visibility and ordering of operations across threads. They help prevent:
- Race conditions: Where results depend on the timing of thread execution
- Torn reads/writes: Where other threads see partially updated values
- Instruction reordering issues: Where compiler or processor optimizations change the order of operations
Basic Atomic Types in C++
C++ provides the std::atomic template class, which can wrap various primitive types to make operations on them atomic.
#include <atomic>
#include <iostream>
#include <thread>
#include <vector>
int main() {
std::atomic<int> counter(0); // Atomic integer initialized to 0
auto increment = [&counter]() {
for (int i = 0; i < 1000; ++i) {
counter++; // Atomic increment operation
}
};
std::vector<std::thread> threads;
for (int i = 0; i < 10; ++i) {
threads.push_back(std::thread(increment));
}
for (auto& thread : threads) {
thread.join();
}
std::cout << "Final counter value: " << counter << std::endl;
return 0;
}
Output:
Final counter value: 10000
In this example, we created 10 threads, each incrementing a shared counter 1000 times. Because we used an atomic counter, the final result is always 10,000. If we had used a regular int instead, the result would likely be less due to race conditions.
Common Atomic Operations
Store and Load
The most basic atomic operations are storing and loading values:
std::atomic<int> x(0);
// Store operation
x.store(10, std::memory_order_relaxed);
// Load operation
int value = x.load(std::memory_order_relaxed);
Exchange
The exchange operation atomically replaces the value and returns the previous value:
std::atomic<int> x(5);
int previous = x.exchange(10); // x becomes 10, previous gets 5
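One common use of exchange is a test-and-set flag: each thread atomically writes true and inspects the previous value to learn whether it was the one that flipped the flag from false to true. Here is a minimal sketch of a spin lock built this way (the class name is illustrative, not a standard facility):
#include <atomic>

// Illustrative sketch: a test-and-set spin lock built on exchange.
class SpinLock {
    std::atomic<bool> locked{false};
public:
    void lock() {
        // exchange returns the previous value; keep trying until it was false,
        // meaning this thread is the one that flipped the flag to true.
        while (locked.exchange(true, std::memory_order_acquire)) {
            // busy-wait (spin) until the lock is released
        }
    }
    void unlock() {
        locked.store(false, std::memory_order_release);
    }
};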
Compare-and-Exchange
Compare-and-exchange (CAS) is a fundamental atomic operation that compares the current value with an expected value and, if they match, replaces it with a desired value:
#include <atomic>
#include <iostream>
int main() {
std::atomic<int> value(5);
int expected = 5;
bool success = value.compare_exchange_strong(expected, 10);
std::cout << "CAS success: " << std::boolalpha << success << std::endl;
std::cout << "Value: " << value << std::endl;
std::cout << "Expected: " << expected << std::endl;
// Try again with wrong expected value
expected = 5; // But value is now 10
success = value.compare_exchange_strong(expected, 15);
std::cout << "CAS success: " << success << std::endl;
std::cout << "Value: " << value << std::endl;
std::cout << "Expected: " << expected << " (updated to actual value)" << std::endl;
return 0;
}
Output:
CAS success: true
Value: 10
Expected: 5
CAS success: false
Value: 10
Expected: 10 (updated to actual value)
There are two versions of compare-and-exchange:
- compare_exchange_strong: Guaranteed to succeed if the current value equals the expected value
- compare_exchange_weak: May fail spuriously (return false even when the values match), but may be more efficient on some platforms, which is why it is typically used inside a retry loop, as in the sketch below
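The usual pattern with compare_exchange_weak is a retry loop: load the current value, compute the desired value, and keep attempting the exchange until it succeeds. As a hedged sketch, here is an "atomic maximum" helper built on that pattern (the function name is illustrative):
#include <atomic>

// Illustrative sketch: atomically update 'target' to the maximum of its
// current value and 'candidate' using a compare_exchange_weak retry loop.
void atomic_fetch_max(std::atomic<int>& target, int candidate) {
    int current = target.load();
    // If another thread changes 'target' between our load and the CAS,
    // compare_exchange_weak fails and refreshes 'current', and we retry.
    while (current < candidate &&
           !target.compare_exchange_weak(current, candidate)) {
        // 'current' now holds the latest value; the loop re-checks the condition.
    }
}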
Atomic Arithmetic Operations
For numeric atomic types, C++ provides atomic versions of common arithmetic operations:
std::atomic<int> x(10);
// Increment and decrement
x++; // Same as x.fetch_add(1): yields the old value
++x; // Same as x.fetch_add(1) + 1: yields the new value
x--; // Same as x.fetch_sub(1): yields the old value
--x; // Same as x.fetch_sub(1) - 1: yields the new value
// Addition and subtraction
x += 5; // Same as x.fetch_add(5) + 5
x -= 3; // Same as x.fetch_sub(3) - 3
// Get value before operation
int prev_add = x.fetch_add(2); // Add 2 to x, return previous value
int prev_sub = x.fetch_sub(1); // Subtract 1 from x, return previous value
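Because fetch_add returns the value the variable held before the addition, it is a convenient way to hand out unique sequence numbers. A small illustrative sketch (the names are not from the original text):
#include <atomic>

// Illustrative sketch: every caller receives a distinct, increasing ID,
// because fetch_add performs the read and the increment as one atomic step.
std::atomic<unsigned long> next_id(0);

unsigned long get_unique_id() {
    return next_id.fetch_add(1);  // returns the old value, then adds 1
}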
Memory Ordering
When working with atomic operations, you can specify memory ordering constraints that determine how operations are ordered between threads:
std::atomic<int> x(0);
// Different memory ordering options
x.store(1, std::memory_order_relaxed); // No synchronization
x.store(2, std::memory_order_release); // Release operation
int val = x.load(std::memory_order_acquire); // Acquire operation
int val2 = x.load(std::memory_order_seq_cst); // Sequentially consistent (default)
The common memory ordering options are:
- memory_order_relaxed: No synchronization or ordering constraints, only atomicity
- memory_order_consume: A load operation with consume ordering (rarely used directly)
- memory_order_acquire: A load operation with acquire semantics
- memory_order_release: A store operation with release semantics
- memory_order_acq_rel: Both acquire and release semantics
- memory_order_seq_cst: Sequential consistency (the default if not specified)
For beginners, using the default memory_order_seq_cst is usually the safest option until you understand the memory model more deeply.
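As a hedged illustration of the acquire/release pair, the following sketch passes a plain (non-atomic) value from one thread to another. The producer's release store on the flag synchronizes with the consumer's acquire load, so once the consumer sees the flag set, it is also guaranteed to see the data written before it (all names are illustrative):
#include <atomic>
#include <iostream>
#include <thread>

int data = 0;                    // plain, non-atomic payload
std::atomic<bool> ready(false);  // synchronization flag

void producer() {
    data = 42;                                     // write the payload first
    ready.store(true, std::memory_order_release);  // then publish it
}

void consumer() {
    while (!ready.load(std::memory_order_acquire)) {
        // spin until the producer has published
    }
    std::cout << "data = " << data << std::endl;   // guaranteed to print 42
}

int main() {
    std::thread t1(producer);
    std::thread t2(consumer);
    t1.join();
    t2.join();
    return 0;
}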
Practical Example: Thread-Safe Counter
Let's implement a thread-safe counter class using atomic operations:
#include <atomic>
#include <iostream>
#include <thread>
#include <vector>
class AtomicCounter {
private:
std::atomic<unsigned long> value;
public:
AtomicCounter() : value(0) {}
void increment() {
value++;
}
void add(unsigned long val) {
value.fetch_add(val);
}
unsigned long get() const {
return value.load();
}
};
int main() {
AtomicCounter counter;
// Create 4 threads, each incrementing the counter 1,000,000 times
const int NUM_THREADS = 4;
const int ITERATIONS = 1000000;
std::vector<std::thread> threads;
for (int i = 0; i < NUM_THREADS; ++i) {
threads.push_back(std::thread([&counter, ITERATIONS]() {
for (int j = 0; j < ITERATIONS; ++j) {
counter.increment();
}
}));
}
// Wait for all threads to finish
for (auto& thread : threads) {
thread.join();
}
std::cout << "Expected count: " << (NUM_THREADS * ITERATIONS) << std::endl;
std::cout << "Actual count: " << counter.get() << std::endl;
return 0;
}
Output:
Expected count: 4000000
Actual count: 4000000
This counter is thread-safe without using any explicit locks, thanks to atomic operations.
Real-World Application: Lock-Free Stack
Here's a more complex example of a lock-free stack implementation using atomic operations:
#include <atomic>
#include <memory>
#include <iostream>
template<typename T>
class LockFreeStack {
private:
struct Node {
T data;
Node* next;
Node(const T& data) : data(data), next(nullptr) {}
};
std::atomic<Node*> head;
public:
LockFreeStack() : head(nullptr) {}
~LockFreeStack() {
// Clean up remaining nodes
Node* current = head.load();
while (current) {
Node* next = current->next;
delete current;
current = next;
}
}
void push(const T& data) {
Node* new_node = new Node(data);
new_node->next = head.load();
// Keep trying until we manage to update the head
while (!head.compare_exchange_weak(new_node->next, new_node)) {
// If it fails, new_node->next will be updated to the current head
}
}
bool pop(T& result) {
Node* old_head = head.load();
// If the stack is empty
if (!old_head) {
return false;
}
// Try to update head to old_head->next
while (old_head && !head.compare_exchange_weak(old_head, old_head->next)) {
// If it fails, old_head will be updated to the current head
}
// If we've got a node, return its value and delete it
if (old_head) {
result = old_head->data;
delete old_head;
return true;
}
return false;
}
bool empty() const {
return head.load() == nullptr;
}
};
int main() {
LockFreeStack<int> stack;
// Push some values
stack.push(1);
stack.push(2);
stack.push(3);
// Pop and print values
int value;
while (stack.pop(value)) {
std::cout << "Popped: " << value << std::endl;
}
return 0;
}
Output:
Popped: 3
Popped: 2
Popped: 1
This implementation uses compare-and-exchange operations to update the stack head atomically. It's lock-free because threads never wait on a lock; if a compare-and-exchange fails, they simply retry. Note, however, that this simplified version is not production-ready: deleting nodes in pop while other threads may still be reading them can cause use-after-free, and the design is vulnerable to the ABA problem. Real lock-free stacks address this with techniques such as hazard pointers or reference counting.
Atomic vs. Mutexes
When should you use atomic operations versus mutexes?
#include <atomic>
#include <mutex>
#include <iostream>
#include <thread>
#include <vector>
#include <chrono>
// Using an atomic counter
void atomic_example() {
std::atomic<int> counter(0);
auto start = std::chrono::high_resolution_clock::now();
std::vector<std::thread> threads;
for (int i = 0; i < 10; ++i) {
threads.push_back(std::thread([&counter]() {
for (int j = 0; j < 1000000; ++j) {
counter++;
}
}));
}
for (auto& thread : threads) {
thread.join();
}
auto end = std::chrono::high_resolution_clock::now();
std::chrono::duration<double, std::milli> duration = end - start;
std::cout << "Atomic counter: " << counter << " (took " << duration.count() << " ms)" << std::endl;
}
// Using a mutex
void mutex_example() {
int counter = 0;
std::mutex mutex;
auto start = std::chrono::high_resolution_clock::now();
std::vector<std::thread> threads;
for (int i = 0; i < 10; ++i) {
threads.push_back(std::thread([&counter, &mutex]() {
for (int j = 0; j < 1000000; ++j) {
std::lock_guard<std::mutex> lock(mutex);
counter++;
}
}));
}
for (auto& thread : threads) {
thread.join();
}
auto end = std::chrono::high_resolution_clock::now();
std::chrono::duration<double, std::milli> duration = end - start;
std::cout << "Mutex counter: " << counter << " (took " << duration.count() << " ms)" << std::endl;
}
int main() {
atomic_example();
mutex_example();
return 0;
}
Sample output (timing will vary by system):
Atomic counter: 10000000 (took 158.274 ms)
Mutex counter: 10000000 (took 2783.419 ms)
Atomic operations are generally faster for simple operations like incrementing a counter because they don't require the overhead of acquiring and releasing a lock. However, mutexes are better when:
- You need to protect a complex operation involving multiple steps
- You're working with non-atomic data structures
- You need to ensure that a group of operations happens atomically
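For example, if two pieces of state must always change together, making each one individually atomic is not enough; a mutex keeps the compound update consistent. A minimal sketch (the struct and field names are illustrative):
#include <mutex>

// Illustrative sketch: both fields must change together, so a mutex guards the
// whole update; two separate atomics could be observed in a half-updated state.
struct Account {
    std::mutex m;
    long balance = 0;
    long transaction_count = 0;

    void deposit(long amount) {
        std::lock_guard<std::mutex> lock(m);
        balance += amount;    // step 1
        ++transaction_count;  // step 2, must stay consistent with step 1
    }
};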
Common Pitfalls
- Not all operations are atomic: Even when using atomic types, compound operations are not automatically atomic.
std::atomic<int> x(0);
std::atomic<int> y(0);
// This is NOT atomic as a whole
x.store(1);
y.store(2);
- Cache line sharing: Frequent updates to atomic variables can cause "false sharing" if they're in the same cache line (a common mitigation is sketched after this list).
- Overlooking memory ordering: Using the wrong memory ordering can lead to subtle bugs.
- Assuming atomicity guarantees ordering: Atomicity does not automatically guarantee ordering of operations between threads without proper memory ordering constraints.
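Regarding the false-sharing pitfall above, one common mitigation is to align each heavily updated atomic to its own cache line. A hedged sketch, assuming a C++17 standard library that provides std::hardware_destructive_interference_size (older toolchains may not, in which case a hard-coded 64 is a typical fallback):
#include <atomic>
#include <new>  // std::hardware_destructive_interference_size (C++17)

// Illustrative sketch: two counters updated by different threads are aligned to
// separate cache lines, so updates to one don't invalidate the other's line.
struct Counters {
    alignas(std::hardware_destructive_interference_size) std::atomic<long> a{0};
    alignas(std::hardware_destructive_interference_size) std::atomic<long> b{0};
};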
Summary
Atomic operations are a powerful tool in C++ multithreading that allow you to perform thread-safe operations without locks. They're especially useful for simple operations on primitive types and can lead to more efficient code than using mutexes.
Key points to remember:
- Atomic operations complete in a single, uninterruptible step
- C++ provides atomic types via the <atomic> header
- Common operations include load, store, exchange, and compare-exchange
- Memory ordering specifies how operations are ordered between threads
- Atomic operations are typically faster than mutex locks but are limited to simpler operations
By using atomic operations appropriately, you can write more efficient concurrent code with fewer synchronization issues.
Exercises
- Implement a thread-safe, lock-free boolean flag using atomic operations.
- Create a simple spin lock using atomic compare-and-exchange.
- Implement a thread-safe singleton pattern using atomic operations.
- Build a multi-producer, multi-consumer queue using atomic operations.
- Compare the performance of an atomic counter versus a mutex-protected counter with different numbers of threads.
Additional Resources
- C++ Reference: Atomic Operations Library
- C++ Concurrency in Action by Anthony Williams
- CppCon Talks on YouTube about atomics and lock-free programming
- The Art of Multiprocessor Programming for a deeper understanding of concurrent algorithms
Happy coding and threading!