C++ Concurrency
- Description: A note on the C++ concurrency primitives: std::async, futures, condition variables, atomics, and std::call_once
- My Notion Note ID: K2A-B1-25
- Created: 2018-11-09
- Updated: 2026-02-28
- License: Reuse is very welcome. Please credit Yu Zhang and link back to the original on yuzhang.io
Table of Contents
- 1. std::async and Launch Policies
- 2. Futures and Promises
- 3. std::condition_variable
- 4. Read-Write Lock
- 5. Atomic Types
- 6. std::once_flag and std::call_once
- 7. std::jthread and Stop Tokens (C++20)
- 8. Latches, Barriers, and Semaphores (C++20)
- 9. thread_local Storage
C++11 introduced a comprehensive threading library. These notes cover the key concurrency primitives and patterns.
1. std::async and Launch Policies
std::async runs a callable asynchronously (or deferred) and returns a std::future that holds the result. There are three launch policies:
| Policy | Behavior |
|---|---|
| std::launch::async | Guaranteed to run in a new thread. The callable starts executing immediately. |
| std::launch::deferred | Lazy evaluation. The callable is not executed until .get() or .wait() is called on the future. Runs in the calling thread. |
| std::launch::async \| std::launch::deferred | Default policy. The implementation decides whether to run asynchronously or deferred. You cannot control which. |
#include <future>
#include <iostream>
int compute(int x) { return x * x; }
int main() {
// Guaranteed async execution in a separate thread
auto f1 = std::async(std::launch::async, compute, 42);
// Deferred: only runs when we call .get()
auto f2 = std::async(std::launch::deferred, compute, 10);
// Default: implementation decides
auto f3 = std::async(compute, 7);
std::cout << f1.get() << std::endl; // 1764
std::cout << f2.get() << std::endl; // 100 (executes here)
std::cout << f3.get() << std::endl; // 49
return 0;
}
Caution: With the default policy, std::async may never execute the task if you never call .get() or .wait(). If you need guaranteed background execution, explicitly use std::launch::async. Also note that a future returned by std::async (unlike other futures) blocks in its destructor until the task has completed.
2. Futures and Promises
std::future and std::promise form a one-shot communication channel between threads.
- std::promise — the "writing" end. A thread sets a value (or an exception) on the promise.
- std::future — the "reading" end. Another thread waits for and retrieves the value.
#include <future>
#include <thread>
#include <iostream>
int main() {
std::promise<int> prom;
std::future<int> fut = prom.get_future();
// Producer thread
std::thread producer([&prom]() {
// Do some work...
prom.set_value(42);
});
// Consumer: blocks until the value is ready
std::cout << "Result: " << fut.get() << std::endl; // 42
producer.join();
return 0;
}
2.1 std::shared_future
std::future::get() can only be called once. If multiple threads need to read the same value, use std::shared_future:
std::promise<int> prom;
std::shared_future<int> sf = prom.get_future().share();
std::thread a([sf] { std::cout << "got " << sf.get(); });
std::thread b([sf] { std::cout << "got " << sf.get(); });
prom.set_value(42); // both threads wake up; both can call sf.get()
a.join();
b.join();
2.2 std::packaged_task
std::packaged_task<Sig> wraps a callable and exposes its return value as a future. Useful for thread pools and deferred execution.
std::packaged_task<int(int, int)> task(
[](int a, int b) { return a + b; });
std::future<int> fut = task.get_future();
std::thread(std::move(task), 2, 3).detach(); // run on a separate thread
std::cout << fut.get(); // 5
A thread pool typically holds a queue of packaged_tasks; workers pop one, invoke it, and the future automatically holds the result for the submitter to retrieve.
3. std::condition_variable
Condition variables allow threads to wait until a particular condition becomes true, without busy-waiting. They must always be used with a std::mutex and a predicate (condition check).
#include <mutex>
#include <condition_variable>
#include <queue>
#include <thread>
#include <iostream>
std::mutex mtx;
std::condition_variable cv;
std::queue<int> data_queue;
bool done = false;
void producer() {
for (int i = 0; i < 5; ++i) {
{
std::lock_guard<std::mutex> lock(mtx);
data_queue.push(i);
}
cv.notify_one();
}
{
std::lock_guard<std::mutex> lock(mtx);
done = true;
}
cv.notify_one();
}
void consumer() {
while (true) {
std::unique_lock<std::mutex> lock(mtx);
cv.wait(lock, [] { return !data_queue.empty() || done; });
while (!data_queue.empty()) {
std::cout << "Got: " << data_queue.front() << std::endl;
data_queue.pop();
}
if (done) break;
}
}
int main() {
std::thread t1(producer);
std::thread t2(consumer);
t1.join();
t2.join();
return 0;
}
3.1 Spurious Wakeups
A condition variable may wake up even when no notification was sent. This is called a spurious wakeup and is a known behavior of most operating systems' threading implementations.
This is why you must always use a predicate (loop condition) with condition_variable::wait():
// WRONG: may proceed without the condition being true
cv.wait(lock);
// CORRECT: rechecks the condition after every wakeup
cv.wait(lock, [] { return condition_is_met; });
// Equivalent to:
// while (!condition_is_met) { cv.wait(lock); }
4. Read-Write Lock
A read-write lock allows multiple concurrent readers but exclusive access for writers. C++17 provides std::shared_mutex for this, but here is a manual implementation using std::mutex and std::condition_variable to illustrate the concept:
#include <mutex>
#include <condition_variable>
class RWLock {
public:
RWLock() = default;
void lockRead() {
std::unique_lock<std::mutex> lock(mutex_);
// Wait until no active writer
read_cv_.wait(lock, [this] { return !writer_active_; });
++readers_;
}
void unlockRead() {
std::unique_lock<std::mutex> lock(mutex_);
--readers_;
// If this was the last reader, wake one waiting writer (if any)
if (readers_ == 0) {
write_cv_.notify_one();
}
}
void lockWrite() {
std::unique_lock<std::mutex> lock(mutex_);
// Wait until no active readers AND no active writer
write_cv_.wait(lock, [this] {
return readers_ == 0 && !writer_active_;
});
writer_active_ = true;
}
void unlockWrite() {
std::unique_lock<std::mutex> lock(mutex_);
writer_active_ = false;
// Wake one waiting writer and all waiting readers; whichever
// reacquires the mutex first proceeds (no priority is enforced)
write_cv_.notify_one();
read_cv_.notify_all();
}
private:
std::mutex mutex_;
std::condition_variable read_cv_;
std::condition_variable write_cv_;
int readers_ = 0;
bool writer_active_ = false;
};
Modern alternative (C++17): Use std::shared_mutex with std::shared_lock (for readers) and std::unique_lock (for writers):
#include <shared_mutex>
std::shared_mutex rw_mutex;
void reader() {
std::shared_lock lock(rw_mutex); // Multiple readers allowed
// ... read data ...
}
void writer() {
std::unique_lock lock(rw_mutex); // Exclusive access
// ... write data ...
}
5. Atomic Types
std::atomic<T> (C++11) provides thread-safe operations on a variable without an explicit mutex; for small trivially-copyable types these are typically lock-free (query with is_lock_free()). Atomics are essential for low-level concurrent data structures and flags.
#include <atomic>
#include <thread>
#include <iostream>
std::atomic<int> counter{0};
void increment(int n) {
for (int i = 0; i < n; ++i) {
++counter; // Atomic increment; thread-safe without a mutex
}
}
int main() {
std::thread t1(increment, 100000);
std::thread t2(increment, 100000);
t1.join();
t2.join();
std::cout << counter.load() << std::endl; // Always 200000
return 0;
}
5.1 Common Atomic Operations
- Load and store —
load()andstore(v)for atomic read/write. - Exchange —
exchange(v)atomically replaces the value and returns the old one. - Compare-exchange (CAS) —
compare_exchange_weak/strong(expected, desired). The building block of lock-free algorithms. - Fetch-and-modify —
fetch_add,fetch_sub,fetch_and,fetch_or,fetch_xor. All return the old value (before the update) and operate atomically. - Operator overloads —
++a,--a,a += n,a -= n,a |= n, etc. are syntactic sugar over the fetch-* operations and return the new value.
#include <atomic>
std::atomic<int> n{10};
// Atomic add — fetch_add returns the OLD value
int old_val = n.fetch_add(5); // n is now 15; old_val is 10
int new_val = ++n; // n is now 16; new_val is 16
n += 3; // n is now 19
// Atomic CAS — the building block of lock-free algorithms
int expected = 19;
bool ok = n.compare_exchange_strong(expected, 100);
// If n == expected (19), n is set to 100 and ok = true.
// Otherwise, expected is updated to n's actual current value and ok = false.
// Atomic flag and bitwise operations
std::atomic<unsigned> flags{0};
flags.fetch_or(0b0001); // flags = 0b0001
flags.fetch_xor(0b0011); // flags = 0b0010
weak vs strong CAS. compare_exchange_weak may spuriously fail even when n == expected — but it is faster on platforms with weakly-ordered hardware (ARM, POWER). Use weak inside a retry loop (you'll retry on failure anyway) and strong when you only intend to attempt the swap once.
// Typical CAS retry loop: atomically double n
int expected = n.load();
int desired;
do {
desired = expected * 2;
} while (!n.compare_exchange_weak(expected, desired));
// On failure, compare_exchange_weak refreshes 'expected' with n's current value.
5.2 Memory Ordering
Every atomic operation takes an optional std::memory_order parameter that controls how reads and writes around it are ordered with respect to other threads.
| Order | Use case |
|---|---|
| memory_order_relaxed | Atomicity only; no ordering guarantees. Use for counters where ordering doesn't matter. |
| memory_order_acquire (load), memory_order_release (store) | Pair them for producer/consumer handoff. Reads after an acquire see writes that happened before the matching release. |
| memory_order_acq_rel | For read-modify-write operations (fetch_add etc.) that need both acquire and release semantics. |
| memory_order_seq_cst | Default. Sequentially consistent — strongest, simplest to reason about, but most expensive. |
std::atomic<bool> ready{false};
int data = 0;
// Producer
data = 42;
ready.store(true, std::memory_order_release);
// Consumer
while (!ready.load(std::memory_order_acquire)) { }
// After the acquire, 'data' is guaranteed to be 42.
assert(data == 42);
Default to memory_order_seq_cst unless you have profiled and need to relax it. Mixing memory orders incorrectly is the easiest way to write subtly buggy lock-free code.
See cppreference: std::atomic for the full API and cppreference: memory_order for ordering details.
6. std::once_flag and std::call_once
std::call_once guarantees that a callable is executed exactly once, even if called from multiple threads. This is commonly used for lazy initialization of shared resources.
#include <mutex>
#include <iostream>
#include <thread>
std::once_flag init_flag;
int* shared_resource = nullptr;
void initResource() {
shared_resource = new int(42);
std::cout << "Resource initialized" << std::endl;
}
void worker() {
std::call_once(init_flag, initResource);
std::cout << "Resource value: " << *shared_resource << std::endl;
}
int main() {
std::thread t1(worker);
std::thread t2(worker);
std::thread t3(worker);
t1.join();
t2.join();
t3.join();
// "Resource initialized" is printed exactly once
delete shared_resource;
return 0;
}
7. std::jthread and Stop Tokens (C++20)
std::thread has two awkward properties: its destructor calls std::terminate if the thread is still joinable (neither joined nor detached), and it has no built-in cancellation mechanism. std::jthread (C++20) fixes both.
#include <thread>
#include <stop_token>
#include <iostream>
void worker(std::stop_token st) {
while (!st.stop_requested()) {
// do work
std::this_thread::sleep_for(std::chrono::milliseconds(100));
}
std::cout << "stopping cleanly\n";
}
void example() {
std::jthread t(worker); // takes a stop_token automatically
std::this_thread::sleep_for(std::chrono::seconds(1));
// jthread's destructor calls request_stop() and join() automatically
}
Compared to std::thread:
- Auto-joining destructor. No more std::terminate if you forget to call join().
- Built-in stop_token. The thread can cooperatively exit early.
- Same API otherwise.
For cancellation, the worker checks stop_requested() in its loop. For interruptible waits, use stop_callback to wake up a condition_variable_any:
std::condition_variable_any cv;
std::stop_callback cb(stop_token, [&] { cv.notify_all(); });
cv.wait(lock, stop_token, [] { return ready; }); // wakes on ready or stop
8. Latches, Barriers, and Semaphores (C++20)
C++20 added three high-level synchronization primitives.
8.1 std::latch
A one-shot counter. Threads call count_down() to decrement, wait() to block until it reaches zero. Cannot be reused.
#include <latch>
#include <thread>
#include <vector>
constexpr int N = 4; // number of workers (example value)
void do_work() { /* ... */ }
int main() {
std::latch start{1}; // initial count
std::latch finish{N}; // wait for N workers
std::vector<std::jthread> ts;
for (int i = 0; i < N; ++i) {
ts.emplace_back([&] {
start.wait(); // block until main signals start
do_work();
finish.count_down();
});
}
start.count_down(); // release all workers
finish.wait(); // wait for all to finish
return 0;
}
8.2 std::barrier
Like latch but reusable. After all threads arrive, the barrier resets to its initial count, optionally running a completion function.
#include <barrier>
std::barrier sync_point{N, [] {
// runs once when all N arrive, before they're released
std::cout << "phase complete\n";
}};
void worker() {
for (int phase = 0; phase < 3; ++phase) {
do_phase_work();
sync_point.arrive_and_wait(); // all threads sync here
}
}
8.3 std::counting_semaphore / std::binary_semaphore
A counting semaphore controls access to a resource with limited capacity.
#include <semaphore>
std::counting_semaphore<10> pool{3}; // up to 3 concurrent acquirers
void worker() {
pool.acquire(); // blocks if 3 already acquired
do_limited_resource_work();
pool.release();
}
std::binary_semaphore is std::counting_semaphore<1> — a lock that can be signaled across threads (similar to a Win32 event or POSIX semaphore). Often simpler than a condition_variable: just acquire() to wait, release() to wake one waiter.
9. thread_local Storage
thread_local (C++11) declares a variable with per-thread storage duration. Each thread gets its own independent copy.
#include <random>
#include <string>
thread_local std::string error_message;
thread_local std::mt19937 rng{std::random_device{}()};
void worker() {
rng(); // each thread has its own engine
error_message = "..."; // each thread has its own buffer
}
Common uses:
- Per-thread caches that don't need synchronization.
- Per-thread random number generators (see C++ Random § 6).
- Error context that's thread-scoped (errno-like).
- Logging context (which request am I handling).
thread_local variables are constructed on first use within each thread and destroyed when the thread exits. They also work at namespace scope (one per thread, globally) and as static class members.