C++ Concurrency
- Description: A note on the C++ concurrency primitives: std::async, futures, condition variables, atomics, and std::call_once
- My Notion Note ID: K2A-B1-25
- Created: 2018-11-09
- Updated: 2026-02-28
- License: Reuse is very welcome. Please credit Yu Zhang and link back to the original on yuzhang.io
Table of Contents
- 1. std::async and Launch Policies
- 2. Futures and Promises
- 3. std::condition_variable
- 4. Read-Write Lock
- 5. Atomic Types
- 6. std::once_flag and std::call_once
- 7. std::jthread and Stop Tokens (C++20)
- 8. Latches, Barriers, and Semaphores (C++20)
- 9. thread_local Storage
- C++11 introduced the threading library. Notes cover key primitives + patterns.
1. std::async and Launch Policies
- std::async: runs a callable asynchronously (or deferred) and returns a std::future holding the result.
- 3 launch policies:
| Policy | Behavior |
|---|---|
| std::launch::async | Runs the callable as if on a new thread of execution; it may start executing immediately. |
| std::launch::deferred | Lazy evaluation. The callable is not executed until .get() or .wait() is called on the future, and it then runs on the thread that makes that call. |
| std::launch::async \| std::launch::deferred | Default policy. The implementation decides whether to run asynchronously or deferred (for example, depending on available resources). You have no control over which. |
#include <future>
#include <iostream>
int compute(int x) { return x * x; }
int main() {
// Guaranteed async execution in a separate thread
auto f1 = std::async(std::launch::async, compute, 42);
// Deferred: only runs when we call .get()
auto f2 = std::async(std::launch::deferred, compute, 10);
// Default: implementation decides
auto f3 = std::async(compute, 7);
std::cout << f1.get() << std::endl; // 1764
std::cout << f2.get() << std::endl; // 100 (executes here)
std::cout << f3.get() << std::endl; // 49
return 0;
}
- Caution: with the default policy the implementation may pick deferred, so the task may never run if .get()/.wait() is never called. For guaranteed background execution → explicit std::launch::async.
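To see which way a given future went, a zero-length wait_for can be used as a probe: it reports future_status::deferred for a task that has not been started and will only run on get()/wait(). A minimal sketch; is_deferred and square are hypothetical helper names, not part of the standard library.

```cpp
#include <chrono>
#include <future>

// Hypothetical helper: true when the task behind `fut` is deferred,
// i.e. it will only run once get() or wait() is called on the future.
template <typename T>
bool is_deferred(const std::future<T>& fut) {
    return fut.wait_for(std::chrono::seconds(0)) == std::future_status::deferred;
}

// Hypothetical workload used only for illustration.
int square(int x) { return x * x; }
```

With the default policy, a caller could check is_deferred(f) and decide whether to call get() early or keep polling.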
2. Futures and Promises
- std::future + std::promise = one-shot inter-thread communication channel.
- std::promise: writing end. One thread sets a value (or an exception).
- std::future: reading end. Another thread waits for and retrieves it.
#include <future>
#include <thread>
#include <iostream>
int main() {
std::promise<int> prom;
std::future<int> fut = prom.get_future();
// Producer thread
std::thread producer([&prom]() {
// Do some work...
prom.set_value(42);
});
// Consumer: blocks until the value is ready
std::cout << "Result: " << fut.get() << std::endl; // 42
producer.join();
return 0;
}
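The exception path works the same way as the value path: the producer stores an exception with set_exception, and get() rethrows it on the consumer side. A minimal sketch, assuming a hypothetical divide worker and a divide_on_thread wrapper (both names are illustrative):

```cpp
#include <exception>
#include <future>
#include <stdexcept>
#include <thread>
#include <utility>

// Hypothetical worker: fulfils the promise with a value or with an exception.
void divide(std::promise<int> result, int a, int b) {
    try {
        if (b == 0) throw std::invalid_argument("division by zero");
        result.set_value(a / b);
    } catch (...) {
        // Forward whatever was thrown to the thread that holds the future.
        result.set_exception(std::current_exception());
    }
}

// Runs divide() on another thread and returns the outcome from the future.
int divide_on_thread(int a, int b) {
    std::promise<int> prom;
    std::future<int> fut = prom.get_future();
    std::thread worker(divide, std::move(prom), a, b);
    worker.join();
    return fut.get();  // rethrows here if the worker stored an exception
}
```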
2.1 std::shared_future
- future::get() is callable only once. For multiple readers → std::shared_future.
std::promise<int> prom;
std::shared_future<int> sf = prom.get_future().share();
std::thread a([sf] { std::cout << "got " << sf.get(); });
std::thread b([sf] { std::cout << "got " << sf.get(); });
prom.set_value(42); // both threads wake up; both can call sf.get()
a.join();
b.join();
2.2 std::packaged_task
- std::packaged_task<Sig>: wraps a callable and exposes its return value as a future. For thread pools + deferred execution.
std::packaged_task<int(int, int)> task(
[](int a, int b) { return a + b; });
std::future<int> fut = task.get_future();
std::thread(std::move(task), 2, 3).detach(); // run on a separate thread
std::cout << fut.get(); // 5
- Thread pool pattern: queue of packaged_tasks; workers pop + invoke; the future holds the result for the submitter.
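The thread pool pattern could look roughly like this. It is a sketch, not a production pool: TinyPool is a hypothetical name, tasks are restricted to callables returning int to keep the queue type simple, and shutdown drains the queue before joining.

```cpp
#include <condition_variable>
#include <functional>
#include <future>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical fixed-size pool for nullary tasks that return int.
class TinyPool {
public:
    explicit TinyPool(unsigned n) {
        for (unsigned i = 0; i < n; ++i)
            workers_.emplace_back([this] { run(); });
    }
    ~TinyPool() {
        {
            std::lock_guard<std::mutex> lock(mtx_);
            stopping_ = true;
        }
        cv_.notify_all();
        for (auto& w : workers_) w.join();
    }
    // Queue a task; the returned future delivers its result to the submitter.
    std::future<int> submit(std::function<int()> fn) {
        std::packaged_task<int()> task(std::move(fn));
        std::future<int> fut = task.get_future();
        {
            std::lock_guard<std::mutex> lock(mtx_);
            tasks_.push(std::move(task));
        }
        cv_.notify_one();
        return fut;
    }

private:
    void run() {
        for (;;) {
            std::packaged_task<int()> task;
            {
                std::unique_lock<std::mutex> lock(mtx_);
                cv_.wait(lock, [this] { return stopping_ || !tasks_.empty(); });
                if (tasks_.empty()) return;  // stopping and nothing left to do
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();  // stores the result (or exception) for the future
        }
    }

    std::mutex mtx_;
    std::condition_variable cv_;
    std::queue<std::packaged_task<int()>> tasks_;
    bool stopping_ = false;
    std::vector<std::thread> workers_;  // declared last: threads use the members above
};
```

Usage would be along the lines of `TinyPool pool(2); auto f = pool.submit([] { return 6 * 7; }); f.get();`.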
3. std::condition_variable
- Thread waits for condition without busy-waiting.
- Always used with std::mutex + predicate.
#include <mutex>
#include <condition_variable>
#include <queue>
#include <thread>
#include <iostream>
std::mutex mtx;
std::condition_variable cv;
std::queue<int> data_queue;
bool done = false;
void producer() {
for (int i = 0; i < 5; ++i) {
{
std::lock_guard<std::mutex> lock(mtx);
data_queue.push(i);
}
cv.notify_one();
}
{
std::lock_guard<std::mutex> lock(mtx);
done = true;
}
cv.notify_one();
}
void consumer() {
while (true) {
std::unique_lock<std::mutex> lock(mtx);
cv.wait(lock, [] { return !data_queue.empty() || done; });
while (!data_queue.empty()) {
std::cout << "Got: " << data_queue.front() << std::endl;
data_queue.pop();
}
if (done) break;
}
}
int main() {
std::thread t1(producer);
std::thread t2(consumer);
t1.join();
t2.join();
return 0;
}
3.1 Spurious Wakeups
- CV may wake without notification — spurious wakeup, known OS behavior.
- → always use a predicate (loop condition) with wait():
// WRONG: may proceed without the condition being true
cv.wait(lock);
// CORRECT: rechecks the condition after every wakeup
cv.wait(lock, [] { return condition_is_met; });
// Equivalent to:
// while (!condition_is_met) { cv.wait(lock); }
4. Read-Write Lock
- Multiple concurrent readers, exclusive writer.
- C++17 has std::shared_mutex; the manual version below illustrates the concept:
#include <mutex>
#include <condition_variable>
class RWLock {
public:
RWLock() = default;
void lockRead() {
std::unique_lock<std::mutex> lock(mutex_);
// Wait until no active writer
read_cv_.wait(lock, [this] { return !writer_active_; });
++readers_;
}
void unlockRead() {
std::unique_lock<std::mutex> lock(mutex_);
--readers_;
// If this was the last reader, wake one waiting writer (if any)
if (readers_ == 0) {
write_cv_.notify_one();
}
}
void lockWrite() {
std::unique_lock<std::mutex> lock(mutex_);
// Wait until no active readers AND no active writer
write_cv_.wait(lock, [this] {
return readers_ == 0 && !writer_active_;
});
writer_active_ = true;
}
void unlockWrite() {
std::unique_lock<std::mutex> lock(mutex_);
writer_active_ = false;
// Wake one queued writer and all waiting readers; whichever reacquires mutex_ first proceeds
write_cv_.notify_one();
read_cv_.notify_all();
}
private:
std::mutex mutex_;
std::condition_variable read_cv_;
std::condition_variable write_cv_;
int readers_ = 0;
bool writer_active_ = false;
};
- Modern (C++17): std::shared_mutex + std::shared_lock (readers) / std::unique_lock (writers):
#include <mutex> // std::unique_lock
#include <shared_mutex>
std::shared_mutex rw_mutex;
void reader() {
std::shared_lock lock(rw_mutex); // Multiple readers allowed
// ... read data ...
}
void writer() {
std::unique_lock lock(rw_mutex); // Exclusive access
// ... write data ...
}
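In practice the shared_mutex usually lives inside the class it protects. A minimal sketch, assuming a hypothetical Registry type that maps names to ints; the mutex is mutable so that const readers can still take a shared lock.

```cpp
#include <map>
#include <mutex>
#include <optional>
#include <shared_mutex>
#include <string>

// Hypothetical read-mostly lookup table guarded by a shared_mutex.
class Registry {
public:
    // Readers take a shared lock, so many find() calls can run concurrently.
    std::optional<int> find(const std::string& key) const {
        std::shared_lock lock(mutex_);
        auto it = values_.find(key);
        if (it == values_.end()) return std::nullopt;
        return it->second;
    }
    // Writers take an exclusive lock, which blocks readers and other writers.
    void set(const std::string& key, int value) {
        std::unique_lock lock(mutex_);
        values_[key] = value;
    }

private:
    mutable std::shared_mutex mutex_;
    std::map<std::string, int> values_;
};
```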
5. Atomic Types
- std::atomic<T> (C++11): data-race-free operations without a mutex. Usually lock-free for small types, but that is not guaranteed; check is_lock_free().
- Essential for low-level concurrent data structures + flags.
#include <atomic>
#include <thread>
#include <iostream>
std::atomic<int> counter{0};
void increment(int n) {
for (int i = 0; i < n; ++i) {
++counter; // Atomic increment; thread-safe without a mutex
}
}
int main() {
std::thread t1(increment, 100000);
std::thread t2(increment, 100000);
t1.join();
t2.join();
std::cout << counter.load() << std::endl; // Always 200000
return 0;
}
5.1 Common Atomic Operations
- Load/store: load(), store(v).
- Exchange: exchange(v) atomically replaces + returns old value.
- Compare-exchange (CAS): compare_exchange_weak/strong(expected, desired). Building block of lock-free algorithms.
- Fetch-and-modify: fetch_add, fetch_sub, fetch_and, fetch_or, fetch_xor. Atomic; return old value (before update).
- Operator overloads: ++a, --a, a += n, a -= n, a |= n. Syntactic sugar over fetch_*; return new value.
#include <atomic>
std::atomic<int> n{10};
// Atomic add — fetch_add returns the OLD value
int old_val = n.fetch_add(5); // n is now 15; old_val is 10
int new_val = ++n; // n is now 16; new_val is 16
n += 3; // n is now 19
// Atomic CAS — the building block of lock-free algorithms
int expected = 19;
bool ok = n.compare_exchange_strong(expected, 100);
// If n == expected (19), n is set to 100 and ok = true.
// Otherwise, expected is updated to n's actual current value and ok = false.
// Atomic flag and bitwise operations
std::atomic<unsigned> flags{0};
flags.fetch_or(0b0001); // flags = 0b0001
flags.fetch_xor(0b0011); // flags = 0b0010
- weak vs strong CAS:
  - weak: may fail spuriously even when n == expected. Can be cheaper on LL/SC architectures (ARM, POWER). Use in retry loops.
  - strong: never fails spuriously. Use for a single-attempt swap.
// Typical CAS retry loop: atomically double n
int expected = n.load();
int desired;
do {
desired = expected * 2;
} while (!n.compare_exchange_weak(expected, desired));
// On failure, compare_exchange_weak refreshes 'expected' with n's current value.
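The same retry shape covers any read-modify-write that has no dedicated fetch_* member. As a sketch, here is an "atomic maximum"; update_max is a hypothetical helper name.

```cpp
#include <atomic>

// Hypothetical helper: atomically raise `target` to at least `value`.
// Returns the value that was observed before any update.
int update_max(std::atomic<int>& target, int value) {
    int current = target.load();
    // Stop once the stored value is already large enough, or once the CAS succeeds.
    // On failure, compare_exchange_weak refreshes `current` with the stored value.
    while (current < value && !target.compare_exchange_weak(current, value)) {
    }
    return current;
}
```

The loop exits early without writing when another thread has already stored a larger value, which a plain store could not guarantee.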
5.2 Memory Ordering
- Every atomic op takes an optional std::memory_order, which controls the ordering of surrounding reads/writes w.r.t. other threads.
| Order | Use case |
|---|---|
| memory_order_relaxed | Atomicity only; no ordering guarantees. Use for counters where ordering doesn't matter. |
| memory_order_acquire (load), memory_order_release (store) | Pair them for producer/consumer handoff. Reads after an acquire see writes that happened before the matching release. |
| memory_order_acq_rel | For read-modify-write operations (fetch_add etc.) that need both acquire and release semantics. |
| memory_order_seq_cst | Default. Sequentially consistent: strongest, simplest to reason about, but most expensive. |
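#include <atomic>
#include <cassert>
#include <thread>
std::atomic<bool> ready{false};
int data = 0;
void producer() {
data = 42; // plain write, published by the release store below
ready.store(true, std::memory_order_release);
}
void consumer() {
while (!ready.load(std::memory_order_acquire)) { } // spin until published
// After the acquire load sees true, 'data' is guaranteed to be 42.
assert(data == 42);
}
int main() {
std::thread t1(producer), t2(consumer);
t1.join(); t2.join();
}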
- Default to seq_cst. Relax only after profiling. Mixing orders incorrectly is an easy way to write subtly buggy lock-free code.
- See cppreference: std::atomic and memory_order.
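For the relaxed case, a plain event counter is the usual example: only the final total matters, not the order in which increments become visible. A minimal sketch; count_events is a hypothetical function name.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical example: several threads bump one counter with relaxed ordering.
long count_events(int threads, int per_thread) {
    std::atomic<long> hits{0};
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&hits, per_thread] {
            for (int i = 0; i < per_thread; ++i)
                hits.fetch_add(1, std::memory_order_relaxed);  // atomic, unordered
        });
    }
    // join() synchronizes with each thread's completion, so the final load
    // below sees every increment even though it is relaxed too.
    for (auto& th : pool) th.join();
    return hits.load(std::memory_order_relaxed);
}
```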
6. std::once_flag and std::call_once
- std::call_once: runs a callable exactly once across all calling threads. For lazy init of shared resources.
#include <mutex>
#include <iostream>
#include <thread>
std::once_flag init_flag;
int* shared_resource = nullptr;
void initResource() {
shared_resource = new int(42);
std::cout << "Resource initialized" << std::endl;
}
void worker() {
std::call_once(init_flag, initResource);
std::cout << "Resource value: " << *shared_resource << std::endl;
}
int main() {
std::thread t1(worker);
std::thread t2(worker);
std::thread t3(worker);
t1.join();
t2.join();
t3.join();
// "Resource initialized" is printed exactly once
delete shared_resource;
return 0;
}
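The once_flag can also be a class member, which gives per-object lazy initialization without a global. A minimal sketch, assuming a hypothetical LazyTable that builds a table of squares on first lookup; the build counter exists only to make the "exactly once" behavior observable.

```cpp
#include <atomic>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical class whose table is built lazily, once, on first lookup.
class LazyTable {
public:
    int lookup(int i) {
        // Concurrent callers block here until the single build() finishes.
        std::call_once(built_, [this] { build(); });
        return squares_[i];
    }
    int builds() const { return build_count_.load(); }

private:
    void build() {
        for (int i = 0; i < 16; ++i) squares_.push_back(i * i);
        ++build_count_;
    }

    std::once_flag built_;
    std::vector<int> squares_;
    std::atomic<int> build_count_{0};
};
```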
7. std::jthread and Stop Tokens (C++20)
- std::thread problems: the destructor calls std::terminate if the thread is still joinable; no built-in cancellation.
- std::jthread (C++20) fixes both.
#include <chrono>
#include <thread>
#include <stop_token>
#include <iostream>
void worker(std::stop_token st) {
while (!st.stop_requested()) {
// do work
std::this_thread::sleep_for(std::chrono::milliseconds(100));
}
std::cout << "stopping cleanly\n";
}
void example() {
std::jthread t(worker); // takes a stop_token automatically
std::this_thread::sleep_for(std::chrono::seconds(1));
// jthread's destructor calls request_stop() and join() automatically
}
vs std::thread:
- Auto-joining destructor: no std::terminate if you forget join().
- Built-in stop_token: cooperative early exit.
- Same API otherwise.
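- Worker checks stop_requested() in loop. For interruptible waits → condition_variable_any::wait with a stop_token. That overload registers the wake-up itself, so no manual stop_callback is needed (stop_callback remains useful for waking other blocking operations):
std::condition_variable_any cv;
// st is the worker's std::stop_token; lock is a std::unique_lock on the guarding mutex
// Returns the predicate's value: false means a stop was requested while 'ready' was still false
bool ok = cv.wait(lock, st, [] { return ready; });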
8. Latches, Barriers, and Semaphores (C++20)
- 3 high-level sync primitives added in C++20.
8.1 std::latch
- One-shot counter. count_down() decrements, wait() blocks until zero. Not reusable.
#include <latch>
#include <thread>
#include <vector>
std::latch start{1}; // initial count
std::latch finish{N}; // wait for N workers
std::vector<std::jthread> ts;
for (int i = 0; i < N; ++i) {
ts.emplace_back([&] {
start.wait(); // block until main signals start
do_work();
finish.count_down();
});
}
start.count_down(); // release all workers
finish.wait(); // wait for all to finish
8.2 std::barrier
- Like latch but reusable. Resets to initial count after all arrive; optional completion function runs at the sync point.
#include <barrier>
#include <iostream>
std::barrier sync_point{N, []() noexcept {
// runs once when all N arrive, before they're released; must be nothrow-invocable
std::cout << "phase complete\n";
}};
void worker() {
for (int phase = 0; phase < 3; ++phase) {
do_phase_work();
sync_point.arrive_and_wait(); // all threads sync here
}
}
8.3 std::counting_semaphore / std::binary_semaphore
- Counting semaphore — limit concurrent access to N.
#include <semaphore>
std::counting_semaphore<10> pool{3}; // template arg: max count; initial count 3 → up to 3 concurrent acquirers
void worker() {
pool.acquire(); // blocks if 3 already acquired
do_limited_resource_work();
pool.release();
}
- std::binary_semaphore = std::counting_semaphore<1>: cross-thread signal (like a Win32 event / POSIX semaphore). Often simpler than condition_variable: acquire() to wait, release() to wake.
9. thread_local Storage
- thread_local (C++11): thread storage duration. Each thread has its own independent copy.
#include <random>
#include <string>
thread_local std::string error_message;
thread_local std::mt19937 rng{std::random_device{}()};
void worker() {
rng(); // each thread has its own engine
error_message = "..."; // each thread has its own buffer
}
Common uses:
- Per-thread caches (no sync needed).
- Per-thread RNGs (see C++ Random § 6).
- Thread-scoped error context (errno-like).
- Logging context (request ID).
- Constructed on first per-thread use, destroyed when thread exits.
- Works at namespace scope (1 per thread, globally), at block scope, and as static class member.
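The independence of the per-thread copies is easy to observe with a counter. A minimal sketch; calls and bump are hypothetical names.

```cpp
#include <thread>

// One counter per thread: each thread starts from its own zero.
thread_local int calls = 0;

// Hypothetical helper: bumps the calling thread's counter and returns it.
int bump(int times) {
    for (int i = 0; i < times; ++i) ++calls;
    return calls;
}
```

Two threads calling bump(3) and bump(5) each get back exactly their own count, and neither affects the counter seen by the thread that spawned them.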