C++ Concurrency


  • Description: A note on the C++ concurrency primitives — std::async, futures, condition variables, atomics, and std::call_once
  • My Notion Note ID: K2A-B1-25
  • Created: 2018-11-09
  • Updated: 2026-02-28
  • License: Reuse is very welcome. Please credit Yu Zhang and link back to the original on yuzhang.io

  • C++11 introduced the threading library. Notes cover key primitives + patterns.

1. std::async and Launch Policies

  • std::async — runs callable async (or deferred), returns std::future holding result.
  • Launch policies: two flags, plus the default that combines both.
Policy | Behavior
std::launch::async | Runs the callable asynchronously, as if on a new thread. Execution starts without waiting for .get() or .wait().
std::launch::deferred | Lazy evaluation. The callable is not executed until .get() or .wait() is called on the future. Runs on the thread that makes that call.
std::launch::async | std::launch::deferred | Default policy. The implementation chooses between asynchronous and deferred execution; the choice is unspecified and you have no control over it.
#include <future>
#include <iostream>

int compute(int x) { return x * x; }

int main() {
    // Guaranteed async execution in a separate thread
    auto f1 = std::async(std::launch::async, compute, 42);

    // Deferred: only runs when we call .get()
    auto f2 = std::async(std::launch::deferred, compute, 10);

    // Default: implementation decides
    auto f3 = std::async(compute, 7);

    std::cout << f1.get() << std::endl;  // 1764
    std::cout << f2.get() << std::endl;  // 100 (executes here)
    std::cout << f3.get() << std::endl;  // 49

    return 0;
}
  • Caution: with the default policy the implementation may pick deferred, and a deferred task never runs unless .get() or .wait() is called. For guaranteed background execution, pass std::launch::async explicitly.
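  • A minimal sketch of how to tell which way a future was launched. The helper name is_deferred is hypothetical (not part of the standard library); it relies on wait_for with a zero timeout returning std::future_status::deferred for a deferred task without running it.

```cpp
#include <chrono>
#include <future>

// Hypothetical helper: true if the task behind `f` was launched deferred
// and has not been started yet. Calling it does not run the task.
template <typename T>
bool is_deferred(const std::future<T>& f) {
    return f.wait_for(std::chrono::seconds(0)) == std::future_status::deferred;
}
```

  • Possible use with the default policy: if is_deferred(f) is true, call f.get() when the result is needed instead of polling for completion, since a deferred task makes no progress on its own.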

2. Futures and Promises

  • std::future + std::promise = one-shot inter-thread communication channel.
  1. std::promise — writing end. Thread sets value (or exception).
  2. std::future — reading end. Other thread waits + retrieves.
#include <future>
#include <thread>
#include <iostream>

int main() {
    std::promise<int> prom;
    std::future<int> fut = prom.get_future();

    // Producer thread
    std::thread producer([&prom]() {
        // Do some work...
        prom.set_value(42);
    });

    // Consumer: blocks until the value is ready
    std::cout << "Result: " << fut.get() << std::endl;  // 42

    producer.join();
    return 0;
}
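  • The promise can also carry an exception instead of a value. A small sketch, assuming a hypothetical function fetch_or_error and an illustrative error message: the producer stores the exception with set_exception, and fut.get() rethrows it in the consumer.

```cpp
#include <exception>
#include <future>
#include <stdexcept>
#include <string>
#include <thread>

// Hypothetical demo function: returns "value 42" on success,
// or "error: <message>" if the producer thread failed.
std::string fetch_or_error(bool fail) {
    std::promise<int> prom;
    std::future<int> fut = prom.get_future();

    std::thread producer([&prom, fail] {
        try {
            if (fail) throw std::runtime_error("producer failed");
            prom.set_value(42);
        } catch (...) {
            // Forward the in-flight exception to the reading end.
            prom.set_exception(std::current_exception());
        }
    });

    std::string result;
    try {
        result = "value " + std::to_string(fut.get());  // rethrows on failure
    } catch (const std::exception& e) {
        result = std::string("error: ") + e.what();
    }
    producer.join();
    return result;
}
```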

2.1 std::shared_future

  • future::get() callable only once. For multi-reader → std::shared_future.
std::promise<int> prom;
std::shared_future<int> sf = prom.get_future().share();

std::thread a([sf] { std::cout << "got " << sf.get(); });
std::thread b([sf] { std::cout << "got " << sf.get(); });

prom.set_value(42);   // both threads wake up; both can call sf.get()
a.join();
b.join();

2.2 std::packaged_task

  • std::packaged_task<Sig> — wraps callable, exposes return as future. For thread pools + deferred execution.
std::packaged_task<int(int, int)> task(
    [](int a, int b) { return a + b; });

std::future<int> fut = task.get_future();

std::thread(std::move(task), 2, 3).detach();   // run on a separate thread
std::cout << fut.get();   // 5
  • Thread pool pattern: queue of packaged_tasks; workers pop + invoke; future holds result for submitter.
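  • A rough sketch of that thread pool pattern, not a production pool. The class name TinyPool and the fixed int() task signature are assumptions made to keep the example short.

```cpp
#include <condition_variable>
#include <functional>
#include <future>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

class TinyPool {
public:
    explicit TinyPool(unsigned workers) {
        for (unsigned i = 0; i < workers; ++i) {
            threads_.emplace_back([this] { run(); });
        }
    }

    // Drains the remaining tasks, then joins every worker.
    ~TinyPool() {
        {
            std::lock_guard<std::mutex> lock(mtx_);
            stopping_ = true;
        }
        cv_.notify_all();
        for (auto& t : threads_) t.join();
    }

    // Queue a task; the returned future receives its result.
    std::future<int> submit(std::function<int()> fn) {
        std::packaged_task<int()> task(std::move(fn));
        std::future<int> fut = task.get_future();
        {
            std::lock_guard<std::mutex> lock(mtx_);
            tasks_.push(std::move(task));
        }
        cv_.notify_one();
        return fut;
    }

private:
    void run() {
        for (;;) {
            std::packaged_task<int()> task;
            {
                std::unique_lock<std::mutex> lock(mtx_);
                cv_.wait(lock, [this] { return stopping_ || !tasks_.empty(); });
                if (tasks_.empty()) return;  // stopping and nothing left to do
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();  // stores the result (or exception) for the submitter
        }
    }

    std::mutex mtx_;
    std::condition_variable cv_;
    std::queue<std::packaged_task<int()>> tasks_;
    bool stopping_ = false;
    std::vector<std::thread> threads_;
};
```

  • Usage: TinyPool pool(2); auto f = pool.submit([] { return 2 + 3; }); then f.get() blocks until a worker has run the task.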

3. std::condition_variable

  • Thread waits for condition without busy-waiting.
  • Always used with std::mutex + predicate.
#include <mutex>
#include <condition_variable>
#include <queue>
#include <thread>
#include <iostream>

std::mutex mtx;
std::condition_variable cv;
std::queue<int> data_queue;
bool done = false;

void producer() {
    for (int i = 0; i < 5; ++i) {
        {
            std::lock_guard<std::mutex> lock(mtx);
            data_queue.push(i);
        }
        cv.notify_one();
    }
    {
        std::lock_guard<std::mutex> lock(mtx);
        done = true;
    }
    cv.notify_one();
}

void consumer() {
    while (true) {
        std::unique_lock<std::mutex> lock(mtx);
        cv.wait(lock, [] { return !data_queue.empty() || done; });

        while (!data_queue.empty()) {
            std::cout << "Got: " << data_queue.front() << std::endl;
            data_queue.pop();
        }
        if (done) break;
    }
}

int main() {
    std::thread t1(producer);
    std::thread t2(consumer);
    t1.join();
    t2.join();
    return 0;
}

3.1 Spurious Wakeups

  • CV may wake without notification — spurious wakeup, known OS behavior.
  • Always pass a predicate to wait(); it rechecks the condition in a loop after every wakeup:
// WRONG: may proceed without the condition being true
cv.wait(lock);

// CORRECT: rechecks the condition after every wakeup
cv.wait(lock, [] { return condition_is_met; });
// Equivalent to:
// while (!condition_is_met) { cv.wait(lock); }
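
  • The same rule applies to timed waits. A small sketch, assuming a hypothetical Gate type: wait_for with a predicate loops over spurious wakeups internally and returns the predicate's final value, so false means the timeout expired with the condition still unmet.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical one-way gate: closed until release() is called.
struct Gate {
    std::mutex mtx;
    std::condition_variable cv;
    bool open = false;

    void release() {
        {
            std::lock_guard<std::mutex> lock(mtx);
            open = true;
        }
        cv.notify_all();
    }

    // Returns true if the gate opened before the timeout, false otherwise.
    bool wait_until_open(std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lock(mtx);
        return cv.wait_for(lock, timeout, [this] { return open; });
    }
};
```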

4. Read-Write Lock

  • Multiple concurrent readers, exclusive writer.
  • C++17 has std::shared_mutex; manual version below illustrates the concept:
#include <mutex>
#include <condition_variable>

class RWLock {
public:
    RWLock() = default;

    void lockRead() {
        std::unique_lock<std::mutex> lock(mutex_);
        // Wait until no active writer
        read_cv_.wait(lock, [this] { return !writer_active_; });
        ++readers_;
    }

    void unlockRead() {
        std::unique_lock<std::mutex> lock(mutex_);
        --readers_;
        // If this was the last reader, wake one waiting writer (if any)
        if (readers_ == 0) {
            write_cv_.notify_one();
        }
    }

    void lockWrite() {
        std::unique_lock<std::mutex> lock(mutex_);
        // Wait until no active readers AND no active writer
        write_cv_.wait(lock, [this] {
            return readers_ == 0 && !writer_active_;
        });
        writer_active_ = true;
    }

    void unlockWrite() {
        std::unique_lock<std::mutex> lock(mutex_);
        writer_active_ = false;
        // Wake one waiting writer and all waiting readers. Whichever thread
        // reacquires mutex_ first proceeds, so neither side is strictly preferred.
        write_cv_.notify_one();
        read_cv_.notify_all();
    }

private:
    std::mutex mutex_;
    std::condition_variable read_cv_;
    std::condition_variable write_cv_;
    int  readers_       = 0;
    bool writer_active_ = false;
};
  • Modern (C++17): std::shared_mutex + std::shared_lock (readers) / std::unique_lock (writers):
#include <shared_mutex>

std::shared_mutex rw_mutex;

void reader() {
    std::shared_lock lock(rw_mutex);  // Multiple readers allowed
    // ... read data ...
}

void writer() {
    std::unique_lock lock(rw_mutex);  // Exclusive access
    // ... write data ...
}
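  • A sketch of how this usually looks inside a class, assuming a hypothetical Registry type: the mutex is declared mutable so that const read methods can still take a shared lock.

```cpp
#include <map>
#include <mutex>
#include <optional>
#include <shared_mutex>
#include <string>

// Hypothetical read-mostly lookup table guarded by a shared_mutex.
class Registry {
public:
    // Many threads may run find() at the same time.
    std::optional<int> find(const std::string& key) const {
        std::shared_lock lock(mtx_);
        auto it = table_.find(key);
        if (it == table_.end()) return std::nullopt;
        return it->second;
    }

    // set() excludes all readers and other writers.
    void set(const std::string& key, int value) {
        std::unique_lock lock(mtx_);
        table_[key] = value;
    }

private:
    mutable std::shared_mutex mtx_;
    std::map<std::string, int> table_;
};
```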

5. Atomic Types

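  • std::atomic<T> (C++11): indivisible, data-race-free ops without a mutex. Usually lock-free for small types (check is_lock_free()); otherwise the library may fall back to an internal lock.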
  • Essential for low-level concurrent data structures + flags.
#include <atomic>
#include <thread>
#include <iostream>

std::atomic<int> counter{0};

void increment(int n) {
    for (int i = 0; i < n; ++i) {
        ++counter;  // Atomic increment; thread-safe without a mutex
    }
}

int main() {
    std::thread t1(increment, 100000);
    std::thread t2(increment, 100000);
    t1.join();
    t2.join();
    std::cout << counter.load() << std::endl;  // Always 200000
    return 0;
}

5.1 Common Atomic Operations

  1. Load/store: load(), store(v).
  2. Exchange: exchange(v) atomically replaces + returns old value.
  3. Compare-exchange (CAS): compare_exchange_weak/strong(expected, desired). Building block of lock-free algorithms.
  4. Fetch-and-modify: fetch_add, fetch_sub, fetch_and, fetch_or, fetch_xor. Atomic; return old value (before update).
  5. Operator overloads: ++a, --a, a += n, a -= n, a |= n. Syntactic sugar over fetch_*; return new value.
#include <atomic>

std::atomic<int> n{10};

// Atomic add — fetch_add returns the OLD value
int old_val = n.fetch_add(5);    // n is now 15; old_val is 10
int new_val = ++n;               // n is now 16; new_val is 16
n += 3;                          // n is now 19

// Atomic CAS — the building block of lock-free algorithms
int expected = 19;
bool ok = n.compare_exchange_strong(expected, 100);
// If n == expected (19), n is set to 100 and ok = true.
// Otherwise, expected is updated to n's actual current value and ok = false.

// Atomic flag and bitwise operations
std::atomic<unsigned> flags{0};
flags.fetch_or(0b0001);          // flags = 0b0001
flags.fetch_xor(0b0011);         // flags = 0b0010
  • weak vs strong CAS:
    • weak: may spuriously fail even when n == expected. Can be cheaper on architectures that implement CAS with load-linked/store-conditional (ARM, POWER). Use in retry loops.
    • strong: never fails spuriously. Use for a single-attempt swap.
// Typical CAS retry loop: atomically double n
int expected = n.load();
int desired;
do {
    desired = expected * 2;
} while (!n.compare_exchange_weak(expected, desired));
// On failure, compare_exchange_weak refreshes 'expected' with n's current value.
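  • The same retry loop can exit early when no update is needed. A sketch, assuming a hypothetical helper atomic_store_max that raises an atomic to at least a candidate value:

```cpp
#include <atomic>

// Hypothetical helper: atomically sets target = max(target, candidate).
void atomic_store_max(std::atomic<int>& target, int candidate) {
    int current = target.load();
    // Stop as soon as the stored value is already large enough.
    // On CAS failure, `current` is refreshed with the latest value
    // and the comparison is repeated.
    while (current < candidate &&
           !target.compare_exchange_weak(current, candidate)) {
    }
}
```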

5.2 Memory Ordering

  • Every atomic op takes optional std::memory_order — controls ordering of surrounding reads/writes w.r.t. other threads.
Order | Use case
memory_order_relaxed | Atomicity only; no ordering guarantees. Use for counters where ordering doesn't matter.
memory_order_acquire (load), memory_order_release (store) | Pair them for producer/consumer handoff. Reads after an acquire see writes that happened before the matching release.
memory_order_acq_rel | For read-modify-write operations (fetch_add etc.) that need both acquire and release semantics.
memory_order_seq_cst | Default. Sequentially consistent: strongest, simplest to reason about, but most expensive.
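#include <atomic>
#include <cassert>

std::atomic<bool> ready{false};
int data = 0;

// Runs on the producer thread
void producer() {
    data = 42;                                     // plain write
    ready.store(true, std::memory_order_release);  // publishes the write above
}

// Runs on the consumer thread
void consumer() {
    while (!ready.load(std::memory_order_acquire)) { }
    // Once the acquire load observes true, 'data' is guaranteed to be 42.
    assert(data == 42);
}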
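  • Acquire/release pairing is also what makes a lock work. A teaching sketch, assuming hypothetical names SpinLock and count_with_spinlock; real code would normally use std::mutex instead.

```cpp
#include <atomic>
#include <thread>
#include <vector>

class SpinLock {
public:
    void lock() {
        // acquire: accesses inside the critical section cannot be
        // reordered before the lock is taken.
        while (locked_.exchange(true, std::memory_order_acquire)) {
            std::this_thread::yield();
        }
    }
    void unlock() {
        // release: writes made inside the critical section become visible
        // to the next thread that acquires the lock.
        locked_.store(false, std::memory_order_release);
    }

private:
    std::atomic<bool> locked_{false};
};

// Increments a plain (non-atomic) counter from several threads.
long count_with_spinlock(int threads, int per_thread) {
    SpinLock lock;
    long total = 0;  // protected only by the spin lock
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&lock, &total, per_thread] {
            for (int i = 0; i < per_thread; ++i) {
                lock.lock();
                ++total;
                lock.unlock();
            }
        });
    }
    for (auto& th : pool) th.join();
    return total;
}
```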

6. std::once_flag and std::call_once

  • std::call_once — runs callable exactly once across all calling threads. For lazy init of shared resources.
#include <mutex>
#include <iostream>
#include <thread>

std::once_flag init_flag;
int* shared_resource = nullptr;

void initResource() {
    shared_resource = new int(42);
    std::cout << "Resource initialized" << std::endl;
}

void worker() {
    std::call_once(init_flag, initResource);
    std::cout << "Resource value: " << *shared_resource << std::endl;
}

int main() {
    std::thread t1(worker);
    std::thread t2(worker);
    std::thread t3(worker);
    t1.join();
    t2.join();
    t3.join();
    // "Resource initialized" is printed exactly once
    delete shared_resource;
    return 0;
}
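  • The once_flag can also live inside an object, giving per-instance lazy init. A sketch with hypothetical names LazyConfig and concurrent_loads; the load counter exists only to show that the initializer ran once.

```cpp
#include <atomic>
#include <mutex>
#include <thread>
#include <vector>

class LazyConfig {
public:
    // The first caller runs load(); concurrent callers block until it finishes.
    int value() {
        std::call_once(once_, [this] { load(); });
        return value_;
    }
    int load_count() const { return loads_.load(); }

private:
    void load() {
        value_ = 42;  // stand-in for an expensive initialization
        ++loads_;
    }

    std::once_flag once_;
    int value_ = 0;
    std::atomic<int> loads_{0};
};

// Calls value() from several threads and reports how often load() ran.
int concurrent_loads(int threads) {
    LazyConfig cfg;
    std::vector<std::thread> pool;
    for (int i = 0; i < threads; ++i) {
        pool.emplace_back([&cfg] { cfg.value(); });
    }
    for (auto& t : pool) t.join();
    return cfg.load_count();
}
```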

7. std::jthread and Stop Tokens (C++20)

  • std::thread problems: its destructor calls std::terminate if the thread is still joinable (neither joined nor detached); no built-in cancellation.
  • std::jthread (C++20) fixes both.
#include <chrono>
#include <thread>
#include <stop_token>
#include <iostream>

void worker(std::stop_token st) {
    while (!st.stop_requested()) {
        // do work
        std::this_thread::sleep_for(std::chrono::milliseconds(100));
    }
    std::cout << "stopping cleanly\n";
}

void example() {
    std::jthread t(worker);   // takes a stop_token automatically
    std::this_thread::sleep_for(std::chrono::seconds(1));
    // jthread's destructor calls request_stop() and join() automatically
}

vs std::thread:

  1. Auto-joining destructor — no std::terminate if you forget join().
  2. Built-in stop_token — cooperative early exit.
  3. Same API otherwise.
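  • Worker checks stop_requested() in loop. For interruptible waits, use the stop_token overload of condition_variable_any::wait; it wakes on a notify or on a stop request, so no manual stop_callback is needed:
std::condition_variable_any cv;
// st is the worker's std::stop_token; lock and ready are assumed to be in scope.
bool ok = cv.wait(lock, st, [&] { return ready; });
// ok is the predicate's value: false means a stop was requested before ready became true.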

8. Latches, Barriers, and Semaphores (C++20)

  • 3 high-level sync primitives added in C++20.

8.1 std::latch

  • One-shot counter. count_down() decrements, wait() blocks until zero. Not reusable.
#include <latch>
#include <thread>
#include <vector>

std::latch start{1};   // initial count
std::latch finish{N};  // wait for N workers

std::vector<std::jthread> ts;
for (int i = 0; i < N; ++i) {
    ts.emplace_back([&] {
        start.wait();           // block until main signals start
        do_work();
        finish.count_down();
    });
}

start.count_down();              // release all workers
finish.wait();                   // wait for all to finish

8.2 std::barrier

  • Like latch but reusable. Resets to initial count after all arrive; optional completion fn runs at sync point.
#include <barrier>
#include <iostream>

std::barrier sync_point{N, []() noexcept {
    // runs once when all N arrive, before they're released;
    // the completion function must be nothrow-invocable
    std::cout << "phase complete\n";
}};

void worker() {
    for (int phase = 0; phase < 3; ++phase) {
        do_phase_work();
        sync_point.arrive_and_wait();   // all threads sync here
    }
}

8.3 std::counting_semaphore / std::binary_semaphore

  • Counting semaphore — limit concurrent access to N.
#include <semaphore>

std::counting_semaphore<10> pool{3};  // up to 3 concurrent acquirers

void worker() {
    pool.acquire();          // blocks if 3 already acquired
    do_limited_resource_work();
    pool.release();
}
  • std::binary_semaphore = std::counting_semaphore<1>: a lightweight cross-thread signal (like a Win32 event or a POSIX semaphore). Often simpler than condition_variable: acquire() to wait, release() to wake.

9. thread_local Storage

  • thread_local (C++11) — per-thread storage duration. Each thread has its own independent copy.
#include <random>
#include <string>

thread_local std::string error_message;
thread_local std::mt19937 rng{std::random_device{}()};

void worker() {
    rng();                      // each thread has its own engine
    error_message = "...";      // each thread has its own buffer
}

Common uses:

  1. Per-thread caches (no sync needed).
  2. Per-thread RNGs (see C++ Random § 6).
  3. Thread-scoped error context (errno-like).
  4. Logging context (request ID).
  • Constructed on first per-thread use, destroyed when thread exits.
  • Works at namespace scope, at block scope (function-local), and as a static class member; in each case there is one instance per thread.
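  • A small sketch showing that each thread really gets its own copy. The names tl_counter and bump_in_new_thread are hypothetical.

```cpp
#include <thread>

thread_local int tl_counter = 0;  // one independent instance per thread

// Increments tl_counter on a fresh thread and returns the value seen there.
// The calling thread's own tl_counter is not touched.
int bump_in_new_thread(int times) {
    int seen = 0;
    std::thread t([&seen, times] {
        for (int i = 0; i < times; ++i) ++tl_counter;  // new thread's copy, starts at 0
        seen = tl_counter;
    });
    t.join();
    return seen;
}
```

  • If the calling thread sets tl_counter = 100 first, bump_in_new_thread(3) still returns 3, and the caller's tl_counter stays at 100.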