Python Concurrency and asyncio
- Description: The GIL, `threading` for I/O, `multiprocessing` for CPU, `concurrent.futures` executors, and `asyncio` (event loop, `async`/`await`, tasks, `gather`); when to choose which
- My Notion Note ID: K2A-D1-14
- Created: 2023-09-22
- Updated: 2026-05-11
- License: Reuse is very welcome. Please credit Yu Zhang and link back to the original on yuzhang.io
Table of Contents
1. The GIL
- CPython's Global Interpreter Lock serializes bytecode execution: at most one OS thread runs Python code at a time
- Threading still scales I/O-bound work, because the GIL is released during blocking syscalls
- Threading does NOT scale CPU-bound Python work; use `multiprocessing` or a C extension (NumPy releases the GIL during array ops)
- PEP 703 (free-threaded build): opt-in since 3.13, still experimental
- Biggest concurrency surprise for systems programmers: C++ `std::thread` runs concurrently on all cores
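The I/O claim is easy to see directly. In this minimal sketch, `time.sleep` stands in for a blocking syscall that releases the GIL, so four waits overlap instead of running back to back:

```python
import threading
import time

def blocked_io():
    # time.sleep releases the GIL, like a blocking read would
    time.sleep(0.2)

start = time.perf_counter()
threads = [threading.Thread(target=blocked_io) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - start

# Four 0.2 s waits overlap: total is ~0.2 s, not ~0.8 s
print(f"{elapsed:.2f}s")
```

Swap the sleep for a pure-Python CPU loop and the speedup disappears: the GIL lets only one thread compute at a time.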
2. threading
```python
import threading

def worker(name):
    for i in range(5):
        print(name, i)

t = threading.Thread(target=worker, args=("A",), daemon=True)
t.start()
t.join()
```
Synchronization primitives:
```python
lock = threading.Lock()
with lock:                    # context-manager use is idiomatic
    shared_state.append(x)

ev = threading.Event()        # signal flag, ev.set()/ev.wait()
sem = threading.Semaphore(3)  # bounded resource
cond = threading.Condition()  # wait/notify on state changes
```
- `threading.local()`: per-thread storage
- `queue.Queue`: thread-safe channel (use instead of hand-rolling one)
- Daemon threads are killed when the main thread exits; non-daemon threads must finish before the interpreter shuts down
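The `queue.Queue` advice in a minimal producer/consumer sketch (the sentinel-based shutdown is one common convention, not the only one):

```python
import queue
import threading

q = queue.Queue()
results = []

def consumer():
    while True:
        item = q.get()
        if item is None:   # sentinel: producer is done
            break
        results.append(item * 2)

t = threading.Thread(target=consumer, daemon=True)
t.start()

for n in range(5):
    q.put(n)               # thread-safe: no explicit lock needed
q.put(None)                # tell the consumer to stop
t.join()

print(results)  # → [0, 2, 4, 6, 8]
```

The queue does the locking internally, which is exactly why it beats hand-rolled shared lists guarded by ad-hoc locks.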
3. multiprocessing
- Spawns child processes, each with its own interpreter and GIL → true parallelism for CPU work
```python
from multiprocessing import Pool

def square(n):
    return n * n

if __name__ == "__main__":  # required on Windows / spawn start method
    with Pool(processes=4) as pool:
        results = pool.map(square, range(10))
```
- Cost: arguments are pickled to cross the process boundary
- Unpicklable values (open files, lambdas) fail
- For CPU-heavy work on large NumPy arrays, prefer shared memory (`multiprocessing.shared_memory`) over pickling
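The pickling constraint can be checked without spawning any processes; this sketch just asks `pickle` directly. A module-level function pickles by name, while a lambda has no importable name and fails, which is why lambdas can't be sent to pool workers:

```python
import pickle

def square(n):
    return n * n

# Module-level functions pickle fine (by qualified name)
data = pickle.dumps(square)

# Lambdas have no importable name, so pickling fails
try:
    pickle.dumps(lambda n: n * n)
    failed = False
except (pickle.PicklingError, AttributeError):
    failed = True

print(failed)  # → True
```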
Start methods:
- `fork` (current Linux default; copy-on-write, cheap)
- `spawn` (always on Windows; macOS default since 3.8 because Apple libraries are unsafe to fork; slower but isolated)
- `forkserver` (compromise: one pre-forked helper process spawns children)
- Python 3.14 changes the Linux default away from `fork`, because forking multi-threaded processes is unsafe
4. concurrent.futures
- Unified high-level API over both threads and processes, the natural choice for fan-out work
```python
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor, as_completed

with ThreadPoolExecutor(max_workers=8) as pool:
    futures = [pool.submit(fetch, url) for url in urls]
    for f in as_completed(futures):
        try:
            print(f.result())
        except Exception as e:
            print(f"failed: {e}")

with ProcessPoolExecutor() as pool:
    for result in pool.map(heavy_compute, inputs):
        ...
```
- `pool.map` preserves submission order; `as_completed` yields futures in finish order
- Each `Future` carries either a result or the exception raised by the worker
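The ordering difference is visible with sleeps of different lengths. A self-contained sketch (`work` is a stand-in for real tasks; the finish order relies on the 0.1 s gaps between sleeps being large enough on the host):

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

def work(delay):
    time.sleep(delay)
    return delay

delays = [0.3, 0.1, 0.2]

with ThreadPoolExecutor(max_workers=3) as pool:
    # map: results come back in submission order
    map_order = list(pool.map(work, delays))

    # as_completed: futures come back in finish order
    futures = [pool.submit(work, d) for d in delays]
    finish_order = [f.result() for f in as_completed(futures)]

print(map_order)     # → [0.3, 0.1, 0.2]
print(finish_order)  # → [0.1, 0.2, 0.3]
```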
5. asyncio
- Cooperative concurrency on a single thread with an event loop
- Tasks yield with `await`; when one waits on I/O, the loop runs another
- No GIL contention: there's exactly one OS thread
5.1 async / await
```python
import asyncio

async def fetch(url):
    print(f"start {url}")
    await asyncio.sleep(1)  # pretend network
    print(f"done {url}")
    return f"result of {url}"

async def main():
    result = await fetch("a")
    print(result)

asyncio.run(main())
```
- `async def` → coroutine function; calling it returns a coroutine object (does nothing until awaited or scheduled)
- `await expr`: suspend until `expr` (another coroutine / awaitable) completes
- `asyncio.run(coro)`: the entry point; creates a loop, runs to completion, closes the loop
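The "does nothing until awaited" point can be checked directly; a minimal sketch with an illustrative `greet` coroutine:

```python
import asyncio

async def greet(name):
    await asyncio.sleep(0)      # yield to the loop once
    return f"hello {name}"

coro = greet("world")           # nothing has run yet, no "hello" exists
print(type(coro).__name__)      # → coroutine

result = asyncio.run(coro)      # the loop drives it to completion
print(result)                   # → hello world
```

Forgetting the `await` (or `run`/`create_task`) is the classic symptom: you get a coroutine object and a "coroutine was never awaited" warning instead of a result.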
5.2 Tasks and gather
- A task schedules a coroutine on the loop concurrently with the caller
```python
async def main():
    # Run three fetches concurrently
    results = await asyncio.gather(
        fetch("a"), fetch("b"), fetch("c"),
    )
    print(results)
```
Or schedule tasks individually:
```python
async def main():
    t1 = asyncio.create_task(fetch("a"))
    t2 = asyncio.create_task(fetch("b"))
    print(await t1, await t2)
```
Coordination primitives:
```python
await asyncio.wait_for(fetch(url), timeout=5)
await asyncio.shield(critical())       # protect from cancellation

async with asyncio.TaskGroup() as tg:  # 3.11+, preferred over gather
    tg.create_task(work1())
    tg.create_task(work2())
# TaskGroup re-raises errors via an ExceptionGroup
```
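A minimal `wait_for` sketch showing the timeout path (`slow` is illustrative):

```python
import asyncio

async def slow():
    await asyncio.sleep(10)  # never finishes in time
    return "done"

async def main():
    try:
        return await asyncio.wait_for(slow(), timeout=0.1)
    except asyncio.TimeoutError:
        return "timed out"   # wait_for also cancels slow() for us

outcome = asyncio.run(main())
print(outcome)  # → timed out
```

This is the cancellation that `asyncio.shield` protects against: shielded work keeps running even when the outer `wait_for` gives up on it.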
5.3 Sync ↔ Async Bridges
```python
# From async code, run blocking code in a thread:
await asyncio.to_thread(blocking_fn, *args)

# Or use a custom executor:
loop = asyncio.get_running_loop()
await loop.run_in_executor(pool, blocking_fn, *args)
```
- Calling blocking code (CPU work, `time.sleep`, `requests.get`) directly from an `async def` freezes the entire event loop; this is a common bug
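A sketch of the `to_thread` bridge (3.9+): each blocking call runs in a worker thread from the default executor, so the loop stays responsive and the calls overlap:

```python
import asyncio
import time

def blocking_fn(delay):
    time.sleep(delay)  # awaiting this directly from a coroutine would stall the loop
    return delay

async def main():
    start = time.perf_counter()
    # Each call runs in its own worker thread; gather awaits all three
    results = await asyncio.gather(
        asyncio.to_thread(blocking_fn, 0.2),
        asyncio.to_thread(blocking_fn, 0.2),
        asyncio.to_thread(blocking_fn, 0.2),
    )
    elapsed = time.perf_counter() - start
    return results, elapsed

results, elapsed = asyncio.run(main())
print(results, f"{elapsed:.2f}s")  # three 0.2 s sleeps overlap: ~0.2 s total
```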
6. Choosing a Model
| Workload | Best fit |
|---|---|
| Many parallel network calls | asyncio (with aiohttp/httpx/asyncpg) or ThreadPoolExecutor |
| Many parallel file/database operations on blocking libs | ThreadPoolExecutor |
| CPU-heavy work in pure Python | ProcessPoolExecutor / multiprocessing |
| CPU-heavy work in NumPy / native code | ThreadPoolExecutor (native libs release the GIL) |
| Single long blocking call inside async code | asyncio.to_thread(...) |
| One-off shell command | subprocess.run(...) |
- The asyncio ecosystem matters more than the language feature; pick async-aware libraries (`aiohttp`, `httpx` in async mode, `asyncpg`, `aiofiles`) or you'll silently block the loop