Python Concurrency and asyncio


  • Description: The GIL, threading for I/O, multiprocessing for CPU, concurrent.futures executors, and asyncio (event loop, async/await, tasks, gather), when to choose which
  • My Notion Note ID: K2A-D1-14
  • Created: 2023-09-22
  • Updated: 2026-05-11
  • License: Reuse is very welcome. Please credit Yu Zhang and link back to the original on yuzhang.io

1. The GIL

  • CPython's Global Interpreter Lock serializes bytecode execution: at most one OS thread runs Python bytecode at a time
  • Threading scales I/O-bound work; the GIL is released during blocking syscalls
  • Threading does NOT scale CPU-bound Python work; use multiprocessing or a C extension (NumPy releases the GIL during array ops)
  • PEP 703 (free-threaded build): opt-in since 3.13, still experimental
  • Biggest concurrency surprise for systems programmers: a C++ std::thread runs in parallel on all cores, Python threads do not
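
A quick timing sketch of this effect (countdown is a made-up CPU-bound function; exact numbers vary by machine):

```python
import threading
import time

# Hypothetical CPU-bound work: pure-Python countdown, never releases the GIL.
def countdown(n):
    while n:
        n -= 1

start = time.perf_counter()
countdown(2_000_000)
serial = time.perf_counter() - start

# Split the same work across two threads: roughly no speedup, because
# only one thread executes Python bytecode at any instant.
start = time.perf_counter()
threads = [threading.Thread(target=countdown, args=(1_000_000,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
threaded = time.perf_counter() - start

print(f"serial {serial:.3f}s, two threads {threaded:.3f}s")
```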

2. threading

import threading

def worker(name):
    for i in range(5):
        print(name, i)

t = threading.Thread(target=worker, args=("A",), daemon=True)
t.start()
t.join()

Synchronization primitives:

lock = threading.Lock()
with lock:                       # context-manager use is idiomatic
    shared_state.append(x)

ev   = threading.Event()         # signal flag, ev.set()/ev.wait()
sem  = threading.Semaphore(3)    # bounded resource
cond = threading.Condition()     # wait/notify on state changes
  • threading.local(), per-thread storage
  • queue.Queue, thread-safe channel (use instead of hand-rolling)
  • Daemon threads are killed when the main thread exits; non-daemon threads must finish before the interpreter can shut down
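
A minimal producer/consumer sketch with queue.Queue, using a None sentinel for shutdown (the doubling worker is just an illustration):

```python
import queue
import threading

q = queue.Queue()
results = []

def consumer():
    while True:
        item = q.get()
        if item is None:          # sentinel: stop consuming
            q.task_done()
            break
        results.append(item * 2)
        q.task_done()

t = threading.Thread(target=consumer, daemon=True)
t.start()

for i in range(5):
    q.put(i)
q.put(None)                       # tell the consumer to exit
q.join()                          # block until every item is task_done

print(results)  # [0, 2, 4, 6, 8]
```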

3. multiprocessing

  • Spawns child processes, each with its own interpreter and GIL → true parallelism for CPU work
from multiprocessing import Pool

def square(n): return n * n

if __name__ == "__main__":              # required on Windows / spawn start
    with Pool(processes=4) as pool:
        results = pool.map(square, range(10))
  • Cost: arguments are pickled to cross the process boundary
  • Unpicklable values (open files, lambdas) fail
  • For CPU-heavy work on large NumPy arrays, prefer shared memory (multiprocessing.shared_memory) over pickling
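
A minimal shared_memory sketch; for brevity both handles live in one process here, but a child process would attach by the same name without any pickling:

```python
from multiprocessing import shared_memory

# Create a named 16-byte shared block.
shm = shared_memory.SharedMemory(create=True, size=16)
try:
    shm.buf[:5] = b"hello"

    # Attach by name (a child process would do exactly this) — no copy.
    view = shared_memory.SharedMemory(name=shm.name)
    data = bytes(view.buf[:5])
    print(data)  # b'hello'
    view.close()
finally:
    shm.close()
    shm.unlink()                  # free the block once all handles are closed
```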

Start methods:

  • fork (current Linux default, copy-on-write, cheap)
  • spawn (Windows always; macOS default since 3.8, Apple libs unsafe to fork; slower but isolated)
  • forkserver (compromise, one pre-forked helper spawns children)
  • Python 3.14 changes the Linux default from fork to forkserver, because forking a multi-threaded process is unsafe

4. concurrent.futures

  • Unified high-level API over both threads and processes, the natural choice for fan-out work
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor, as_completed

with ThreadPoolExecutor(max_workers=8) as pool:
    futures = [pool.submit(fetch, url) for url in urls]
    for f in as_completed(futures):
        try:
            print(f.result())
        except Exception as e:
            print(f"failed: {e}")

with ProcessPoolExecutor() as pool:
    for result in pool.map(heavy_compute, inputs):
        ...
  • pool.map preserves order
  • as_completed yields results in finish order
  • Each Future carries either a result or the exception raised by the worker
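
A self-contained sketch of the result-vs-exception behavior (work and its failing input are invented for illustration):

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def work(n):
    if n == 3:
        raise ValueError("bad input")
    return n * n

results, errors = [], []
with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(work, n) for n in range(5)]
    for f in as_completed(futures):      # finish order, not submit order
        try:
            results.append(f.result())   # re-raises the worker's exception
        except ValueError as e:
            errors.append(str(e))

print(sorted(results), errors)  # [0, 1, 4, 16] ['bad input']
```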

5. asyncio

  • Cooperative concurrency on a single thread with an event loop
  • Tasks yield with await; when one waits on I/O, the loop runs another
  • No GIL contention, there's exactly one OS thread

5.1 async / await

import asyncio

async def fetch(url):
    print(f"start {url}")
    await asyncio.sleep(1)        # pretend network
    print(f"done {url}")
    return f"result of {url}"

async def main():
    result = await fetch("a")
    print(result)

asyncio.run(main())
  • async def → coroutine function; calling it returns a coroutine object (does nothing until awaited or scheduled)
  • await expr, suspend until expr (another coroutine / awaitable) completes
  • asyncio.run(coro), entry point; creates loop, runs to completion, closes

5.2 Tasks and gather

  • A task schedules a coroutine on the loop concurrently with the caller
async def main():
    # Run three fetches concurrently
    results = await asyncio.gather(
        fetch("a"), fetch("b"), fetch("c"),
    )
    print(results)

# Or schedule individually:
async def main():
    t1 = asyncio.create_task(fetch("a"))
    t2 = asyncio.create_task(fetch("b"))
    print(await t1, await t2)

Coordination primitives:

await asyncio.wait_for(fetch(url), timeout=5)
await asyncio.shield(critical())              # protect from cancel
async with asyncio.TaskGroup() as tg:         # 3.11+, preferred over gather
    tg.create_task(work1())
    tg.create_task(work2())
# TaskGroup re-raises errors via an ExceptionGroup
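
A runnable wait_for sketch (slow is a stand-in for a hung operation; the tiny timeout is just to keep the example fast):

```python
import asyncio

async def slow():
    await asyncio.sleep(10)       # stand-in for a hung network call
    return "never"

async def main():
    try:
        return await asyncio.wait_for(slow(), timeout=0.01)
    except asyncio.TimeoutError:  # alias of builtin TimeoutError since 3.11
        return "timed out"

outcome = asyncio.run(main())
print(outcome)  # timed out
```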

5.3 Sync ↔ Async Bridges

# From async code, run blocking code in a thread:
await asyncio.to_thread(blocking_fn, *args)

# Or use a custom executor:
loop = asyncio.get_running_loop()
await loop.run_in_executor(pool, blocking_fn, *args)
  • Calling blocking code (CPU work, time.sleep, requests.get) directly from async def freezes the entire event loop, common bug
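
A sketch of fanning blocking calls out with asyncio.to_thread (blocking_io is hypothetical; time.sleep stands in for any blocking call):

```python
import asyncio
import time

def blocking_io(n):
    time.sleep(0.02)              # blocking — would freeze the loop if awaited directly
    return n * 2

async def main():
    # Each call runs in the default thread pool; the event loop stays responsive.
    return await asyncio.gather(*(asyncio.to_thread(blocking_io, i) for i in range(3)))

doubled = asyncio.run(main())
print(doubled)  # [0, 2, 4]
```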

6. Choosing a Model

Workload                                                 Best fit
Many parallel network calls                              asyncio (with aiohttp/httpx/asyncpg) or ThreadPoolExecutor
Many parallel file/database operations on blocking libs  ThreadPoolExecutor
CPU-heavy work in pure Python                            ProcessPoolExecutor / multiprocessing
CPU-heavy work in NumPy / native code                    ThreadPoolExecutor (native libs release the GIL)
Single long blocking call inside async code              asyncio.to_thread(...)
One-off shell command                                    subprocess.run(...)
  • The asyncio ecosystem matters more than the language feature: pick async-aware libraries (aiohttp, httpx[async], asyncpg, aiofiles), or you'll silently block the loop