Python Concurrency and asyncio


  • Description: The GIL, threading for I/O, multiprocessing for CPU, concurrent.futures executors, and asyncio (event loop, async/await, tasks, gather), when to choose which
  • My Notion Note ID: K2A-D1-14
  • Created: 2023-09-22
  • Updated: 2026-05-11
  • License: Reuse is very welcome. Please credit Yu Zhang and link back to the original on yuzhang.io

1. The GIL

  • CPython's Global Interpreter Lock serializes bytecode execution: at most one OS thread runs Python bytecode at a time
  • Threading scales I/O-bound work; the GIL is released during blocking syscalls
  • Threading does NOT scale CPU-bound Python work; use multiprocessing or a C extension (NumPy releases the GIL during array ops)
  • PEP 703 (free-threaded build): opt-in since 3.13, still experimental
  • Biggest concurrency surprise for systems programmers: a C++ std::thread runs in parallel on all cores, Python threads do not
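
A quick timing sketch of this effect (countdown is a made-up CPU-bound function; exact numbers vary by machine):

```python
import threading
import time

# Hypothetical CPU-bound work: pure-Python countdown, never releases the GIL.
def countdown(n):
    while n:
        n -= 1

start = time.perf_counter()
countdown(2_000_000)
serial = time.perf_counter() - start

# Split the same work across two threads: roughly no speedup, because
# only one thread executes Python bytecode at any instant.
start = time.perf_counter()
threads = [threading.Thread(target=countdown, args=(1_000_000,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
threaded = time.perf_counter() - start

print(f"serial {serial:.3f}s, two threads {threaded:.3f}s")
```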

2. threading

import threading

def worker(name):
    for i in range(5):
        print(name, i)

t = threading.Thread(target=worker, args=("A",), daemon=True)
t.start()
t.join()

Synchronization primitives:

lock = threading.Lock()
with lock:                       # context-manager use is idiomatic
    shared_state.append(x)

ev   = threading.Event()         # signal flag, ev.set()/ev.wait()
sem  = threading.Semaphore(3)    # bounded resource
cond = threading.Condition()     # wait/notify on state changes
  • threading.local(), per-thread storage
  • queue.Queue, thread-safe channel (use instead of hand-rolling)
  • Daemon threads are killed when the main thread exits; non-daemon threads must finish before the interpreter can shut down
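
A minimal producer/consumer sketch with queue.Queue, using a None sentinel for shutdown (the doubling worker is just an illustration):

```python
import queue
import threading

q = queue.Queue()
results = []

def consumer():
    while True:
        item = q.get()
        if item is None:          # sentinel: stop consuming
            q.task_done()
            break
        results.append(item * 2)
        q.task_done()

t = threading.Thread(target=consumer, daemon=True)
t.start()

for i in range(5):
    q.put(i)
q.put(None)                       # tell the consumer to exit
q.join()                          # block until every item is task_done

print(results)  # [0, 2, 4, 6, 8]
```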

3. multiprocessing

  • Spawns child processes, each with its own interpreter and GIL → true parallelism for CPU work
from multiprocessing import Pool

def square(n): return n * n

if __name__ == "__main__":              # required on Windows / spawn start
    with Pool(processes=4) as pool:
        results = pool.map(square, range(10))
  • Cost: arguments are pickled to cross the process boundary
  • Unpicklable values (open files, lambdas) fail
  • For CPU-heavy work on large NumPy arrays, prefer shared memory (multiprocessing.shared_memory) over pickling
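
A minimal shared_memory sketch; for brevity both handles live in one process here, but a child process would attach by the same name without any pickling:

```python
from multiprocessing import shared_memory

# Create a named 16-byte shared block.
shm = shared_memory.SharedMemory(create=True, size=16)
try:
    shm.buf[:5] = b"hello"

    # Attach by name (a child process would do exactly this) — no copy.
    view = shared_memory.SharedMemory(name=shm.name)
    data = bytes(view.buf[:5])
    print(data)  # b'hello'
    view.close()
finally:
    shm.close()
    shm.unlink()                  # free the block once all handles are closed
```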

Start methods:

  • fork (current Linux default, copy-on-write, cheap)
  • spawn (Windows always; macOS default since 3.8, Apple libs unsafe to fork; slower but isolated)
  • forkserver (compromise, one pre-forked helper spawns children)
  • Python 3.14 changes the Linux default from fork to forkserver, because forking a multi-threaded process is unsafe

4. concurrent.futures

  • Unified high-level API over both threads and processes, the natural choice for fan-out work
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor, as_completed

with ThreadPoolExecutor(max_workers=8) as pool:
    futures = [pool.submit(fetch, url) for url in urls]
    for f in as_completed(futures):
        try:
            print(f.result())
        except Exception as e:
            print(f"failed: {e}")

with ProcessPoolExecutor() as pool:
    for result in pool.map(heavy_compute, inputs):
        ...
  • pool.map preserves order
  • as_completed yields results in finish order
  • Each Future carries either a result or the exception raised by the worker
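
A self-contained sketch of the result-vs-exception behavior (work and its failing input are invented for illustration):

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def work(n):
    if n == 3:
        raise ValueError("bad input")
    return n * n

results, errors = [], []
with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(work, n) for n in range(5)]
    for f in as_completed(futures):      # finish order, not submit order
        try:
            results.append(f.result())   # re-raises the worker's exception
        except ValueError as e:
            errors.append(str(e))

print(sorted(results), errors)  # [0, 1, 4, 16] ['bad input']
```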

5. asyncio

  • Cooperative concurrency on a single thread with an event loop
  • Tasks yield with await; when one waits on I/O, the loop runs another
  • No GIL contention, there's exactly one OS thread

5.1 async / await

import asyncio

async def fetch(url):
    print(f"start {url}")
    await asyncio.sleep(1)        # pretend network
    print(f"done {url}")
    return f"result of {url}"

async def main():
    result = await fetch("a")
    print(result)

asyncio.run(main())
  • async def → coroutine function; calling it returns a coroutine object (does nothing until awaited or scheduled)
  • await expr, suspend until expr (another coroutine / awaitable) completes
  • asyncio.run(coro), entry point; creates loop, runs to completion, closes

5.2 Tasks and gather

  • A task schedules a coroutine on the loop concurrently with the caller
async def main():
    # Run three fetches concurrently
    results = await asyncio.gather(
        fetch("a"), fetch("b"), fetch("c"),
    )
    print(results)

# Or schedule individually:
async def main():
    t1 = asyncio.create_task(fetch("a"))
    t2 = asyncio.create_task(fetch("b"))
    print(await t1, await t2)

Coordination primitives:

await asyncio.wait_for(fetch(url), timeout=5)
await asyncio.shield(critical())              # protect from cancel
async with asyncio.TaskGroup() as tg:         # 3.11+, preferred over gather
    tg.create_task(work1())
    tg.create_task(work2())
# TaskGroup re-raises errors via an ExceptionGroup
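
A runnable wait_for sketch (slow is a stand-in for a hung operation; the tiny timeout is just to keep the example fast):

```python
import asyncio

async def slow():
    await asyncio.sleep(10)       # stand-in for a hung network call
    return "never"

async def main():
    try:
        return await asyncio.wait_for(slow(), timeout=0.01)
    except asyncio.TimeoutError:  # alias of builtin TimeoutError since 3.11
        return "timed out"

outcome = asyncio.run(main())
print(outcome)  # timed out
```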

5.3 Sync ↔ Async Bridges

# From async code, run blocking code in a thread:
await asyncio.to_thread(blocking_fn, *args)

# Or use a custom executor:
loop = asyncio.get_running_loop()
await loop.run_in_executor(pool, blocking_fn, *args)
  • Calling blocking code (CPU work, time.sleep, requests.get) directly from async def freezes the entire event loop, common bug
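
A sketch of fanning blocking calls out with asyncio.to_thread (blocking_io is hypothetical; time.sleep stands in for any blocking call):

```python
import asyncio
import time

def blocking_io(n):
    time.sleep(0.02)              # blocking — would freeze the loop if awaited directly
    return n * 2

async def main():
    # Each call runs in the default thread pool; the event loop stays responsive.
    return await asyncio.gather(*(asyncio.to_thread(blocking_io, i) for i in range(3)))

doubled = asyncio.run(main())
print(doubled)  # [0, 2, 4]
```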

6. Choosing a Model

Workload                                                 Best fit
Many parallel network calls                              asyncio (with aiohttp/httpx/asyncpg) or ThreadPoolExecutor
Many parallel file/database operations on blocking libs  ThreadPoolExecutor
CPU-heavy work in pure Python                            ProcessPoolExecutor / multiprocessing
CPU-heavy work in NumPy / native code                    ThreadPoolExecutor (native libs release the GIL)
Single long blocking call inside async code              asyncio.to_thread(...)
One-off shell command                                    subprocess.run(...)
  • The asyncio ecosystem matters more than the language feature: pick async-aware libraries (aiohttp, httpx[async], asyncpg, aiofiles), or you'll silently block the loop