Concurrency in Python - Threads, Processes, and AsyncIO Explained

Posted on Wed 25 March 2026 by Sanyam Khurana in Python

Python's concurrency story is... interesting. If you've ever Googled "Python multithreading", you've probably run into the GIL debate and come away more confused than when you started. Can Python do concurrency? Yes. Is it straightforward? Not exactly. Python gives you three different concurrency models, each suited for different problems. Pick the wrong one and you'll get no speedup at all. Pick the right one and your code can be dramatically faster.

Let's cut through the confusion and understand what actually works, when, and why.

The GIL - The Elephant in the Room

Before anything else, we need to talk about the Global Interpreter Lock (GIL). CPython (the standard Python interpreter) has a mutex that protects access to Python objects. Only one thread can execute Python bytecode at a time, even on a multi-core machine.

This sounds terrible, and for CPU-bound work, it kind of is. But here's the thing most people miss: the GIL is released during I/O operations. When your thread is waiting for a network response, reading a file, or waiting on a database query, other threads can run just fine.

So the rule is simple:

  • I/O-bound work: threads work great (GIL doesn't matter)
  • CPU-bound work: threads won't help, use processes instead
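
Here's a quick way to see the first rule in action. This is a minimal sketch that uses time.sleep as a stand-in for real I/O, since sleep releases the GIL just like a network or disk wait does:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def io_task(_):
    time.sleep(0.2)  # stands in for a network or disk wait; releases the GIL

# Sequential: five waits back to back, about 1 second total
start = time.perf_counter()
for i in range(5):
    io_task(i)
sequential = time.perf_counter() - start

# Threaded: the five waits overlap, about 0.2 seconds total
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=5) as executor:
    list(executor.map(io_task, range(5)))
threaded = time.perf_counter() - start

print(f"sequential: {sequential:.2f}s, threaded: {threaded:.2f}s")
```

Swap the sleep for a CPU-bound loop and the threaded version stops winning - that's the GIL at work.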

With that out of the way, let's look at the tools.

Threading - concurrent.futures.ThreadPoolExecutor

The concurrent.futures module is the modern way to do threading in Python. Forget about using threading.Thread directly - ThreadPoolExecutor handles pool management, lifecycle, and result collection for you.

Basic Usage

from concurrent.futures import ThreadPoolExecutor
import requests

def fetch_url(url):
    response = requests.get(url)
    return f"{url}: {response.status_code}"

urls = [
    "https://httpbin.org/get",
    "https://httpbin.org/ip",
    "https://httpbin.org/user-agent",
    "https://httpbin.org/headers",
    "https://httpbin.org/delay/1",
]

with ThreadPoolExecutor(max_workers=5) as executor:
    results = executor.map(fetch_url, urls)

for result in results:
    print(result)

executor.map works like the built-in map, but runs the function concurrently across the pool's threads. Results come back in the same order as the inputs, regardless of which one finishes first.
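
A small sketch makes the ordering guarantee concrete. The delays here are made up, chosen so the later inputs finish first:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def work(n):
    time.sleep(0.3 - n * 0.1)  # input 2 finishes first, input 0 last
    return n

with ThreadPoolExecutor(max_workers=3) as executor:
    results = list(executor.map(work, [0, 1, 2]))

print(results)  # [0, 1, 2] - input order, even though 2 completed first
```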

submit() for More Control

map is convenient but limited. For more control, use submit():

from concurrent.futures import ThreadPoolExecutor, as_completed

def fetch_url(url):
    response = requests.get(url, timeout=10)
    return url, response.status_code

urls = ["https://httpbin.org/get", "https://httpbin.org/delay/2", "https://httpbin.org/ip"]

with ThreadPoolExecutor(max_workers=3) as executor:
    # Submit all tasks
    future_to_url = {executor.submit(fetch_url, url): url for url in urls}

    # Process results as they complete (not in order)
    for future in as_completed(future_to_url):
        url = future_to_url[future]
        try:
            result_url, status = future.result()
            print(f"{result_url}: {status}")
        except Exception as e:
            print(f"{url} generated an exception: {e}")

as_completed yields futures in the order they finish, not the order they were submitted. This is perfect when you want to process results as soon as they're available, and you want to handle exceptions per-task.
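
To see the difference from map, here's a sketch (with hypothetical delays) where submission order and completion order disagree:

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

def work(delay):
    time.sleep(delay)
    return delay

with ThreadPoolExecutor(max_workers=3) as executor:
    # Submitted as 0.3, 0.1, 0.2 - all three start immediately
    futures = [executor.submit(work, d) for d in (0.3, 0.1, 0.2)]
    finished = [f.result() for f in as_completed(futures)]

print(finished)  # [0.1, 0.2, 0.3] - completion order, not submission order
```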

Handling Timeouts

You can set timeouts on the result:

from concurrent.futures import TimeoutError  # alias of the builtin since Python 3.11

future = executor.submit(slow_function, arg)
try:
    result = future.result(timeout=5)  # Wait max 5 seconds
except TimeoutError:
    print("Task took too long")
    future.cancel()  # Try to cancel it

Note that cancel() only works if the task hasn't started yet. Once a thread is running, Python can't forcefully stop it. If you need cancellable tasks, you'll need to use a flag or event that the worker checks periodically.
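
One way to build that is with a threading.Event that the worker checks between chunks of work. The chunk sizes and timings below are arbitrary, just for illustration:

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

def cooperative_worker(stop: threading.Event):
    # Work in small chunks, checking the stop flag between chunks
    chunks_done = 0
    while chunks_done < 100 and not stop.is_set():
        time.sleep(0.01)  # one chunk of "work"
        chunks_done += 1
    return chunks_done

stop = threading.Event()
with ThreadPoolExecutor(max_workers=1) as executor:
    future = executor.submit(cooperative_worker, stop)
    time.sleep(0.05)  # let it run briefly...
    stop.set()        # ...then ask it to stop early
    print(f"Worker stopped after {future.result()} of 100 chunks")
```

The worker exits cleanly at the next check instead of being killed mid-operation, which is the best you can do with threads in Python.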

When to Use ThreadPoolExecutor

  • Making multiple HTTP requests in parallel
  • Reading/writing multiple files simultaneously
  • Database queries that are independent of each other
  • Any I/O-bound work where you're waiting on external systems

How Many Workers?

For I/O-bound tasks, you can have more workers than CPU cores. A common rule of thumb:

import os
from concurrent.futures import ThreadPoolExecutor

# I/O-bound: more threads are fine
executor = ThreadPoolExecutor(max_workers=20)

# Default (Python 3.8+): min(32, os.cpu_count() + 4)
executor = ThreadPoolExecutor()

ProcessPoolExecutor - True Parallelism

When your work is CPU-bound (crunching numbers, image processing, data transformation), threads won't help because of the GIL. You need separate processes, each with its own Python interpreter and its own GIL.

from concurrent.futures import ProcessPoolExecutor
import math

def is_prime(n):
    if n < 2:
        return False
    for i in range(2, int(math.sqrt(n)) + 1):
        if n % i == 0:
            return False
    return True

numbers = [112272535095293, 112582705942171, 112272535095293,
           115280095190773, 115797848077099, 1099726899285419]

# The __main__ guard matters here: under the spawn start method (the
# default on Windows and macOS), worker processes re-import this module,
# so pool creation must not run at import time.
if __name__ == "__main__":
    with ProcessPoolExecutor() as executor:
        results = executor.map(is_prime, numbers)

    for number, prime in zip(numbers, results):
        print(f"{number}: {'prime' if prime else 'not prime'}")

The API is almost identical to ThreadPoolExecutor. Just swap the class name. But under the hood, it's completely different - each worker is a separate OS process.

The Catch with Processes

There are some gotchas:

1. Serialization overhead: Arguments and return values are pickled (serialized) and sent between processes. Large objects will be slow. Keep the data you pass small.

# Bad: passing a huge DataFrame to each worker
with ProcessPoolExecutor() as executor:
    results = executor.map(process_chunk, [huge_dataframe] * 10)

# Good: pass file paths or indices, let workers load their own data
with ProcessPoolExecutor() as executor:
    results = executor.map(process_file, file_paths)

2. Not everything is picklable: Lambdas, nested functions, and certain objects can't be pickled. Your worker function must be defined at the module level.

# Bad: lambda can't be pickled
executor.submit(lambda x: x * 2, 5)

# Good: use a regular function
def double(x):
    return x * 2
executor.submit(double, 5)

3. Startup cost: Creating processes is expensive compared to threads. Don't use ProcessPoolExecutor for tiny tasks - the overhead will outweigh the parallelism benefit.

How Many Workers?

For CPU-bound work, more workers than CPU cores doesn't help:

import os
from concurrent.futures import ProcessPoolExecutor

# CPU-bound: match core count
executor = ProcessPoolExecutor(max_workers=os.cpu_count())

# Default: os.cpu_count()
executor = ProcessPoolExecutor()

AsyncIO - The Third Way

AsyncIO is Python's answer to the callback-heavy async models in other languages. It's a single-threaded concurrency model based on coroutines. Instead of threads or processes, you have an event loop that switches between tasks when they're waiting for I/O.

import asyncio
import aiohttp

async def fetch_url(session, url):
    async with session.get(url) as response:
        return url, response.status

async def main():
    urls = [
        "https://httpbin.org/get",
        "https://httpbin.org/ip",
        "https://httpbin.org/user-agent",
        "https://httpbin.org/headers",
    ]

    async with aiohttp.ClientSession() as session:
        tasks = [fetch_url(session, url) for url in urls]
        results = await asyncio.gather(*tasks)

    for url, status in results:
        print(f"{url}: {status}")

asyncio.run(main())

How It Works

When a coroutine hits an await, it yields control back to the event loop. The event loop then runs another coroutine that's ready. When the awaited operation completes, the event loop resumes the original coroutine. All of this happens on a single thread.

import asyncio

async def slow_operation():
    print("Starting slow operation")
    await asyncio.sleep(2)  # Yields control to event loop
    print("Slow operation done")
    return "result"

async def fast_operation():
    print("Starting fast operation")
    await asyncio.sleep(0.5)
    print("Fast operation done")
    return "fast result"

async def main():
    # Both run concurrently on a single thread
    results = await asyncio.gather(slow_operation(), fast_operation())
    print(results)

asyncio.run(main())

Output:

Starting slow operation
Starting fast operation
Fast operation done
Slow operation done
['result', 'fast result']

Both operations ran concurrently - the total time is about 2 seconds, not 2.5.

asyncio.gather vs asyncio.TaskGroup

asyncio.gather is the classic way, but Python 3.11 introduced TaskGroup which handles errors better:

# asyncio.gather - if one task fails, others keep running
results = await asyncio.gather(task1(), task2(), task3(), return_exceptions=True)

# TaskGroup (Python 3.11+) - if one task fails, all are cancelled
async with asyncio.TaskGroup() as tg:
    t1 = tg.create_task(task1())
    t2 = tg.create_task(task2())
    t3 = tg.create_task(task3())
# If any task raises, all others are cancelled and
# an ExceptionGroup is raised

TaskGroup is the recommended approach for new code. It's similar to Go's errgroup - cancel everything when any task fails.

Semaphore for Rate Limiting

Just like the channel-based semaphore pattern in Go, asyncio has Semaphore:

async def fetch_with_limit(sem, session, url):
    async with sem:  # Only N concurrent requests
        async with session.get(url) as response:
            return await response.text()

async def main():
    sem = asyncio.Semaphore(10)  # Max 10 concurrent
    async with aiohttp.ClientSession() as session:
        tasks = [fetch_with_limit(sem, session, url) for url in urls]
        results = await asyncio.gather(*tasks)
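
The same pattern without aiohttp, using asyncio.sleep as a stand-in for the request, makes the limiting visible in the timing:

```python
import asyncio
import time

async def limited_task(sem, i):
    async with sem:               # at most 2 tasks past this point at once
        await asyncio.sleep(0.1)  # stands in for the HTTP request
        return i

async def main():
    sem = asyncio.Semaphore(2)
    start = time.perf_counter()
    results = await asyncio.gather(*(limited_task(sem, i) for i in range(6)))
    elapsed = time.perf_counter() - start
    print(f"{results} in {elapsed:.2f}s")  # 6 tasks, 2 at a time: ~0.3s
    return elapsed

elapsed = asyncio.run(main())
```

Without the semaphore, all six sleeps would overlap and finish in about 0.1 seconds; with it, they run in three waves of two.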

When to Use AsyncIO

  • High-concurrency I/O scenarios (thousands of concurrent connections)
  • Web servers and API clients
  • WebSocket handling
  • When you need to handle many connections with minimal resource usage

The downside: asyncio is viral. Once you use async def, every function that calls it must also be async. And you need async-compatible libraries (aiohttp instead of requests, asyncpg instead of psycopg2).
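
If you're stuck with a blocking library inside async code, asyncio.to_thread (Python 3.9+) is the standard escape hatch: it runs the blocking call in a worker thread so the event loop keeps moving. A minimal sketch, with time.sleep standing in for a blocking call like requests.get:

```python
import asyncio
import time

def blocking_call(n):
    time.sleep(0.2)  # stands in for requests.get() or a sync DB driver
    return n

async def main():
    # Each blocking call runs in its own worker thread; the two overlap
    results = await asyncio.gather(
        asyncio.to_thread(blocking_call, 1),
        asyncio.to_thread(blocking_call, 2),
    )
    print(results)  # [1, 2]
    return results

results = asyncio.run(main())
```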

Choosing the Right Tool

Here's the decision tree I use:

Is your task I/O-bound or CPU-bound?
|
+-- I/O-bound
|   |
|   +-- How many concurrent operations?
|       |
|       +-- Dozens: ThreadPoolExecutor
|       +-- Hundreds/Thousands: AsyncIO
|
+-- CPU-bound
    |
    +-- ProcessPoolExecutor

And here's the comparison table:

                  ThreadPoolExecutor           ProcessPoolExecutor           AsyncIO
Best for          I/O-bound (network, files)   CPU-bound (math, processing)  High-concurrency I/O
GIL impact        Released during I/O          Each process has its own GIL  Single thread, no GIL issue
Memory            Shared memory, lightweight   Separate memory per process   Very lightweight
Data passing      Direct (shared memory)       Pickle serialization          Direct (single thread)
Max concurrency   ~20-50 threads typical       Number of CPU cores           Thousands of tasks
Ecosystem         Works with everything        Picklable functions only      Needs async libraries

A Real-World Example: Parallel API Calls with Retry

Let me show you a practical example that combines several of these concepts:

from concurrent.futures import ThreadPoolExecutor, as_completed
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry
import time

def create_session():
    session = requests.Session()
    retry = Retry(total=3, backoff_factor=0.5)
    adapter = HTTPAdapter(max_retries=retry)
    session.mount("http://", adapter)
    session.mount("https://", adapter)
    return session

def fetch_user(user_id):
    # One Session per call: requests.Session isn't guaranteed thread-safe,
    # so each worker builds its own (with retries baked in)
    session = create_session()
    response = session.get(
        f"https://jsonplaceholder.typicode.com/users/{user_id}",
        timeout=10,
    )
    response.raise_for_status()
    return response.json()

def fetch_all_users(user_ids, max_workers=10):
    results = {}
    errors = {}

    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        future_to_id = {
            executor.submit(fetch_user, uid): uid
            for uid in user_ids
        }

        for future in as_completed(future_to_id):
            uid = future_to_id[future]
            try:
                results[uid] = future.result()
            except Exception as e:
                errors[uid] = str(e)

    return results, errors

if __name__ == "__main__":
    user_ids = range(1, 11)
    start = time.time()
    results, errors = fetch_all_users(user_ids)
    elapsed = time.time() - start

    print(f"Fetched {len(results)} users in {elapsed:.2f}s")
    for uid, user in sorted(results.items()):
        print(f"  {user['name']} ({user['email']})")
    if errors:
        print(f"Errors: {errors}")

Without threading, 10 sequential API calls might take 5+ seconds. With a ThreadPoolExecutor, they all run concurrently and finish in under a second. That's the kind of practical speedup that makes concurrency worth learning.

Summary

Python gives you three concurrency models, each for a different job:

  • ThreadPoolExecutor: your go-to for I/O-bound parallelism. Simple API, works with any library, no serialization overhead.
  • ProcessPoolExecutor: true CPU parallelism that sidesteps the GIL. Same API as threads, but watch out for pickling and memory costs.
  • AsyncIO: for when you need massive I/O concurrency (thousands of connections). Powerful but requires an async-compatible ecosystem.

The concurrent.futures module is where most people should start. Its ThreadPoolExecutor and ProcessPoolExecutor cover 90% of real-world concurrency needs with a clean, consistent API. AsyncIO is the right choice when you're building high-concurrency network services and concurrent.futures isn't cutting it.

And remember - the GIL isn't a showstopper. It just means you need to pick the right tool for the job. Threads for I/O, processes for CPU, and asyncio for massive scale.

If you have any questions about concurrency in Python, please let us know in the comments section below.