10 Common Performance Bottlenecks in Python and How to Fix Them


Your Python script runs fine on your laptop. But deploy it to a server with real data, and suddenly it crawls. You are not alone. Every intermediate and advanced Python developer hits performance walls. The language is expressive and flexible, but those same qualities hide traps that can cost you seconds or even minutes of runtime. This article walks through ten frequent bottlenecks, explains why they happen, and shows you concrete ways to fix them. No fluff. No theory that only works in a vacuum. Just honest, battle-tested advice.

Key Takeaway

Most Python performance problems come from the same root causes: slow I/O, naive loops, wrong data structures, missing built-ins, and ignoring profiling. Fixing them usually involves replacing a list with a set, using asyncio for I/O, leveraging NumPy or built-in functions, and running a profiler before you guess. Each fix is simple once you know where to look.

Stop blaming Python, start fixing these ten bottlenecks

Python’s speed is often misunderstood. CPython is slower than compiled languages at raw number crunching, but in many applications the patterns you use matter more than the interpreter itself. Let’s go through the most common performance traps in 2026.

1. Forgetting that file and network I/O blocks everything

Your code reads a CSV, calls an API, or writes logs synchronously. While that operation runs, a single-threaded program waits and does nothing else. This is one of the most common bottlenecks in real-world Python applications.

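The fix: Use asyncio for I/O-bound work such as network calls, so that many operations can wait at the same time instead of one after another. For blocking operations that you cannot make async, such as most file I/O and synchronous client libraries, use a thread pool. CPython releases the GIL while a thread waits on I/O, so threads overlap those waits well. For example, instead of reading files one by one in a loop, read them concurrently.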

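import asyncio
import aiofiles  # third-party: pip install aiofiles
# Slow: sequential reads, each one blocks until it finishes
for filename in files:
    with open(filename) as f:
        data = process(f.read())

# Faster: concurrent reads with asyncio and aiofiles
async def read_file(filename):
    async with aiofiles.open(filename) as f:
        return process(await f.read())

async def read_all(files):  # usage: results = asyncio.run(read_all(files))
    return await asyncio.gather(*(read_file(name) for name in files))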
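If you would rather not add the aiofiles dependency, a thread pool from the standard library handles blocking reads too. The sketch below is one way to do it. The helper names read_text and read_all_threaded are illustrative, and max_workers=8 is an arbitrary starting point you should tune.

```python
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def read_text(path):
    # A plain blocking read; each worker thread runs one of these at a time
    return Path(path).read_text()


def read_all_threaded(paths, max_workers=8):
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map() returns results in the same order as the input paths
        return list(pool.map(read_text, paths))


# Usage: write a few temporary files, then read them back concurrently
with tempfile.TemporaryDirectory() as tmp:
    paths = []
    for i in range(5):
        path = Path(tmp) / f"file_{i}.txt"
        path.write_text(f"contents {i}")
        paths.append(path)
    contents = read_all_threaded(paths)

print(contents)  # ['contents 0', 'contents 1', 'contents 2', 'contents 3', 'contents 4']
```

Threads work well here because the time is spent waiting on the operating system, not running Python bytecode. For CPU-bound work, a thread pool will not help in standard CPython.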

2. Using lists where a set or dict would be much faster

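Membership testing with in on a list is O(n), because Python compares the item against each element in turn. On a set or dict it is O(1) on average, because the item’s hash points straight to its slot. The difference becomes huge once your collection has thousands of items.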

The fix: If you only need to check membership or eliminate duplicates, use a set. If you need to look up items by a key, use a dict. This is one of the cheapest optimizations you can make.

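| Task | Slow method | Fast method | Speed gain (time complexity) |
|---|---|---|---|
| Membership check | item in list | item in set | O(n) vs O(1) average per lookup; grows with size |
| Counting unique items | list plus a manual counting loop | len(set(items)) | O(n^2) vs O(n) |
| Removing duplicates | loop that checks a growing list | list(dict.fromkeys(items)) to keep order, or list(set(items)) if order does not matter | O(n^2) vs O(n) |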
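To see the gap on your own machine, here is a quick timeit sketch. The collection size (100,000) and repeat count (200) are arbitrary choices. The target is the last element, which is the worst case for the list. The dedupe helper is an illustrative name showing the order-preserving variant from the table.

```python
import timeit

items = list(range(100_000))
items_set = set(items)
target = 99_999  # worst case for the list: the last element

# The list scans every element; the set hashes the target once
list_time = timeit.timeit(lambda: target in items, number=200)
set_time = timeit.timeit(lambda: target in items_set, number=200)
print(f"list: {list_time:.4f}s  set: {set_time:.6f}s")


def dedupe(values):
    # dict keys keep insertion order (guaranteed since Python 3.7)
    return list(dict.fromkeys(values))


print(dedupe([3, 1, 3, 2, 1]))  # [3, 1, 2]
```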

3. Looping in pure Python when NumPy or built-ins can vectorize

A loop over a million numbers doing math is slow because every iteration involves Python overhead. NumPy pushes that loop into C.

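The fix: Replace for i in range(len(arr)): arr[i] = func(arr[i]) with arr = func(arr) when func is a NumPy universal function (ufunc) such as np.sqrt or np.exp. If you cannot use NumPy, prefer list comprehensions and built-ins such as sum(), min(), and sorted(). A comprehension beats a manual for loop with .append() because it uses a dedicated bytecode instruction instead of a method call per element. Built-ins like sum() go further and run their whole loop in C.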

import time
import numpy as np

# List comprehension: still one interpreter step per element
data = range(10_000_000)
start = time.perf_counter()
squared = [x * x for x in data]  # faster than an explicit for loop with append
print(f"List comprehension: {time.perf_counter() - start:.2f}s")

# NumPy: the multiplication loop runs in compiled C code
arr = np.arange(10_000_000)
start = time.perf_counter()
squared_np = arr * arr
print(f"NumPy: {time.perf_counter() - start:.2f}s")
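The “replace the index loop with a ufunc call” advice looks like this in practice. This is a minimal sketch. np.sqrt stands in for func, and the array size of 200,000 is arbitrary. sqrt_loop and sqrt_vectorized are illustrative names.

```python
import math
import time

import numpy as np

values = np.linspace(0.0, 10.0, 200_000)


def sqrt_loop(arr):
    # Index-by-index loop: one round-trip through the interpreter per element
    out = arr.copy()
    for i in range(len(out)):
        out[i] = math.sqrt(out[i])
    return out


def sqrt_vectorized(arr):
    # np.sqrt is a ufunc: the loop over elements runs in compiled code
    return np.sqrt(arr)


start = time.perf_counter()
loop_result = sqrt_loop(values)
print(f"loop:       {time.perf_counter() - start:.3f}s")

start = time.perf_counter()
vec_result = sqrt_vectorized(values)
print(f"vectorized: {time.perf_counter() - start:.3f}s")
```

Both versions return the same array. Only the place where the loop runs changes.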

4. Calling functions repeatedly inside tight loops

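Function calls and attribute lookups in Python are relatively expensive. A call like result.append(x) inside a loop looks up the append attribute on every iteration, and each global name is looked up in a dictionary every time it is used. Across millions of iterations, that overhead adds up.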

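The fix: Hoist the lookup out of the loop by binding the method or function to a local variable before the loop starts. For example, assign append = result.append once and call append(item) inside the loop. The gain is modest, and CPython 3.11+ shrinks it further with specialized bytecode, so measure before relying on it. Often a list comprehension is both simpler and faster.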

# Slow
for item in huge_list:
    result.append(compute(item))

# Faster
append = result.append
for item in huge_list:
    append(compute(item))
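A small timeit sketch lets you check whether hoisting pays off on your Python version. Here, compute is a hypothetical stand-in for real per-item work. The data size and repeat count are arbitrary.

```python
import timeit


def compute(x):
    # Hypothetical stand-in for real per-item work
    return x * 2 + 1


data = list(range(100_000))


def with_attribute_lookup():
    result = []
    for item in data:
        result.append(compute(item))  # looks up .append on every iteration
    return result


def with_hoisted_append():
    result = []
    append = result.append  # bind the method once, outside the loop
    for item in data:
        append(compute(item))
    return result


def with_comprehension():
    return [compute(item) for item in data]


for fn in (with_attribute_lookup, with_hoisted_append, with_comprehension):
    print(f"{fn.__name__}: {timeit.timeit(fn, number=5):.3f}s")
```

All three functions produce the same list. Compare the printed timings rather than assuming a particular winner, since the gap varies between CPython releases.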

5. Ignoring object creation overhead in hot paths

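Creating many small objects (like tuples, dataclasses, or dicts) inside a loop adds allocation and deallocation overhead. Frequent container allocations can also trigger CPython’s cyclic garbage collector more often, which shows up as occasional pauses.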

The fix: Reuse objects where possible. Use __slots__ for classes that have many instances. Or, if you only need a lightweight data container, use a named tuple or a simple tuple and avoid the overhead of a full class.

# Instead of creating a new dict each loop iteration
for item in data:
    row = {"id": item.id, "value": item.value}
    process(row)

# Reuse one dict (only safe if process() does not keep a reference to row)
row = {"id": None, "value": None}
for item in data:
    row["id"] = item.id
    row["value"] = item.value
    process(row)
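Here is a minimal sketch of what __slots__ changes. It assumes two illustrative classes, PointDict and PointSlots, and uses tracemalloc to compare how much memory 100,000 instances of each keep alive. The exact byte counts depend on your Python version, so the sketch prints them rather than claiming a fixed ratio.

```python
import tracemalloc


class PointDict:
    def __init__(self, x, y):
        self.x = x
        self.y = y


class PointSlots:
    __slots__ = ("x", "y")  # fixed attribute layout, no per-instance __dict__

    def __init__(self, x, y):
        self.x = x
        self.y = y


def traced_bytes(cls, n=100_000):
    """Bytes still allocated while n instances of cls are alive."""
    tracemalloc.start()
    objects = [cls(i, i) for i in range(n)]  # kept alive until measured
    current, _peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return current


print(f"regular class:   {traced_bytes(PointDict):,} bytes")
print(f"__slots__ class: {traced_bytes(PointSlots):,} bytes")

p = PointSlots(1, 2)
try:
    p.z = 3  # not declared in __slots__
except AttributeError as exc:
    print(f"AttributeError: {exc}")
```

The trade-off is flexibility: a slotted class cannot gain new attributes at runtime, which is usually fine for the small, numerous records where __slots__ helps most.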

Expert advice: “The fastest allocation is the one you never make. Profile your object creation hotspots with cProfile and see if you can reuse buffers or use __slots__.” — core Python contributor (paraphrased)

6. Not profiling before optimizing

Guessing where the bottleneck is almost always leads to wasted effort. You might spend hours optimizing a loop that runs 1% of the time.

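The fix: Always measure first. Use cProfile from the standard library for a deterministic, function-by-function profile. Or use py-spy, a sampling profiler that can attach to a running process with low overhead, which makes it practical in production. Identify the top three functions by cumulative time, then optimize those.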

Here is a practical process to use profiling in your own projects:

  1. Run python -m cProfile -s cumulative your_script.py | head -20
  2. Look at the functions with the highest cumulative time.
  3. Check if those functions are doing I/O, looping, or calling many sub-functions.
  4. Apply a targeted fix (e.g., switch data structure, add caching, or use a built-in).
  5. Rerun the profile and confirm the improvement.
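The same workflow can run from inside Python, which is handy in tests or notebooks. In this sketch, slow_membership is a deliberately inefficient, hypothetical function (list membership inside a loop, bottleneck 2). The report is captured in a string so you can print or inspect it.

```python
import cProfile
import io
import pstats


def slow_membership(n):
    haystack = list(range(n))
    # O(n) list lookup inside an O(n) loop: O(n^2) overall
    return sum(1 for i in range(n) if i in haystack)


def main():
    return slow_membership(2_000)


profiler = cProfile.Profile()
profiler.enable()
result = main()
profiler.disable()

buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer).sort_stats("cumulative")
stats.print_stats(10)  # show the top 10 entries by cumulative time
report = buffer.getvalue()
print(report)
```

Once slow_membership shows up near the top, the targeted fix from step 4 is to build a set instead of a list. Then you rerun the profile to confirm the improvement.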

7. Using mutable default arguments by mistake

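A classic gotcha: def func(items=[]) creates one list object when the function is defined, and every call that omits the argument reuses that same list. If your function modifies it, each call builds on the previous state. The list keeps growing, so later calls do more work, the memory is never released, and results contain stale data.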

The fix: Use None as the default and create a new list inside the function. This also avoids accidental shared state and reduces debugging time.

# Bad
def process(items=[]):
    items.append("new")
    return items

# Good
def process(items=None):
    if items is None:
        items = []
    items.append("new")
    return items
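A short demonstration makes the shared state visible. The functions are renamed process_bad and process_good here, purely for illustration, so both versions can coexist.

```python
def process_bad(items=[]):  # this list is created once, when def runs
    items.append("new")
    return items


def process_good(items=None):
    if items is None:
        items = []  # a fresh list on every call
    items.append("new")
    return items


first = process_bad()
second = process_bad()
print(second, first is second)  # ['new', 'new'] True

print(process_good(), process_good())  # ['new'] ['new']
```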

8. Using recursion when iteration is safer and faster

Python’s recursion limit and overhead make deep recursion impractical. Each recursive call adds a stack frame. Iteration with a loop or a stack object is usually faster and avoids hitting the recursion limit.

The fix: Rewrite recursive functions iteratively. For tree traversal, use an explicit stack. For factorial or Fibonacci, use a loop. For factorial specifically, the built-in math.factorial is simpler and faster still.

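# Recursive: raises RecursionError once n nears the recursion limit (1000 by default)
def factorial_recursive(n):
    return n * factorial_recursive(n - 1) if n else 1

# Iterative: constant stack depth, works for any n
def factorial_iterative(n):
    result = 1
    for i in range(2, n + 1):
        result *= i
    return result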
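For trees, the “explicit stack” approach looks like the sketch below. The Node class and sum_tree function are illustrative assumptions, not a particular library’s API. The test tree is a chain 10,000 nodes deep, which a naive recursive traversal could not handle under the default recursion limit.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    value: int
    children: list[Node] = field(default_factory=list)


def sum_tree(root):
    total = 0
    stack = [root]  # explicit stack replaces the interpreter's call stack
    while stack:
        node = stack.pop()
        total += node.value
        stack.extend(node.children)
    return total


# Build a chain 10,000 nodes deep: 0 -> 1 -> 2 -> ... -> 9999
root = Node(0)
current = root
for i in range(1, 10_000):
    child = Node(i)
    current.children.append(child)
    current = child

print(sum_tree(root))  # 49995000
```

The stack lives on the heap as an ordinary list, so depth is limited only by memory rather than by sys.getrecursionlimit().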

9. Handling exceptions inside hot loops

In Python 3.11 and later, entering a try block costs essentially nothing when no exception is raised (“zero-cost” exceptions). Older versions paid a small setup cost on every iteration. The real expense is raising and catching an exception, such as a KeyError or IndexError. If that happens on a large share of iterations in a loop that runs millions of times, the overhead adds up.

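The fix: Use “look before you leap” patterns when failures are common. Check with if key in d before accessing, or use dict.get() with a default. When converting strings to numbers, str.isdigit() can filter out obviously invalid input before int(). Note, though, that it rejects negative numbers and accepts some Unicode digits (such as “²”) that int() does not, so keep a try/except fallback if those cases matter.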

# Slow when the key is often missing (raising and catching KeyError costs time)
def get_value(d, key):
    try:
        return d[key]
    except KeyError:
        return None

# Fast: no exception is raised for a missing key
def get_value(d, key):
    return d.get(key)
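To measure the difference yourself, this sketch times both styles on lookups that always miss, which is the case where exceptions hurt most. The dictionary contents, key ranges, and repeat count are arbitrary. The two functions are renamed copies of get_value so they can be compared side by side.

```python
import timeit

d = {i: i * i for i in range(1_000)}
missing_keys = range(1_000, 2_000)  # every lookup misses


def lookup_with_exception(d, key):
    try:
        return d[key]
    except KeyError:
        return None


def lookup_with_get(d, key):
    return d.get(key)


t_exc = timeit.timeit(lambda: [lookup_with_exception(d, k) for k in missing_keys], number=200)
t_get = timeit.timeit(lambda: [lookup_with_get(d, k) for k in missing_keys], number=200)
print(f"try/except: {t_exc:.3f}s  dict.get: {t_get:.3f}s")
```

If most keys are present, the try/except version is competitive, so profile with realistic data before rewriting.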

10. Not compiling CPU-bound hot paths with numba, Cython, or a native extension

When you have truly CPU-bound loops that NumPy cannot vectorize (e.g., complex string parsing or custom algorithms), pure Python will max out at a fraction of C speed. Yet many developers assume they have to rewrite everything in C.

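The fix: Use numba to JIT-compile numerical functions. Alternatively, compile the hot path with Cython, or move it into a small C library that you call through ctypes. If your bottleneck is string processing, consider the third-party regex module, which can beat re for some patterns (benchmark it on yours first), or write the hot path in Rust and expose it to Python via pyo3.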

  • Use numba for math-heavy loops (often just an @njit decorator, as long as the function works on numbers and NumPy arrays).
  • Use Cython to compile parts of your module into C.
  • Use pyo3 to call Rust functions for maximum speed.
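As an example of a loop NumPy cannot express as a single ufunc, here is an exponential moving average sketch, where each output depends on the previous one. It assumes numba is installed. If it is not, the sketch falls back to a no-op decorator so the code still runs as plain, slower Python. The ema name and alpha=0.1 are illustrative choices.

```python
import numpy as np

try:
    from numba import njit  # third-party: pip install numba
except ImportError:
    def njit(func):  # fallback: run uncompiled if numba is unavailable
        return func


@njit
def ema(values, alpha):
    # Loop-carried dependency: out[i] needs out[i - 1], so no single ufunc fits
    out = np.empty_like(values)
    out[0] = values[0]
    for i in range(1, values.shape[0]):
        out[i] = alpha * values[i] + (1.0 - alpha) * out[i - 1]
    return out


prices = np.random.default_rng(0).random(100_000)
smoothed = ema(prices, 0.1)  # with numba, the first call also pays compile time
print(smoothed[:3])
```

With numba, the first call includes JIT compilation, so time the second call when you benchmark.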

If you find yourself doing heavy numerical work, check out our guide on how to optimize Python code for high-performance computing for deeper techniques.


A cheat sheet for profiling and fixing common bottlenecks

  • I/O bound → async or thread pool
  • CPU bound (loops) → NumPy, numba, or Cython
  • Membership checks → switch from list to set
  • Repeated attribute or global lookups in hot loops → hoist references into locals
  • Memory churn → reuse objects, use __slots__
  • Recursion depth → rewrite as iteration
  • Exception overhead → use conditional checks

Start profiling today, not tomorrow

You do not need to memorize all ten bottlenecks. The real skill is learning how to spot them. Run a profiler on your slowest script right now. See which of these patterns appear. Then apply the fix that matches. Your code will run faster, and you will have a repeatable method for future projects. Python is not the problem. It is the patterns you use. Change those, and you change everything.
