Improving your python code with simple tricks
Sometimes our code is running slow. Sometimes it is eating up memory. Maybe it is just not as readable as we would like it to be. In this post, we will see how to use some functions from the default library to improve our code. All the code used in this post is available here. While I only presented a few functions that I use frequently, there are many more that can be used to improve your code. I encourage you to check the official documentation to see what else is available.
Use list comprehension whenever possible
What does it mean?
List comprehension is basically another way to create a list. Suppose we want to create a list from values in a range:
# traditional way
tmp = []
for i in range(10_000_000):
tmp.append(i)
# list comprehension
tmp = [i for i in range(10_000_000)]
Why does it matter?
List comprehension is usually much faster than the traditional loop. Let’s compare them:
tmp = []
for i in range(10_000_000):
tmp.append(i)
1.04 s ± 89.8 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
memory increment: 293.57 MiB
tmp = [i for i in range(10_000_000)]
731 ms ± 71.6 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
memory increment: 285.82 MiB
Other examples
Creating a list with an if
condition
tmp = []
for i in range(10_000_000):
if i % 2 == 0:
tmp.append(i)
1.11 s ± 24 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
memory increment: -0.25 MiB
tmp = [i for i in range(10_000_000) if i % 2 == 0]
944 ms ± 6.79 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
memory increment: 115.54 MiB
Creating a list with an if else
condition
tmp = []
for i in range(10_000_000):
if i % 2 == 0:
tmp.append(i)
else:
tmp.append(i+1)
1.61 s ± 90.5 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
memory increment: 32.89 MiB
tmp = [i if i % 2 == 0 else i + 1 for i in range(10_000_000)]
1.33 s ± 11.4 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
memory increment: 273.82 MiB
Fastest way to create a list
When generating a list from a generator (range
in this case), it is even faster to use the list()
constructor.
tmp = list(range(10_000_000))
To validate this, let’s compare the code for 7 runs, 10 loops each:
loop | list comprehension | list constructor | |
---|---|---|---|
mean ± std. dev. per loop | 1.04 s ± 89.8 ms | 731 ms ± 71.6 ms | 301 ms ± 18.4 ms |
memory increment | 293.57 MiB | 285.82 MiB | 75.12 MiB |
Why is the list()
constructor faster? According to this answer in StackOverflow:
The list comprehension executes the loop in Python bytecode, just like a regular for loop. The list() call iterates entirely in C code, which is far faster.
To compare all these solutions, lets check the equivalent bytecodes. For the loop solution:
1 0 BUILD_LIST 0
2 STORE_NAME 0 (tmp)
2 4 LOAD_NAME 1 (range)
6 LOAD_CONST 0 (10000000)
8 CALL_FUNCTION 1
10 GET_ITER
>> 12 FOR_ITER 14 (to 28)
14 STORE_NAME 2 (i)
16 LOAD_NAME 0 (tmp)
18 LOAD_METHOD 3 (append)
20 LOAD_NAME 2 (i)
22 CALL_METHOD 1
24 POP_TOP
26 JUMP_ABSOLUTE 12
>> 28 LOAD_CONST 1 (None)
30 RETURN_VALUE
For the list comprehension solution:
1 0 LOAD_CONST 0 (<code object <listcomp> at 0x7f272c8eaf50, file "<stdin>", line 1>)
2 LOAD_CONST 1 ('<listcomp>')
4 MAKE_FUNCTION 0
6 LOAD_NAME 0 (range)
8 LOAD_NAME 1 (10_000_000)
10 CALL_FUNCTION 1
12 GET_ITER
14 CALL_FUNCTION 1
16 POP_TOP
18 LOAD_CONST 2 (None)
20 RETURN_VALUE
Disassembly of <code object <listcomp> at 0x7f272c8eaf50, file "<stdin>", line 1>:
1 0 BUILD_LIST 0
2 LOAD_FAST 0 (.0)
>> 4 FOR_ITER 8 (to 14)
6 STORE_FAST 1 (i)
8 LOAD_FAST 1 (i)
10 LIST_APPEND 2
12 JUMP_ABSOLUTE 4
>> 14 RETURN_VALUE
And for the list constructor solution:
1 0 LOAD_NAME 0 (list)
2 LOAD_NAME 1 (range)
4 LOAD_NAME 2 (10_000_000)
6 CALL_FUNCTION 1
8 CALL_FUNCTION 1
10 POP_TOP
12 LOAD_CONST 0 (None)
14 RETURN_VALUE
We can see that the list()
constructor generates less bytecodes.
Use generators and iterators whenever possible
To create a generator like a list comprehension (called generator expression), just replace the squared brackets [ ] with parenthesis ( ).
Why does it matter?
Generators looks like a normal function, except that it contains yield
expressions for producing a series of values usable in a for-loop or that can be retrieved one at a time with the next()
function. It returns a generator iterator, which temporarily suspends processing, remembering the location execution state (including local variables and pending try-statements).
Let’s do some comparisons:
tmp = sum([i for i in range(10_000_000)])
860 ms ± 30.4 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
tmp = sum((i for i in range(10_000_000)))
609 ms ± 2.93 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
Not that different right? Now, let’s check the memory usage. Let’s focus on increment, since it represents the difference in memory between the beginning and end of this execution.
memory increment: 263.44 MiB
memory increment: 0.00 MiB
What happened? The generator only returns one element at a time, which is given to the sum function. This way, we don’t need to pre-generate the whole list to perform the sum of the elements. In fact, since the sum function gets an iterator as parameter, we could call it like this:
tmp = sum(i for i in range(10_000_000))
593 ms ± 90.4 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
memory increment: 0.01 MiB
Or even:
tmp = sum(range(10_000_000))
168 ms ± 4.68 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
memory increment: 0.00 MiB
Which runs way faster.
Avoid generating all the values whenever possible
For the sake of these examples, suppose we have an ordered list of values.
What to do if we want only the values lower than a limit?
The list comprehension way of achieving this is this:
tmp = [i for i in range(10_000_000) if i < 1_000_000]
596 ms ± 7.16 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
memory increment: 0.02 MiB
Now, with loops:
tmp = []
for i in range(10_000_000):
if i < 1_000_000:
tmp.append(i)
else:
break
116 ms ± 2.37 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
memory increment: 0.01 MiB
Why is it faster with loops?
Because using list comprehension, the whole list must be generated before selecting the elements. The same is not true for the loop, that only runs through some of the values.
Can we do better?
Yes, with takewhile.
from itertools import takewhile
tmp = list(takewhile(lambda x: x < 1_000_000, range(10_000_000)))
107 ms ± 2.37 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
memory increment: 0.01 MiB
Note: takewhile
is only faster when you know that the condition will be satisfied “soon enough”.
tmp = []
for i in range(10_000_000):
if i < 9_000_000:
tmp.append(i)
else:
break
1.12 s ± 27.5 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
memory increment: 3.70 MiB
tmp = [i for i in range(10_000_000) if i < 9_000_000]
1.05 s ± 51 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
memory increment: 246.21 MiB
tmp = list(takewhile(lambda x: x < 9_000_000, range(10_000_000)))
1.06 s ± 8.41 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
memory increment: 0.00 MiB
In this case it is quicker to generate the whole list, and then filter it. But note, while using list comprehension is quicker, takewhile
takes less memory, since it still doesn’t need to store the whole list, even momentarily.
What if we want only the values higher than a limit?
First, let’s try with loops:
tmp = []
for i in range(10_000_000):
if i > 1_000_000:
tmp.append(i)
1.3 s ± 93.3 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
memory increment: 0.15 MiB
Now, with list comprehension:
tmp = [i for i in range(10_000_000) if i > 1_000_000]
978 ms ± 11.2 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
memory increment: 169.84 MiB
In this case, since the loop will run through every element, it is slower than the list comprehension. It takes less memory though, since it doesn’t need to store the whole list in memory.
Can we do better?
Yes, with dropwhile.
from itertools import dropwhile
tmp = list(dropwhile(lambda x: x < 1_000_000, range(10_000_000)))
442 ms ± 10.2 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
memory increment: 0.05 MiB
Note: dropwhile
also is only faster when you know that the condition will be satisfied “soon enough”.
tmp = []
for i in range(10_000_000):
if i > 9_000_000:
tmp.append(i)
654 ms ± 9.3 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
memory increment: 0.00 MiB
tmp = [i for i in range(10_000_000) if i > 9_000_000]
623 ms ± 13.6 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
memory increment: 0.01 MiB
tmp = list(dropwhile(lambda x: x < 9_000_000, range(10_000_000)))
924 ms ± 104 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
memory increment: 0.01 MiB
What about when we want to get the first N samples?
With loops:
tmp = []
for n, i in enumerate(range(10_000_000)):
if n < 1_000_000:
tmp.append(i)
else:
break
147 ms ± 3.09 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
memory increment: 0.00 MiB
Doing it with list comprehension:
tmp = [i for i in range(10_000_000)][:1_000_000]
523 ms ± 10.8 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
memory increment: 0.01 MiB
Why is it faster with loops?
Because using list comprehension, the whole list must be generated before doing the slice operation. The same is not true for the loop, that only runs through some of the values.
Can we do better?
Yes, with islice.
from itertools import islice
tmp = list(islice((i for i in range(10_000_000)), 1_000_000))
72.5 ms ± 1.82 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
memory increment: 0.01 MiB
What about when we want to get the last N samples?
We can achieve the same result with islice
. Let’s get straight to the comparisons:
tmp = []
for n, i in enumerate(range(10_000_000)):
if n > 9_000_000:
tmp.append(i)
1.44 s ± 95.3 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
memory increment: 0.00 MiB
tmp = [i for i in range(10_000_000)][9_000_000:]
743 ms ± 8.52 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
memory increment: 177.31 MiB
tmp = list(islice((i for i in range(10_000_000)), 9_000_000, None))
796 ms ± 11.8 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
memory increment: -0.16 MiB
Note again that, as it happened with dropwhile
when the condition takes longer to be satisfied, while using islice
is slower than doing it with list comprehension, it takes much less memory.
What if we just want to count the number of elements that will be generated?
Suppose we want to know how many elements will be generated from a condition. Usually, we would do it like this:
tmp = [i for i in range(value) if i % 2 == 0]
count = len(tmp)
1.02 s ± 63.6 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
increment: 0.04 MiB
But the problem is, we are storing a whole list in memory only to get its length.
Can we do better?
Yes, by creating a generator that generates 1
every time the condition is true, and summing it.
count = sum(1 for i in range(value) if i % 2 == 0)
991 ms ± 18.9 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
increment: 0.00 MiB
Other useful itertools functions
We already introduced 3 of the most useful functions: dropwhile
, islice
, and takewhile
. Let’s check other useful functions.
cycle
Repeats indefinitely a given sequence.
from itertools import cycle
tmp = []
for counter, i in enumerate(cycle(range(4))):
if counter == 10:
break
tmp.append(i)
print(tmp)
[0, 1, 2, 3, 0, 1, 2, 3, 0, 1]
repeat
Repeats indefinitely a given value, unless the times
argument is specified.
from itertools import repeat
tmp = list(repeat(10, 5))
print(tmp)
[10, 10, 10, 10, 10]
product
Equivalent to a nested for-loop.
tmp = []
for i in range(2):
for j in range(2):
for k in range(2):
for l in range(2):
tmp.append(sum([i, j, k, l]))
print(tmp)
[0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4]
from itertools import product
tmp = [sum(i) for i in product(range(2), range(2), range(2), range(2))]
print(tmp)
[0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4]
Since in this case we are using the same sequence for all the loops, we can use repeat
to simplify the code:
from itertools import product
tmp = [sum(i) for i in product(range(2), repeat=4)]
print(tmp)
[0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4]
itertools has all the combinatorics functions implemented
product | permutation | combination w/ replacement | combination |
---|---|---|---|
AA | AA | ||
AB | AB | AB | AB |
AC | AC | AC | AC |
AD | AD | AD | AD |
BA | BA | ||
BB | BB | ||
BC | BC | BC | BC |
BD | BD | BD | BD |
CA | CA | ||
CB | CB | ||
CC | CC | ||
CD | CD | CD | CD |
DA | DA | ||
DB | DB | ||
DC | DC | ||
DD | DD |
Improving code with functools
Storing function calls with lru_cache
Let’s take the fibonacci function, for example, that calls itself recursively.
def fib(n):
if n < 2:
return n
return fib(n-1) + fib(n-2)
Let’s use it in a list comprehension to get the first 16 fibonacci numbers:
tmp = [fib(i) for i in range(16)]
698 µs ± 152 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
It takes a pretty good amount of time to execute for a small number. But what if we could automatically save the results of the previous calls to the function? That is what lru_cache
is all about. It stores previous calls, with its given parameters and calculated output, as a least recent used (LRU) cache. This way, whenever we call the function and this call was already made (and its results are still stored in the cache), we simply get the results from the cache.
from functools import lru_cache
@lru_cache
def fib(n):
if n < 2:
return n
return fib(n-1) + fib(n-2)
Let’s try that line again:
tmp = [fib(i) for i in range(16)]
3.34 µs ± 719 ns per loop (mean ± std. dev. of 7 runs, 10 loops each)
Now, let’s check the cache information:
- hits: number of times the function was called and the results were already there;
- misses: number of times the function was called and the results were not there;
- maxsize: current maximum allowed size of the cache;
- currsize: actual size of the cache (stored results).
print(fib.cache_info())
CacheInfo(hits=1132, misses=16, maxsize=128, currsize=16)
Creating functions with defaults from partial
Suppose we have a function, called divide_by
. It is a pretty generic function, but it is usually called with some specific values, like dividing by two, or by three.
def divide_by(x, y):
return x / y
print(divide_by(12, 2))
print(divide_by(12, 3))
6.0
4.0
What if, instead of creating an entire new function, we could only create different signatures for the function, one for each common y value? That is what partial
is for:
from functools import partial
divide_by_two = partial(divide_by, y=2)
divide_by_three = partial(divide_by, y=3)
print(divide_by_two(12))
print(divide_by_three(12))
6.0
4.0
Enjoy Reading This Article?
Here are some more articles you might like to read next: