List comprehensions and generator expressions
Python offers concise constructs called list comprehensions and generator expressions to create sequences or iterate in a single line of code. These are advanced applications of loops and conditionals that allow for clearer, more compact code in many cases. They essentially compress a loop and an optional condition into a single expression.
List Comprehensions
A list comprehension is a way to build a new list by iterating over a sequence and optionally filtering items, all in one expression. It’s often described as “syntactic sugar” for a for loop that appends to a list. The basic syntax is:
[<expression> for <item> in <iterable> if <condition_optional>]
The result is a new list.
- <expression> defines the value to put in the new list for each <item>.
- for <item> in <iterable> works like a loop, going through the iterable.
- The optional if <condition> (placed at the end) filters which items are included: only items for which the condition is True are processed.
Example 1: Create a list of squares of numbers:
numbers = [1, 2, 3, 4, 5]
squares = [num * num for num in numbers]
print(squares)
# Output: [1, 4, 9, 16, 25]
This comprehension iterates over each num in numbers and computes num * num (square) to produce a new list of squared values. It achieves in one line what would take 3-4 lines using a standard loop (initialising an empty list, looping, appending).
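For comparison, the explicit loop that the comprehension replaces might look like the sketch below. The name squares_loop is chosen here for illustration only.

```python
numbers = [1, 2, 3, 4, 5]

# Explicit-loop version: create an empty list, loop, append.
squares_loop = []
for num in numbers:
    squares_loop.append(num * num)

# Comprehension version of the same computation.
squares = [num * num for num in numbers]

print(squares_loop == squares)  # True
```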
Example 2: Add a filter condition – suppose from the same list we want only the squares of even numbers:
even_squares = [num * num for num in numbers if num % 2 == 0]
print(even_squares) # Output: [4, 16]
The added if num % 2 == 0 clause ensures that the expression num * num is executed only for even numbers, so the resulting list is [4, 16] (squares of 2 and 4). This is equivalent to looping and using an if inside the loop to decide whether to append.
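Written out as a loop, the filtered version could be sketched as follows (even_squares_loop is an illustrative name, not part of the example above):

```python
numbers = [1, 2, 3, 4, 5]

even_squares_loop = []
for num in numbers:
    # Same test as the comprehension's if clause.
    if num % 2 == 0:
        even_squares_loop.append(num * num)

print(even_squares_loop)  # [4, 16]
```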
List comprehensions make code more concise and often more readable for simple transformations. They are also usually somewhat faster than an equivalent loop with append, mainly because the comprehension avoids looking up and calling the list’s append method on every iteration. However, overly complex comprehensions (with multiple conditions or nested loops) can become hard to read, so it’s about finding a balance.
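One way to check the speed claim on your own machine is a rough timing with the standard timeit module. This is only a sketch: the helper names with_loop and with_comprehension are made up for this illustration, and the numbers will vary by Python version and hardware.

```python
import timeit

def with_loop(n):
    """Build the list of squares with an explicit loop and append."""
    result = []
    for i in range(n):
        result.append(i * i)
    return result

def with_comprehension(n):
    """Build the same list with a list comprehension."""
    return [i * i for i in range(n)]

# Both approaches must produce the same list.
assert with_loop(1000) == with_comprehension(1000)

# Time 200 runs of each; results depend on the machine.
loop_time = timeit.timeit(lambda: with_loop(10_000), number=200)
comp_time = timeit.timeit(lambda: with_comprehension(10_000), number=200)
print(f"loop: {loop_time:.4f}s  comprehension: {comp_time:.4f}s")
```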
Some key features:
- You can nest loops in a comprehension (e.g., flattening a matrix with [elem for row in matrix for elem in row]).
- You can have multiple if conditions, or even an if…else in the expression part (ternary-like), if needed.
- Python also has similar comprehensions for other collections: set comprehensions (using {} braces) and dict comprehensions (using {key: value ...} syntax). List comprehensions are the most common, and the concept is the same for all.
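The features listed above can be sketched in a few lines. The sample data and variable names here are invented for illustration.

```python
matrix = [[1, 2, 3], [4, 5, 6]]

# Nested loops: the for clauses read left to right, outer loop first.
flat = [elem for row in matrix for elem in row]
print(flat)  # [1, 2, 3, 4, 5, 6]

# Two if clauses in a row behave like a single condition joined by "and".
multiples_of_six = [n for n in range(1, 20) if n % 2 == 0 if n % 3 == 0]
print(multiples_of_six)  # [6, 12, 18]

# if...else in the expression part chooses a value; it does not filter.
labels = ["even" if n % 2 == 0 else "odd" for n in range(4)]
print(labels)  # ['even', 'odd', 'even', 'odd']

# Set comprehension: duplicates collapse into one entry.
remainders = {n % 3 for n in range(10)}
print(remainders == {0, 1, 2})  # True

# Dict comprehension: builds key/value pairs.
square_of = {n: n * n for n in range(1, 4)}
print(square_of)  # {1: 1, 2: 4, 3: 9}
```

Note the difference between the two uses of if: a trailing if drops items, while if…else in the expression keeps every item and only changes the value produced.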
One must be mindful that a list comprehension creates the entire list in memory immediately. This is generally fine for moderately sized lists, but if you’re dealing with a huge range or data stream, the memory usage could be a concern. That’s where generator expressions come in.
Generator Expressions
A generator expression is like a list comprehension, but it creates a generator object that yields items one by one instead of building the whole list at once. In syntax, it looks similar to a list comprehension, except it uses parentheses (…) instead of square brackets.
Example: Using a generator expression for the same squares calculation:
numbers = [1, 2, 3, 4, 5]
squares_gen = (num * num for num in numbers)
print(squares_gen) # Output: <generator object <genexpr> at 0x7f8e9c1cdd60>
print(list(squares_gen)) # Output: [1, 4, 9, 16, 25]
When we print squares_gen we see it’s a generator object, not the actual list of squares. We can iterate over this generator to retrieve the values. In the example, list(squares_gen) exhausts the generator to produce a list of results.
We could also loop:
for sq in (num * num for num in numbers):
    print(sq)
This would print 1, 4, 9, 16, 25 each on a separate line.
The key difference: the generator does not calculate all squares up front. Instead, each time you iterate and need the next value, it computes that value on the fly (lazy evaluation).
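Laziness can be made visible by recording when each value is actually computed. In this sketch the helper square and the list computed are hypothetical names used only for the demonstration.

```python
computed = []  # records which inputs have been squared so far

def square(n):
    """Square n and note that the computation happened."""
    computed.append(n)
    return n * n

lazy_squares = (square(n) for n in [1, 2, 3, 4, 5])
print(computed)            # []  (nothing has been computed yet)

print(next(lazy_squares))  # 1
print(next(lazy_squares))  # 4
print(computed)            # [1, 2]  (only the values requested so far)
```

Creating the generator runs none of the squaring; each call to next() (or each step of a for loop) computes exactly one more value.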
Memory Efficiency: Generator expressions are memory-efficient. Using a generator, you could iterate over a sequence of billions of numbers without holding them all in memory at once, because it yields them one at a time as needed. In contrast, a list comprehension over that range would attempt to create a giant list, likely running out of memory. The generator “remembers” how to generate the next value (essentially, it encapsulates the loop logic internally), and it yields values on demand until it’s exhausted. Once a generator is exhausted (all values produced), it can’t be reused or reset (unless you create it again).
Example with condition: You can include conditions similarly in a generator expression:
odd_squares_gen = (num*num for num in numbers if num % 2 != 0)
This creates a generator of squares of odd numbers (1,3,5 -> 1,9,25). It works analogously to the list comp example, but in generator form.
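To confirm what the filtered generator yields, it can be consumed once, here with list() since the data is tiny (odd_squares is an illustrative name):

```python
numbers = [1, 2, 3, 4, 5]
odd_squares_gen = (num * num for num in numbers if num % 2 != 0)

# Consuming the generator produces the squares of 1, 3 and 5.
odd_squares = list(odd_squares_gen)
print(odd_squares)  # [1, 9, 25]
```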
Important: If you convert a generator to a list (or otherwise consume it fully), you’ll end up using memory for that list. So, to truly benefit, you would iterate over the generator directly. For instance:
# Processing a large generator without keeping all results
gen = (x*x for x in range(1, 1000000000))
total = 0
for val in gen:
    total += val
    if total > 1000000:
        break
print("Reached over one million in partial sum")
This loop runs until the running sum of squares exceeds 1,000,000 (which happens at x = 144) and then breaks. We never stored all the squares; we just generated and added them one by one, stopping early. A list comprehension over the same range would have built nearly a billion squares up front, almost none of which we needed.
When to use list comprehension vs generator expression? It depends on your needs:
Use a list comprehension when you need to actually produce a list object to use later (e.g., you need to index into it, or use it multiple times) and the dataset is of a manageable size. It’s straightforward and eager.
Use a generator expression when you are iterating through results once or feeding them into something that can consume an iterator (like a for loop, a sum() function, etc.), especially if the sequence is large or potentially infinite. Generators are also ideal for pipelines – feeding data through multiple steps without intermediate storage.
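A pipeline of generator expressions might look like the following sketch. The input list lines stands in for a large data stream, and all names are invented for this illustration.

```python
lines = ["3", " 10 ", "", "abc", "7"]  # stand-in for a large input stream

# Each stage is lazy: no intermediate list is built between steps.
stripped = (line.strip() for line in lines)
numeric = (s for s in stripped if s.isdigit())
values = (int(s) for s in numeric)

# sum() pulls items through the whole pipeline one at a time.
total = sum(values)
print(total)  # 20

# any() stops consuming as soon as it finds a match.
print(any(n > 5 for n in [1, 8, 3]))  # True
```

Each item flows through all three stages before the next one is read, so memory use stays constant regardless of how long the input is.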
As a rule of thumb, if you find yourself writing [... for ... in ...] just to loop, and you don’t actually need the list container, using ( ... ) might be more efficient.
Let’s compare quickly with a memory usage example (just conceptual, not actual code here):
List comprehension nums = [i for i in range(1000000)] will create a list of 1,000,000 integers in memory.
Generator expression nums_gen = (i for i in range(1000000)) will create a generator that knows how to produce those integers but holds only its current state in memory at any given moment. The generator object itself is lightweight: sys.getsizeof typically reports a size well under a kilobyte for it (the exact figure depends on the Python version), whereas a list of a million ints takes many megabytes.
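The comparison can be tried directly with sys.getsizeof. This is a rough sketch: exact byte counts depend on the Python version and platform, and getsizeof on a list counts only the list object itself, not the integers it refers to.

```python
import sys

nums = [i for i in range(1_000_000)]
nums_gen = (i for i in range(1_000_000))

# Size of the list object (its internal pointer array), excluding the ints.
list_bytes = sys.getsizeof(nums)
# Size of the generator object, independent of the range length.
gen_bytes = sys.getsizeof(nums_gen)

print(f"list:      {list_bytes:,} bytes")
print(f"generator: {gen_bytes:,} bytes")
print(gen_bytes < list_bytes)  # True
```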
One caution: generator expressions can only be iterated once. If you need to iterate multiple times, either recreate the generator or use a list.
Example to illustrate one-time usage:
gen = (i*i for i in range(5))
for val in gen:
    print(val, end=" ")
# Output: 0 1 4 9 16
for val in gen:
    print(val, end=" ")
# Output: (nothing, the generator is exhausted)
After the first loop, gen has yielded all its values and is exhausted, so the second loop produces nothing. A list, on the other hand, could be reused multiple times.
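Two common ways around the one-pass limitation are sketched below: wrap the expression in a function so a fresh generator can be requested each time, or materialise the values into a list once. The function name squares_upto is hypothetical.

```python
def squares_upto(n):
    """Return a fresh generator of squares each time it is called."""
    return (i * i for i in range(n))

# Option 1: ask for a new generator for every pass.
first_pass = list(squares_upto(5))
second_pass = list(squares_upto(5))
print(first_pass == second_pass)  # True

# A single generator, by contrast, is empty on the second pass.
gen = squares_upto(5)
print(sum(gen))  # 30
print(sum(gen))  # 0  (already exhausted)

# Option 2: store the values in a list and reuse it freely.
squares_list = list(squares_upto(5))
print(sum(squares_list), max(squares_list))  # 30 16
```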
Recap:
- List comprehension: Use [ ... ] syntax to get a list. Great for creating transformed or filtered lists in a concise way. E.g., [x**2 for x in data if x > 0].
- Generator expression: Use ( ... ) for a memory-efficient iterator that yields results on the fly. Good for large data streams, when you only need to read the sequence once, or when feeding it into something like sum() or any(). E.g., sum(x**2 for x in data) passes a generator expression directly to sum(), so no intermediate list is built.
Both forms support conditional filtering and can loop over any iterable. They improve code conciseness and in many cases performance (especially list comps vs manual loops).
Don’t overuse comprehensions for very complex logic. If you have nested loops and multiple conditions, sometimes a plain loop is easier to understand. But for most one-liner transformations, they are excellent tools.
Mini-Exercise:
- Write a list comprehension that takes a list of words and produces a list of the lengths of those words. For example, words = ["Python", "loops", "awesome"] → [6, 5, 7].
- Given a list nums = [3, -4, 2, 0, -1, 8], use a list comprehension with a filter to create a new list of only the positive numbers squared (ignoring negatives and zero). The result should be [9, 4, 64] for the given list.
- Convert the above comprehension into a generator expression. Use a loop or the list() function to retrieve the results from the generator and confirm they match the list.
- (Challenge) Using a single list comprehension, generate all pairs of two dice rolls (values 1-6) that sum to 7. The output should be a list of tuples, e.g. (1,6), (2,5), ... (6,1). This requires a nested comprehension or two for clauses in one comprehension.
