
Next Generation

This time, we evolve once again, moving on to generators: from basic iteration to full-fledged coroutines.



Dan Gittik

3 years ago | 10 min read

The next stop after functions (and beyond, and internals 1 and 2) is generators: functions with state. The premise is very simple—you write a regular function with yield statements instead of return statements, and when you run it, even after it’s yielded a value, you can resume it from that point on, harvesting the fruit of its gradual execution. The use-cases, however, are many—and fascinating, so let’s get to it.

Yielding Control

Generators are often presented as a sophisticated way to encapsulate an iteration, even though they’re much more than that. Even so, let’s start with that:

>>> def gen():
...     yield 1
...     yield 2
...     yield 3

When we call this generator function, it doesn’t immediately execute. Instead, it returns a generator object:

>>> g = gen()
>>> g
<generator object gen at 0x...>

To get this generator goin’, we have to use the built-in next function:

>>> next(g)
1

It starts executing, gets to its first yield statement, which evaluates to 1—and that’s what we’re getting, with the generator frozen in time at that moment. When we’re ready, we can revive it:

>>> next(g)
2
>>> next(g)
3

Until, finally—after the generator’s exhausted its execution—it’ll start complaining:

>>> next(g)
Traceback (most recent call last):
...
StopIteration
>>> next(g)
Traceback (most recent call last):
...
StopIteration

The cool part about it—and the reason generators are so often associated with iterators—is that this behavior is suspiciously similar to the for loop protocol. When I say protocol, it makes it sound menacing; what I mean is just “how things happen”. You know how if statements automatically convert their arguments to booleans, so…

if obj:
    ...

Is actually like:

cond = bool(obj)
if cond:
    ...

Similarly, for loops take their argument, convert it to an iterator using the built-in iter function, and keep invoking next on it, assigning each result to the loop variable and running one iteration of the body. Eventually, a StopIteration exception is raised, which the for loop silently catches, and stops. In fact, we can emulate this for loop…

for i in obj:
    ...

…with this while loop:

iterator = iter(obj)
while True:
    try:
        i = next(iterator)
        ...
    except StopIteration:
        break
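For comparison, this is roughly the class you’d have to write to get the same behavior by hand: an object implementing __iter__ and __next__, which is all the for loop machinery above relies on (the Countdown name is made up for illustration):

```python
# A hand-rolled iterator: what a generator gives you for free.
class Countdown:
    def __init__(self, start):
        self.current = start

    def __iter__(self):
        # Iterators return themselves, just like generators do.
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration  # signals the for loop to stop
        value = self.current
        self.current -= 1
        return value

result = list(Countdown(3))  # [3, 2, 1]
```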

Back to the topic: generators conveniently fit the next/StopIteration paradigm—but they also have this curious quality:

>>> g
<generator object gen at 0x...>
>>> iter(g)
<generator object gen at 0x...>

In other words, they don’t care for iter; it doesn’t do nothin’ to them, so they can be used in for loops as-is:

>>> g = gen()
>>> for i in g:
...     print(i)
1
2
3

Or even:

>>> for i in gen():
...     print(i)
1
2
3

Do note that if you keep a reference to a generator object—once it’s exhausted, it’ll keep raising StopIteration on every subsequent next, so it’ll result in rather short for loops:

>>> g = gen()
>>> for i in g:
...     print(i)
1
2
3
>>> for i in g:
...     print(i)
# Nothing
>>> for i in g:
...     print('Whaaat')
# Nothing :(
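If you do need to iterate over the same values more than once, one pattern (a sketch of mine, not from the article) is to wrap the generator function in an object whose __iter__ creates a fresh generator for every loop:

```python
def gen():
    yield 1
    yield 2
    yield 3

class Reiterable:
    """Wraps a generator function so each for loop gets a fresh generator."""
    def __init__(self, genfunc):
        self.genfunc = genfunc

    def __iter__(self):
        # Called once per for loop; returns a brand-new generator.
        return self.genfunc()

r = Reiterable(gen)
first = list(r)
second = list(r)  # not empty this time: a new generator was created
```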

That’s also the reason we can’t index generators, or ask about their length—they don’t actually describe a collection of items, but rather a process to generate said items—which is ongoing, and as such, unindexable and immeasurable:

>>> g = gen()
>>> g[0]
Traceback (most recent call last):
...
TypeError: 'generator' object is not subscriptable
>>> len(g)
Traceback (most recent call last):
...
TypeError: object of type 'generator' has no len()

To pull that off, you’d have to convert the generator to a list—which effectively executes it to exhaustion, and gathers all its items in a more accessible data structure:

>>> g = gen()
>>> x = list(g)
>>> x[0]
1
>>> len(x)
3

Reaping the Generated Benefits

Generators are great for a great many reasons. First of all, their memory footprint is almost nonexistent, seeing as they’re a moment frozen in time, all zen-like, ready to generate the next item, et c’est tout; no past, no future. Compare this:

>>> import sys
>>> sys.getsizeof(1)
28 # A simple integer takes 28 bytes, so...
>>> x = [i for i in range(1_000_000)]
# A million integers take tens of megabytes.

To this:

>>> def million():
...     for i in range(1_000_000):
...         yield i
>>> m = million()
>>> sys.getsizeof(m)
128 # For any size!

As a quick sidenote, back in Python 2.7, range used to return a list—meaning it had to allocate everything before the first item was even used. This led to the introduction of the lazier xrange, and in Python 3—they figured there were a lot of backwards-incompatible changes anyway—it took over the name, so range now returns a lazy sequence (not technically a generator, but it computes its items on demand just the same):

>>> r = range(10)
>>> r
range(0, 10) # See? Not running yet.

A bunch of other stuff that should’ve been lazy was converted in Python 3, introducing all sorts of fun breakages. But even in Python 2.7, generators were there, and they were used whenever it was clear an incremental solution was in order. Consider os.walk: here’s some code that prints any directory containing a password.txt file:

import os

for directory, subdirectories, files in os.walk('/'):
    for file in files:
        if file == 'password.txt':
            print(directory)
            break

Traversing something of a recursive, nested nature, such as a filesystem, was an obvious candidate for a “stunted” process, which yields the next directory’s contents (separated into subdirectories and files), rather than loading the entire thing into memory at once.
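os.walk is implemented along these lines; here’s a simplified, in-memory sketch of the same idea, with directories as dicts and files as None values (the walk helper and the fs tree are made up for illustration, not the real os.walk):

```python
# A simplified, in-memory sketch of the os.walk idea: directories are
# dicts, files are None. (Hypothetical helper, not the real os.walk.)
def walk(tree, path='/'):
    subdirs = [name for name, node in tree.items() if isinstance(node, dict)]
    files = [name for name, node in tree.items() if not isinstance(node, dict)]
    # Yield this directory's contents before descending any further:
    yield path, subdirs, files
    for name in subdirs:
        # Recurse lazily: nothing below is generated until requested.
        for entry in walk(tree[name], path.rstrip('/') + '/' + name):
            yield entry

fs = {'etc': {'password.txt': None}, 'tmp': {}}
found = [d for d, subdirs, files in walk(fs) if 'password.txt' in files]
# found == ['/etc']
```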

Counting to Infinity

Speaking of obvious candidates for generators—some things are just impossible with regular data structures or functions. Consider this:

def infinity():
    i = 0
    while True:
        yield i
        i += 1

This guy’s gonna keep spinning forever—just yielding one integer after the other, until the stars go out and the universe grows silent. It’s actually quite handy for a counter:

>>> counter = infinity()
>>> next(counter)
0
>>> next(counter)
1

But if an infinite loop is what your heart desires, there you have it:

>>> for i in infinity():
...     print(i)
0
1
2
... # ^C

Silly use-cases aside, this can be used to produce an actually useful infinite stream, such as a pseudo-random number generator. Whenever I talk about pseudo-random generators, I make it a point to mention the Blum-Blum-Shub algorithm, just because it has an amazing name, and I have a great time imagining the Carolesque tea party in which Blum, Blum and Shub came up with it:

>>> def blum_blum_shub(seed):
...     m = 11 * 19 # Some product of prime numbers.
...     x = seed
...     while True:
...         x = x**2 % m
...         yield x
>>> prg = blum_blum_shub(3)
>>> next(prg)
9
>>> next(prg)
81
>>> next(prg)
82

If you really want to do list-like things to your iterators, there’s a standard module for that; give it a look:

>>> import itertools
>>> prg = blum_blum_shub(3)
>>> itertools.islice(prg, 5)
<itertools.islice object at 0x...> # Ugh
>>> list(_)
[9, 81, 82, 36, 42]
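itertools also covers the infinite cases directly: itertools.count is essentially the infinity generator from above, and takewhile consumes an infinite stream up to a condition. A quick sketch (redefining infinity so the snippet stands alone):

```python
import itertools

def infinity():
    i = 0
    while True:
        yield i
        i += 1

# islice takes the first five values, never materializing the stream:
first_five = list(itertools.islice(infinity(), 5))  # [0, 1, 2, 3, 4]

# itertools.count is the standard-library version of the same idea,
# and takewhile stops as soon as the condition fails:
small = list(itertools.takewhile(lambda i: i < 4, itertools.count()))
```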

Some More Nifty Features

What else is there to say about generators? Well, they can be nested—

>>> def gen1():
...     yield 1
>>> def gen2():
...     yield 2
>>> def gen():
...     for i in gen1():
...         yield i
...     for i in gen2():
...         yield i
...     yield 3

But more elegantly so, using the yield from statement:

>>> def gen():
...     yield from gen1()
...     yield from gen2()
...     yield 3
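yield from really shines with recursion; for instance, flattening arbitrarily nested lists (an illustrative example, not from the article):

```python
def flatten(items):
    """Recursively yield the leaves of arbitrarily nested lists."""
    for item in items:
        if isinstance(item, list):
            yield from flatten(item)  # delegate to the nested generator
        else:
            yield item

flat = list(flatten([1, [2, [3, 4]], 5]))  # [1, 2, 3, 4, 5]
```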

And they even work in comprehensions, if you use round parentheses. Now you know who stole the tuple’s comprehension syntax:

>>> g = (i**2 for i in range(10))
>>> g
<generator object <genexpr> at 0x...>

This can come in handy if you have to loop to arrive at some result, but would like to break as soon as you find it, to avoid unnecessary calculations. You can do so with a comprehension, but a list comprehension actually goes ahead and finishes the entire loop before giving you the result you need. For example, to find the first integer whose square is greater than 1000, you could do:

>>> [i for i in range(1000) if i**2 > 1000][0]
32

But then it’d go ahead and calculate all the squares from 33 to 999, too. This, on the other hand:

>>> next((i for i in range(1000) if i**2 > 1000))
32

Only ever generates up to the first matching integer, and since nobody bothers to resume it, it doesn’t bother to compute the rest; this is sometimes called lazy evaluation, and generators are accused of being lazy functions.

This usage is in fact so common, it even has some syntactic sugar of its own—you can drop the double parentheses for better readability:

>>> next(i for i in range(1000) if i**2 > 1000)
32
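This laziness also composes: chaining generator expressions builds a pipeline, where each next pulls a single item through every stage, so only as many values are computed as are actually consumed. A sketch, with a counter added just to show how little work gets done:

```python
# A lazy pipeline: nothing runs until next() pulls a value through.
computed = []

def source():
    for i in range(1_000_000):
        computed.append(i)  # record which values actually get produced
        yield i

squares = (i**2 for i in source())          # stage 1: square each value
big = (s for s in squares if s > 1000)      # stage 2: filter the squares

first = next(big)  # 1024 == 32**2; only 0..32 were ever produced
```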

next gets a tad annoying if the iteration is empty. If I tried to guess that such a number would be within the first 30, and limited my range thusly…

>>> next(i for i in range(30) if i**2 > 1000)
Traceback (most recent call last):
...
StopIteration

…I’d get an ugly error for trying to run what was an exhausted (effectively, empty) generator. To that end, next takes an extra argument—a default value, in case the iterator it’s consuming ends up a disappointment:

>>> next(i for i in range(30) if i**2 > 1000, None)
Traceback (most recent call last):
...
SyntaxError: Generator expression must be parenthesized

The bad news is, I lied: the syntactic sugar for unparenthesized generator expressions only works if it’s the only argument to a function; otherwise, it has to be parenthesized explicitly. The good news is, I didn’t lie about the default: next does take an extra argument for a default value:

>>> print(next((i for i in range(30) if i**2 > 1000), None))
None

Oh Coroutines

The truth is, generators are much more interesting than that. Think about it this way: standard functions (sometimes called “subroutines”) have a single entry point—that is, they always start at the beginning, right underneath the signature—and can have multiple exit points—that is, return statements. Generators, on the other hand—in this context, called coroutines—also have multiple entry points.

Sure, they start at the beginning—but then they execute until the first yield statement, pause—and the next time you run them, they resume execution from that point. “Sure,” you might say, “but that’s not really an entry point—when I call a function, I can communicate some arguments to it, which parametrizes its execution, and is the entire point. Surely you can’t pass in arguments when you resume a generator.” Well:

>>> def coro():
...     print('starting')
...     x = yield 1
...     print(f'x: {x}')
...     y = yield 2
...     print(f'y: {y}')

This takes a while to digest, but it’s actually quite simple—yield statements are in fact expressions, whose value is, wait for it…

No, seriously, wait for it. When a coroutine hits a yield statement, it pauses—so while the yield’s argument is returned, that expression’s value is still pending. Only when you resume the execution, does it resolve:

>>> c = coro()
>>> next(c)
starting
1
>>> # Ok, resuming...
>>> next(c)
x: None
2

That’s rather disappointing; None is not a very interesting value at all. However, the real reason we got it is that we called next, which simply resumes the generator, without passing in anything. It’s good for writing iterators, but for a proper coroutine you’d have to use send:

>>> c = coro()
>>> next(c)
starting
1
>>> # Resuming with arguments:
>>> c.send('a')
x: a
2

The first time we’re running the coroutine, there’s nothing to communicate—any initial arguments are passed in via the parameters, like in a normal function, so we might as well use next. The second time, however, we send in 'a', which gets assigned to x and printed, right before we get back 2. Should we send in something else:

>>> c.send('b')
y: b
Traceback (most recent call last):
...
StopIteration

That assigns 'b' to y, then falls right off the edge of the generator, resulting in its end and a StopIteration exception.
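A classic send-based example is a running-average coroutine, which folds each value it receives into its state (a common illustration, sketched here with made-up names):

```python
def averager():
    """A coroutine that keeps a running average of the values sent in."""
    total = 0.0
    count = 0
    average = None
    while True:
        value = yield average  # yields the current average, waits for more
        total += value
        count += 1
        average = total / count

avg = averager()
next(avg)          # prime it: run to the first yield
a1 = avg.send(10)  # 10.0
a2 = avg.send(20)  # 15.0
a3 = avg.send(5)   # 35 / 3
```

Note the priming next: a coroutine has to reach its first yield before send can deliver anything.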

Exceptions and Return Values

Coroutines also work with exceptions; any yield statement can resume with an error—as if, during the evaluation of its value, an exception was raised. That lets us use more sophisticated flow control in our coroutines:

>>> def coro():
...     try:
...         yield 1
...         yield 2
...     except ValueError:
...         print('error')
...     yield 3
>>> c = coro()
>>> next(c)
1
>>> c.throw(ValueError)
error
3
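throw has a sibling worth knowing: the close method, which raises GeneratorExit at the paused yield, playing nicely with finally blocks for cleanup (the log list here is just for demonstration):

```python
log = []

def coro():
    try:
        yield 1
        yield 2
    finally:
        # Runs when the generator is closed (or garbage-collected).
        log.append('cleaned up')

c = coro()
first = next(c)  # 1
c.close()        # raises GeneratorExit at the paused yield; finally runs
```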

Similarly, when a coroutine does end—it can do so implicitly, silently falling off the function’s edge; or explicitly, with a return statement, which even supports a return value. It still raises a StopIteration, because that’s what ending a generator means—but the exception carries with it the return value, like so:

>>> def coro():
...     yield 1
...     return 2
>>> c = coro()
>>> next(c)
1
>>> try:
...     next(c)
... except StopIteration as e:
...     print(e.args[0])
2
2
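Incidentally, this is exactly what yield from does with that return value: it runs the inner generator to exhaustion, catches the StopIteration, and evaluates to the value it carried. A quick sketch:

```python
def inner():
    yield 1
    return 2  # travels out via StopIteration

def outer():
    result = yield from inner()  # becomes inner's return value: 2
    yield result

values = list(outer())  # [1, 2]
```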

Conclusion

Generators are immensely cool. With a simple, additional keyword—yield—Python has managed to introduce an entirely new programming paradigm. Not only can it be used for efficient iterators and comprehensions, or to tackle tasks of infinite nature—it’s actually a full-fledged coroutine, with multiple entry and exit points, to which we can communicate arguments, and even exceptions, at any time. We’ll encounter generators again later on, as they resurface in some pretty advanced topics, like context managers and asynchronous programming. Until then, I yield my time.

This article was originally published by Dan Gittik on Medium.
