Playing with Python Generators

Today I’ve been playing with generators in Python. I have to say they are yet another awesome Python feature! They give you a very cool and efficient way to store some state within a function, this is useful if you’re trying to do something like generating a sequence, which you would otherwise need a class for. I’m going to go over a couple of quick examples here in order to demonstrate what generators are and what you can do with them.

Side note: You need Python 2.4 or above to use generators, you’ll probably have this anyway, but it might pay to check before wondering why this stuff doesn’t work.

A Quick Introduction

At it’s heart a generator is a function which can return a value several times within it’s body. Each time the function is called it resumes from where it terminated in the previous call, with all it’s internal state intact (i.e. variables are held between calls). In Python a generator is defined by creating a function which uses the ‘yield’ keyword. Each time that a yield is hit the function will return the specified value. Let’s start with an example which defines a generator implementation of the built-in ‘range’ function:

def genrange(start, end):
    i = start
    while i < end:
        yield i
        i += 1

As you can see this is incredibly simple and behaves in the same way as the standard range function. The only difference is that when the generator method is called it will return a generator object rather than a value (in contrast to range, which will return a list of values in the range). These generator objects are iteratable so calling the ‘next’ method on the object will run your code and return the next yielded value. Because the generator object is iteratable, the way in which you are likely to use it should look familiar:

for i in genrange(0, 10):
    print i

If you want to go ahead and create a list from the sequence as you generate it, you can use a list comprehension, like so:

l = [i for i in genrange(0, 10)]

Of course, as hist comprehensions can be much more complicated than this you can use it to achieve more than just creating the equivalent list as calling the built-in range function. For example, you could filter the output elements according to some criteria, or call a method on each element returned.

A Trade Off

Although the above example is incredibly simple, it presents an interesting trade off between memory use and the speed of iteration through the ‘list’. The trade off exists because you could just build the list of elements in a separate function and then iterate through it. This is a perfectly valid approach but if there are a large number of elements you will use a significant quantity of memory to store the list. You are also more likely to double handle the data, unless you are building the list up over a period of time. Using a generator in this case is likely to be more efficient, as you are just creating the elements as you need them, so only storing a minimal amount of data at any one time.

The other side of the trade off is performance. If the process that produces your iterative data is computationally expensive, it makes no sense to calculate more values than you will need. So if you have a data processing loop, that can potentially exit early (i.e. before reaching the end of the ‘list’) a generator will be more efficient. But equally, if you require a tight loop which iterates quickly, you would want to pre-compute the values.

When considering using generators you should weigh up these factors and think which method is going to be most efficient for your situation. I think in a lot of cases, you will end up coming down in favour of the generator solution.

Another Example (Dan Brown is going to love me)

I’m going to leave you with another quick example, namely a very efficient way of generating Fibonacci numbers using a generator. Without further ado:

def fibonacci():
    p1 = 0
    p2 = 1
    while True:
        f = p1 + p2
        p2 = p1
        p1 = f
        yield f

There you go, easy. Many other Fibonacci implementations use a recursive method, which is horribly inefficient. Most other implementations will require either a class to store persistent data in, or the use of static variables, both of which are more tricky to handle from a programming point of view.

Well that’s it. I think I’ve demonstrated that generators are pretty cool. The examples I’ve presented are very simplistic, but I’m sure you can think of awesome uses for them. Go forth and code!