Suppose you have a (sorted) list of dicts containing the names of cities and
states, and you want to print them out with headings by state:
>>> cities = [
... { 'city' : 'Harford', 'state' : 'Connecticut' },
... { 'city' : 'Boston', 'state' : 'Massachusetts' },
... { 'city' : 'Worcester', 'state' : 'Massachusetts' },
... { 'city' : 'Albany', 'state' : 'New York' },
... { 'city' : 'New York City', 'state' : 'New York' },
... { 'city' : 'Yonkers', 'state' : 'New York' },
... ]
First let me explain operator.itemgetter(). This function is a factory
for new functions. It creates functions that access items using a key.
In this case I will use it to create a function to access the 'state'
item of each record:
>>> from operator import itemgetter
>>> getState = itemgetter('state')
>>> getState
<operator.itemgetter object at 0x00A31D90>
>>> getState(cities[0])
'Connecticut'
>>> [ getState(record) for record in cities ]
['Connecticut', 'Massachusetts', 'Massachusetts', 'New York', 'New York', 'New York']
So the value returned by itemgetter('state') is a function that accepts a dict as an argument
and returns the 'state' item of the dict. Calling getState(d) is the
same as writing d['state'].
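To make that equivalence concrete, here is a small sketch comparing itemgetter() with a hand-written lambda. It uses print() as a function so it also runs under Python 3, and the names get_state, get_state_lambda and get_both are mine, chosen only for illustration:

```python
from operator import itemgetter

record = {'city': 'Boston', 'state': 'Massachusetts'}

# itemgetter('state') builds a callable that behaves like the lambda below.
get_state = itemgetter('state')
get_state_lambda = lambda d: d['state']

print(get_state(record))         # Massachusetts
print(get_state_lambda(record))  # Massachusetts

# Given several keys, itemgetter returns a tuple of the items.
get_both = itemgetter('state', 'city')
print(get_both(record))          # ('Massachusetts', 'Boston')
```

The itemgetter version is usually a little shorter than the lambda, but either one works as a key function.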
What does this have to do with itertools.groupby()?
>>> from itertools import groupby
>>> help(groupby)
Help on class groupby in module itertools:
class groupby(__builtin__.object)
| groupby(iterable[, keyfunc]) -> create an iterator which returns
| (key, sub-iterator) grouped by each value of key(value).
groupby() takes an optional second argument which is a function to
extract keys from the data. getState() is just the function we need.
>>> groups = groupby(cities, getState)
>>> groups
<itertools.groupby object at 0x00A88300>
Hmm. That's a bit opaque. groupby() returns an iterator. Each item in
the iterator is a pair of (key, group). Let's take a look:
>>> for key, group in groups:
...     print key, group
...
Connecticut <itertools._grouper object at 0x0089D0F0>
Massachusetts <itertools._grouper object at 0x0089D0C0>
New York <itertools._grouper object at 0x0089D0F0>
Hmm. Still a bit opaque :-) The key part is clear - that's the state,
extracted with getState - but group is another
iterator. One way to look at its contents is to use a nested loop. Note
that I have to call groupby() again because the old iterator was consumed
by the last loop:
>>> for key, group in groupby(cities, getState):
...     print key
...     for record in group:
...         print record
...
Connecticut
{'city': 'Harford', 'state': 'Connecticut'}
Massachusetts
{'city': 'Boston', 'state': 'Massachusetts'}
{'city': 'Worcester', 'state': 'Massachusetts'}
New York
{'city': 'Albany', 'state': 'New York'}
{'city': 'New York City', 'state': 'New York'}
{'city': 'Yonkers', 'state': 'New York'}
Well, that makes more sense! And it's not too far from the original
requirement; we just need to pretty up the output a bit. How about this:
>>> for key, group in groupby(cities, getState):
...     print 'State:', key
...     for record in group:
...         print ' ', record['city']
...
State: Connecticut
  Harford
State: Massachusetts
  Boston
  Worcester
State: New York
  Albany
  New York City
  Yonkers
Other than misspelling Hartford (sheesh, and I grew up in Connecticut!)
that's not too bad!
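One thing worth spelling out: the list was already sorted by state, and that matters. groupby() only groups consecutive items with equal keys, so unsorted input gives you repeated headings. Here is a rough sketch of one way to handle that. It uses print() as a function so it also runs under Python 3, and the shuffled list and the group_by_state helper are made up for illustration. It sorts with the same key function before grouping, and copies each group into a list because the group iterators can only be read once:

```python
from itertools import groupby
from operator import itemgetter

get_state = itemgetter('state')

# Hypothetical unsorted input: the two New York records are not adjacent.
unsorted_cities = [
    {'city': 'Albany', 'state': 'New York'},
    {'city': 'Boston', 'state': 'Massachusetts'},
    {'city': 'Yonkers', 'state': 'New York'},
]

# Without sorting, 'New York' shows up as two separate groups.
keys_unsorted = [key for key, group in groupby(unsorted_cities, get_state)]
print(keys_unsorted)  # ['New York', 'Massachusetts', 'New York']

def group_by_state(records):
    """Sort by state, then return a list of (state, [city names]) pairs."""
    ordered = sorted(records, key=get_state)
    return [(state, [r['city'] for r in group])
            for state, group in groupby(ordered, get_state)]

grouped = group_by_state(unsorted_cities)
print(grouped)
# [('Massachusetts', ['Boston']), ('New York', ['Albany', 'Yonkers'])]
```

Because sorted() is stable, cities within a state keep their original relative order.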