Thursday, August 25, 2011

map() | filter() | reduce()



I've been learning Python slowly over the last two years. It's the first language I've used where I really have a chance to tinker with the tools of functional programming. At first, they appeared outright bizarre, but the pieces have finally fallen in place. A few weeks ago while running, I couldn't stop thinking about how awesome "map()" is. Hopefully, after reading this, you'll feel that way too.

Here's what map() does: it takes at least two arguments, a function f and n lists, where n is the number of parameters of f. Its output is a list where every entry is f(l_1[m], l_2[m],...,l_n[m])--that is, it takes every item in the list, puts it in f, then stores it in a new list and returns that list when it does.

Sounds kinda pointless, right? You could just use a loop to do that.
newList = []

for item in list: newList.append(f(item))


Map does this same thing more concisely:
newList = map(f, list)


Perhaps I haven't convinced you. Part of the value in map() is concision. It's clear, in the third piece of code, what you're doing--creating a new list by funneling list through f. Mentally, that is a simple procedure. To express it explicitly, with all its overhead, is more sophisticated. Someone reading the code would have to stop and think "ok, he's looping through all these entries, putting them in this array, then storing them in this array." All that overhead obfuscates; concision presents the operation with less mental overhead, allowing the reader--and the coder, for that matter--to focus less on the overhead and more on the algorithm at large.

Additionally, map() is optimized for its purpose--loops are not necessarily. As a consequence, you end up with faster runtime. Consider this script, where two sets of random numbers are funneled through testFunc:

import datetime
from random import randint

testFunc = lambda x,y: x*y+x

randFunc = lambda x: randint(1,50000)
setOne = map(randFunc, range(1,50000))
setTwo = map(randFunc, range(1,50000))


loopList = []
loopStart = datetime.datetime.now()
for x,y in zip(setOne, setTwo): loopList.append(testFunc(x,y))
print str(datetime.datetime.now()-loopStart)

loopStart = datetime.datetime.now()
mapList = map(testFunc, setOne, setTwo)
print str(datetime.datetime.now()-loopStart)

Running on my crappy laptop, the loop took nearly three times longer to execute.

There are two other functions that are just as cool as map(): filter() and reduce(). They perform different, albeit valuable, tasks. Filter() doesn't accept just any function, but requires a boolean function;
filter(lambda x: x>2, lst)
would be acceptable. Filter(), like map(), returns a list, but it returns a list of values from lst where the provided function returns true.

Last of all is reduce(), which takes a function of at least two inputs. To start, it takes the first two values of a given list and puts them in the provided function. Then, it uses the previous result as the first parameter and the next entry of the list as the second parameter. So, a sum, for example, would be:
reduce(lambda x,y: x+y, lst)


The real power with these functions is that they return lists, so once they produce a result, it can be nested in another one of the set. As another example, this one-liner removes dead "twirps"--those without any calories left--from the testTwirp array:
map(testTwirp.remove, filter(lambda x: x.calories < 0, testTwirp))

The invocation of filter finds the twirps that are dead; the map of those twirps into
their original list's remove method takes them out of the picture. The resulting list isn't saved, as it's just a large list of "None's."

0 Comments:

Post a Comment

<< Home

-->