Implementing MapReduce in Haskell


MapReduce is a general technique for massively parallel programming developed by Google.  It takes its inspiration from ideas in functional programming, but has moved away from that paradigm to a more imperative approach.

We have found that MapReduce can be expressed naturally, using functional programming techniques, as a form of monad, which we have implemented in Haskell.  The basic idea is to convert each map / reduce stage into a monadic function, so MapReduce becomes a composition:

mapReduce :: [String] -> [(String,Int)]
mapReduce state = runMapReduce mr state
mr = distributeMR >>= wrapMR mapper >>= wrapMR reducer

In addition, the implementation is inherently parallel.

Suggestions for future research

  1. Look at ways of making distributing data across threads more efficient
  2. Develop ways of using true concurrency so the approach can be used on clusters



4 thoughts on “Implementing MapReduce in Haskell

    1. On a Macbook Pro, it took only a few seconds to do word-count on a million words. I am working on a distributed version, but rather lost the impetus. Perhaps I should start again.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s