MapReduce is a general technique for massively parallel programming developed by Google. It takes its inspiration from ideas in functional programming, but has moved away from that paradigm to a more imperative approach.
We have found that MapReduce can be expressed naturally, using functional programming techniques, as a form of monad, which we have implemented in Haskell. The basic idea is to convert each map / reduce stage into a monadic function, so MapReduce becomes a composition:
mapReduce state = runMapReduce mr state
mr = distributeMR >>= wrapMR mapper >>= wrapMR reducer
In addition, the implementation is inherently parallel.
Suggestions for future research
- Look at ways of making distributing data across threads more efficient
- Develop ways of using true concurrency so the approach can be used on clusters
- The GIT repository git://github.com/Julianporter/Haskell-MapReduce.git containing:
- The three source files for the MapReduce library
- The source for the word-count program
- A Cabal build file