Summary
MapReduce is a general technique for massively parallel programming developed by Google. It takes its inspiration from ideas in functional programming, but has moved away from that paradigm to a more imperative approach.
We have found that MapReduce can be expressed naturally, using functional programming techniques, as a form of monad, which we have implemented in Haskell. The basic idea is to convert each map / reduce stage into a monadic function, so MapReduce becomes a composition:
mapReduce state = runMapReduce mr state
where
mr = distributeMR >>= wrapMR mapper >>= wrapMR reducer
In addition, the implementation is inherently parallel.
Suggestions for future research
- Look at ways of making distributing data across threads more efficient
- Develop ways of using true concurrency so the approach can be used on clusters
Downloads
- The GIT repository git://github.com/Julianporter/Haskell-MapReduce.git containing:
- The three source files for the MapReduce library
- The source for the word-count program
- A Cabal build file
Filed under: Algorithms, Functional Programming, Haskell, MapReduce, Cabal (software), Concurrent Haskell, Functional programming, GIT, Haskell, Haskell MapReduce, MapReduce, Parallel computing



[...] have discussed elsewhere my initial implementation of MapReduce as a monadic operation that works in parallel processes on a single machine. Here I take the first steps towards [...]
[...] have discussed elsewhere my initial implementation of MapReduce as a monadic operation that works in parallel processes on a single machine. Here I take the first steps towards [...]
Ah, but how fast will it run?
On a Macbook Pro, it took only a few seconds to do word-count on a million words. I am working on a distributed version, but rather lost the impetus. Perhaps I should start again.