MapReduce is an increasingly popular paradigm for cloud-based parallel computation. Unfortunately, though the idea itself is extremely simple, current implementations such as HADOOP make programming MapReduce applications extremely complex. This is partly because simply setting up and managing a HADOOP cluster is very hard work, and partly because the HADOOP framework is intrusive: it forces the application programmer to adopt a rather awkward API. Instead of simply writing application code and wrapping it in a standard way to make MapReduce applications, programmers have to fit their intentions around what HADOOP will and will not let them do.
My intention in this project is to provide an implementation of MapReduce in Haskell that removes both of these problems. First, functional programming, and monads in particular, fit MapReduce, with its emphasis on data flowing through a chain of processing steps, much better than the OO paradigm does. Second, Haskell’s emphasis on higher-order functions and its flexible type structure make it possible to provide the standard wrapper envisaged above: it takes framework-neutral code that is 100% application logic and lifts it naturally into the MapReduce framework, without the programmer needing to know anything about the framework. Third, Cloud Haskell gives us a natural, unobtrusive framework for cloud-based programming.
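To make the idea of a standard wrapper concrete, here is a minimal single-machine sketch. It is not the project's library code: the names `mapReduce`, `wordsOf`, `total` and `wordCount` are hypothetical, and plain function composition stands in for the monadic formulation described in the paper. The point it illustrates is only that the application functions know nothing about the framework.

```haskell
import qualified Data.Map.Strict as Map
import Data.Char (toLower)

-- Hypothetical wrapper (an assumption, not the project's API): lifts a plain
-- mapper and a plain reducer into a complete single-machine MapReduce job.
mapReduce :: Ord k
          => (a -> [(k, v)])   -- application mapper: one record to key/value pairs
          -> (k -> [v] -> r)   -- application reducer: one key with all its values
          -> [a]               -- input records
          -> [(k, r)]          -- one result per key, in ascending key order
mapReduce mapper reducer =
    map (\(k, vs) -> (k, reducer k vs))   -- reduce phase
  . Map.toList
  . Map.fromListWith (flip (++))          -- shuffle: group values by key, keeping input order
  . map (\(k, v) -> (k, [v]))
  . concatMap mapper                      -- map phase

-- Pure application code: no reference to the wrapper or to any framework.
wordsOf :: String -> [(String, Int)]
wordsOf line = [ (map toLower w, 1) | w <- words line ]

total :: String -> [Int] -> Int
total _ = sum

-- The only place where application code and wrapper meet.
wordCount :: [String] -> [(String, Int)]
wordCount = mapReduce wordsOf total
```

In GHCi, `wordCount ["a b a", "b c"]` evaluates to `[("a",2),("b",2),("c",1)]`. A distributed version would need to replace the in-memory `Map` shuffle with a mechanism for moving data between nodes, but under this design `wordsOf` and `total` would not have to change.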
My plan for this project is as follows:
- Devise a theoretical framework for simple MapReduce applications running on a single machine
- Paper to be published in The Monad Reader (see below): done
- Code written and described in paper: done
- Library code and sample application available on http://code.haskell.org (see below): done
- Framework for distributed persistence: in progress
- Framework for distributed task management: not yet started
- Decided to use Cloud Haskell: done
- Learning to use Cloud Haskell with assistance from its developers: not yet started
- Not yet started
- Monadic MapReduce paper: describes the basic approach, using a monad to wrap processing code so as to abstract the MapReduce concept
- Cloud Haskell: the concurrency framework that is to be used in moving the monad onto the cloud
- MapReduce project DARCS repository: the latest version of the code-tree for the project