MapReduce in Haskell

Introduction

MapReduce is an increasingly popular paradigm for cloud-based parallel computation.  Unfortunately, though the idea itself is extremely simple, current implementations such as Hadoop make programming MapReduce applications extremely complex.  This is partly because simply setting up and managing a Hadoop cluster is very hard work, and partly because the Hadoop framework is intrusive: it forces the application programmer to adopt a rather awkward API.  Instead of simply writing application code and wrapping it in a standard way to make MapReduce applications, programmers have to fit their intentions around what Hadoop will and will not let them do.

My intention in this project is to provide an implementation of MapReduce in Haskell that removes both of these problems.  First, functional programming, and monads in particular, fits MapReduce, with its emphasis on data flowing through a chain of processing steps, much better than the OO paradigm does.  Second, Haskell’s emphasis on higher-order functions and its flexible type structure make it possible to provide the standard wrapper envisaged above, which takes neutral code that is 100% application and lifts it naturally into the MapReduce framework without the programmer needing to know anything about the framework.  Third, Cloud Haskell gives us a natural, unobtrusive framework for cloud-based programming.
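To make the idea of a standard wrapper concrete, here is a minimal single-machine sketch.  It is not the monadic formulation described in the paper, nor the project's library code: the names mapReduce, countWords, total and wordCount are hypothetical, chosen purely for illustration, and the wrapper is just a plain higher-order function built on Data.Map from the containers package.

```haskell
module Main where

import Data.Char (toLower)
import qualified Data.Map.Strict as Map

-- | Hypothetical single-machine MapReduce wrapper (an assumption for
-- illustration, not the project's API).
mapReduce
  :: Ord k
  => (a -> [(k, v)])   -- map step: one input record to key/value pairs
  -> (k -> [v] -> r)   -- reduce step: a key and all its values to a result
  -> [a]               -- input records
  -> [(k, r)]          -- one result per key, in ascending key order
mapReduce mapper reducer inputs =
  Map.toList (Map.mapWithKey reducer grouped)
  where
    -- Shuffle step: collect every emitted value under its key.
    -- 'flip (++)' appends new values after old ones, keeping emission order
    -- (simple rather than efficient; fine for a sketch).
    grouped =
      Map.fromListWith (flip (++))
        [ (k, [v]) | x <- inputs, (k, v) <- mapper x ]

-- Pure application code: it knows nothing about the wrapper.
countWords :: String -> [(String, Int)]
countWords line = [ (map toLower w, 1) | w <- words line ]

total :: String -> [Int] -> Int
total _ = sum

-- Lifting the application code into the wrapper is a one-liner.
wordCount :: [String] -> [(String, Int)]
wordCount = mapReduce countWords total

main :: IO ()
main = mapM_ print (wordCount ["the cat sat", "The cat ran"])
```

The point of the sketch is the separation: countWords and total are ordinary functions with no knowledge of the framework, and only wordCount mentions mapReduce.  A distributed version would need to replace the in-memory grouping with something that works across machines, which is what the later steps of the plan address.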

Plan

My plan for this project is as follows:

  1. Devise theoretical framework for simple MapReduce applications running on a single machine
    • Paper to be published in The Monad.Reader (see below): done
  2. Write demonstrator code for the single machine concept
    • Code written and described in paper: done
    • Library code and sample application available on http://code.haskell.org (see below): done
  3. Devise theoretical framework for parallelising MapReduce to run on the cloud
    • Framework for distributed persistence: in progress
    • Framework for distributed task management: not yet started
  4. Determine practical approach to cloud-based concurrency
    • Decided to use Cloud Haskell: done
    • Learning to use Cloud Haskell with assistance from its developers: not yet started
  5. Implement parallel version of MapReduce
    • Not yet started

Resources

Documents

  • Monadic MapReduce paper : describes the basic approach, using a monad to wrap processing code so as to abstract the MapReduce concept
  • Cloud Haskell : the concurrency framework that is to be used in moving the monad onto the cloud

Code

Other information
