Thursday, September 10, 2015

MapReduce Authors


We know that MapReduce broke the ice for the traditional computing model by introducing scale-out technology in a simple way. Google maintains a research page on MapReduce at http://research.google.com/archive/mapreduce.html . The authors are Sanjay Ghemawat & Jeff Dean from Google Inc.

As a research scholar, I liked the motivation of their paper: large-scale data processing. In earlier days this was achievable only with supercomputing; the key difference here is parallel execution across hundreds or thousands of CPUs on commodity boxes, in an easy way. Moreover, MapReduce provides (a minimal word-count sketch follows this list):
  1. Automatic parallelization and distribution
  2. Fault-tolerance
  3. I/O scheduling
  4. Status and monitoring
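To make the programming model concrete, here is a minimal, single-process Python sketch of the classic word-count example from their paper. It only simulates the map, shuffle and reduce phases in one process; the function names and the driver are my own illustration, not Google's C++ API.

from collections import defaultdict

def map_fn(doc_name, contents):
    # Map: emit (word, 1) for every word in the document.
    for word in contents.split():
        yield word, 1

def reduce_fn(word, counts):
    # Reduce: sum all counts emitted for the same word.
    return word, sum(counts)

def run_mapreduce(inputs):
    # Shuffle phase: group intermediate values by key.
    intermediate = defaultdict(list)
    for doc_name, contents in inputs:
        for key, value in map_fn(doc_name, contents):
            intermediate[key].append(value)
    # Reduce phase: one call per distinct key.
    return dict(reduce_fn(k, v) for k, v in intermediate.items())

if __name__ == "__main__":
    docs = [("doc1", "the quick brown fox"), ("doc2", "the lazy dog")]
    print(run_mapreduce(docs))   # {'the': 2, 'quick': 1, ...}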

Fault-tolerance is handled via re-execution. On worker failure (a rough master-side sketch follows this list):
  • Detect failure via periodic heartbeats
  • Re-execute completed and in-progress map tasks
  • Re-execute in-progress reduce tasks
  • Task completion committed through master
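Below is a rough Python sketch of how a master might apply these re-execution rules. The heartbeat timeout, class layout and task-state names are assumptions of mine for illustration; only the policy (re-run completed and in-progress map tasks, but only in-progress reduce tasks) comes from the paper.

import time

HEARTBEAT_TIMEOUT = 10.0  # seconds without a heartbeat before a worker is presumed dead (assumed value)

class Master:
    def __init__(self):
        self.last_heartbeat = {}   # worker_id -> timestamp of last heartbeat
        self.map_tasks = {}        # task_id -> {"worker": id, "state": "idle" | "in_progress" | "completed"}
        self.reduce_tasks = {}

    def on_heartbeat(self, worker_id):
        self.last_heartbeat[worker_id] = time.time()

    def check_workers(self):
        now = time.time()
        for worker_id, last in list(self.last_heartbeat.items()):
            if now - last > HEARTBEAT_TIMEOUT:
                self.handle_worker_failure(worker_id)

    def handle_worker_failure(self, worker_id):
        # Map output lives on the failed worker's local disk, so both
        # completed and in-progress map tasks must be rescheduled.
        for task in self.map_tasks.values():
            if task["worker"] == worker_id and task["state"] in ("in_progress", "completed"):
                task["state"], task["worker"] = "idle", None
        # Reduce output goes to the global file system, so only
        # in-progress reduce tasks need to be re-executed.
        for task in self.reduce_tasks.values():
            if task["worker"] == worker_id and task["state"] == "in_progress":
                task["state"], task["worker"] = "idle", None
        del self.last_heartbeat[worker_id]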

Data locality optimization, skipping bad records and compression of intermediate data are a few of their refinement techniques to boost performance on large-scale data.
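As an illustration of the skipping-bad-records refinement, here is a hedged Python sketch. The failure threshold and the in-process counter are simplifications of my own; in the actual system the worker's signal handler reports the offending record to the master, which tells the next re-execution to skip it.

from collections import Counter

MAX_FAILURES = 2   # skip a record once it has crashed the map function this many times (assumed threshold)

failure_counts = Counter()   # (input_file, record_no) -> observed failures

def apply_map_safely(map_fn, input_file, record_no, record):
    # Run the user's map function, skipping records known to crash it deterministically.
    key = (input_file, record_no)
    if failure_counts[key] >= MAX_FAILURES:
        return []                     # known-bad record: skip it on re-execution
    try:
        return list(map_fn(record))
    except Exception:
        failure_counts[key] += 1      # in the real system this is reported to the master
        raise                         # the task fails here and will be re-executed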

Their research paper lists the following usage statistics for August 2004:
  • Number of jobs 29,423
  • Average job completion time 634 secs
  • Machine days used 79,186 days
  • Input data read 3,288 TB
  • Intermediate data produced 758 TB
  • Output data written 193 TB
  • Average worker machines per job 157
  • Average worker deaths per job 1.2
  • Average map tasks per job 3,351
  • Average reduce tasks per job 55
  • Unique map implementations 395
  • Unique reduce implementations 269
  • Unique map/reduce combinations 426

An amazing and game-changing methodology, made easy, resulting from the research of great minds at Google. Here's an opportunity for me to highlight the authors of MapReduce - Sanjay Ghemawat & Jeff Dean.
