Hakuna MapData! » Blog Archive » Be map slot or not to be: that is the question!

Be map slot or not to be: that is the question!

| Posted in Monitoring |

A couple of months ago, we got an email from Chris:

Hi!

The Hadoop cluster has been a bit slow the past few days and I noticed that the bottleneck seems to be coming from the map tasks. We have separate map and reduce task capacities and it continuously looks like the mapper slots are all taken while there’s a surplus of open reduce slots. Is there any reason that we can’t open any of the free reduce slots to map tasks?

Regards,
Chris

The bottleneck coming from the map tasks

The situation Chris describes is quite common on a “classical” Hadoop cluster (by “classical”, I mean a cluster without YARN): all map slots are taken (and we still want more!), while many reduce slots are free (or vice versa). As a result, the cluster resources are not fully utilized, and jobs as a whole run slower. One example is below:

This becomes even more visible when a chart shows fairly flat-topped utilization of map slots:

Apache Hadoop MRv1 resource utilization

This all happens because in MRv1 we have to divide our resources into map and reduce slots and hard-code them in the TaskTracker’s configuration (the mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum properties). Once they are set, a TaskTracker cannot run more map tasks than mapred.tasktracker.map.tasks.maximum at any given moment, even if no reduce tasks are running. To change these settings, the TaskTracker must be restarted (which sadly causes the tasks running on that node to fail).

Obviously, the situation could be alleviated if we simply had generic task slots (e.g. mapred.tasktracker.tasks.maximum) and a smart scheduler deciding dynamically how many map and reduce tasks should run on a given node. This idea (and more) is actually implemented in YARN, which provides much better cluster resource utilization than MRv1 (among other advantages). You can find more information in my picture-rich presentation about Apache Hadoop YARN that I gave at a meeting of the Warsaw Hadoop User Group.
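To make the difference concrete, here is a toy Python sketch comparing fixed map/reduce slots with a single pool of generic slots. It is only an illustration: the function names and the naive "maps first" fill policy are my own assumptions, and this is not how the MRv1 or YARN schedulers are actually implemented.

```python
def running_tasks_static(pending_maps, pending_reduces, map_slots, reduce_slots):
    """MRv1-style: each task type is capped by its own fixed slot count."""
    return min(pending_maps, map_slots), min(pending_reduces, reduce_slots)


def running_tasks_pooled(pending_maps, pending_reduces, total_slots):
    """Hypothetical generic pool, filled naively with maps first, then reduces."""
    maps = min(pending_maps, total_slots)
    reduces = min(pending_reduces, total_slots - maps)
    return maps, reduces


# One node with 10 slots in total, facing a map-heavy workload.
static = running_tasks_static(pending_maps=20, pending_reduces=1,
                              map_slots=6, reduce_slots=4)
pooled = running_tasks_pooled(pending_maps=20, pending_reduces=1,
                              total_slots=10)

print(static, "busy slots:", sum(static))  # (6, 1) busy slots: 7
print(pooled, "busy slots:", sum(pooled))  # (10, 0) busy slots: 10
```

With fixed slots, three reduce slots sit idle while 14 map tasks wait. The naive pool keeps every slot busy, but it starves the single reduce task, which is exactly why a smarter scheduler is needed on top of generic slots.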

In general, I think it is difficult to find ideal hard-coded values for the maximum number of map and reduce tasks running on the cluster, because the workload may change very often. If you look at the chart below, you will see that at some moments we are bottlenecked by map tasks, while at others we are bottlenecked by reduce tasks.

Tuning map and reduce tasks distribution

Usually, a good starting point is to assign 60% of the slots to map tasks and 40% to reduce tasks (or 70% and 30%). At Spotify, we simply started with 60%-40%, following this recommendation. In the meantime, however, we were collecting statistics about our tasks (how many there are and how long they run) from Ganglia and the JobTracker’s logs. After aggregating statistics from the last couple of months, we discovered that our current usage is around 68%-32%, according to the charts below:
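As a rough illustration, a ratio like this can be turned into per-node values for the two TaskTracker properties. The helper below is a minimal sketch: the function name and the 10-slot node are assumptions for the example, not our actual configuration.

```python
def split_slots(total_slots, map_fraction):
    """Split a node's slots into (map_slots, reduce_slots) for a target map share."""
    if total_slots < 2:
        raise ValueError("need at least one map slot and one reduce slot")
    if not 0 < map_fraction < 1:
        raise ValueError("map_fraction must be strictly between 0 and 1")
    map_slots = round(total_slots * map_fraction)
    # Keep at least one slot of each type so neither phase is blocked entirely.
    map_slots = max(1, min(total_slots - 1, map_slots))
    return map_slots, total_slots - map_slots


# Hypothetical node with 10 slots: the 60%-40% start vs. the measured 68%-32%.
for fraction in (0.60, 0.68):
    maps, reduces = split_slots(10, fraction)
    print(f"mapred.tasktracker.map.tasks.maximum={maps}")
    print(f"mapred.tasktracker.reduce.tasks.maximum={reduces}")
```

On a 10-slot node this gives 6 map and 4 reduce slots for 60%-40%, and 7 map and 3 reduce slots for 68%-32%. Remember that applying new values still requires a TaskTracker restart.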

Where I took the data from

As mentioned earlier, I took the data from Ganglia and the JobTracker’s logs. You can find out how to set up Ganglia for your Hadoop (and HBase) clusters in my previous post titled Ganglia configuration for a small Hadoop cluster and some troubleshooting.

Additionally, you can use the JobTracker’s simple job summary logs, which can be enabled in the hadoop-log4j.properties file:

hadoop.mapreduce.jobsummary.logger=INFO,JSA
hadoop.mapreduce.jobsummary.log.file=hadoop-mapreduce.jobsummary.log
hadoop.mapreduce.jobsummary.log.maxfilesize=256MB
hadoop.mapreduce.jobsummary.log.maxbackupindex=20

Then, in your Hadoop log directory, you will find hadoop-mapreduce.jobsummary.log, whose content is useful for many kinds of cluster analysis:

13/03/01 22:32:58 INFO mapred.JobInProgress$JobSummary:
jobId=job_201303012227_0001,submitTime=1362177138946,launchTime=1362177141965,firstMapTaskLaunchTime=1362177147207,firstReduceTaskLaunchTime=1362177165121,firstJobSetupTaskLaunchTime=1362177141968,firstJobCleanupTaskLaunchTime=1362177173792,finishTime=1362177178517,numMaps=8,numSlotsPerMap=1,numReduces=2,numSlotsPerReduce=1,user=kawaa,queue=default,status=SUCCEEDED,mapSlotSeconds=36,reduceSlotsSeconds=17,clusterMapCapacity=2,clusterReduceCapacity=2
13/03/01 22:35:29 INFO mapred.JobInProgress$JobSummary:
jobId=job_201303012227_0003,submitTime=1362177310049,launchTime=1362177312838,firstMapTaskLaunchTime=1362177316497,firstJobSetupTaskLaunchTime=1362177313141,firstJobCleanupTaskLaunchTime=1362177326372,finishTime=1362177329731,numMaps=8,numSlotsPerMap=1,numReduces=2,numSlotsPerReduce=1,user=kawaa,queue=default,status=KILLED,mapSlotSeconds=21,reduceSlotsSeconds=3,clusterMapCapacity=2,clusterReduceCapacity=2
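A small Python sketch of how such a log could be aggregated into a map/reduce usage ratio is shown below. It assumes that summing the mapSlotSeconds and reduceSlotsSeconds fields is a reasonable proxy for slot usage. The helper names are made up for this example, and the sample lines are abbreviated versions of the two entries above.

```python
import re

# Matches key=value pairs such as "mapSlotSeconds=36" in a job summary line.
SUMMARY_FIELD = re.compile(r"(\w+)=([^,\s]*)")


def parse_summary(line):
    """Turn one job summary line into a dict of string fields."""
    return dict(SUMMARY_FIELD.findall(line))


def slot_seconds_share(lines):
    """Return (map_share, reduce_share) of total slot-seconds, or None if empty."""
    map_seconds = reduce_seconds = 0
    for line in lines:
        fields = parse_summary(line)
        if "mapSlotSeconds" not in fields:
            continue  # e.g. the timestamp/logger line that precedes each entry
        map_seconds += int(fields["mapSlotSeconds"])
        # Note the spelling in the log: "reduceSlotsSeconds", with an extra "s".
        reduce_seconds += int(fields.get("reduceSlotsSeconds", 0))
    total = map_seconds + reduce_seconds
    if total == 0:
        return None
    return map_seconds / total, reduce_seconds / total


sample = [
    "13/03/01 22:32:58 INFO mapred.JobInProgress$JobSummary:",
    "jobId=job_201303012227_0001,numMaps=8,numReduces=2,status=SUCCEEDED,"
    "mapSlotSeconds=36,reduceSlotsSeconds=17",
    "13/03/01 22:35:29 INFO mapred.JobInProgress$JobSummary:",
    "jobId=job_201303012227_0003,numMaps=8,numReduces=2,status=KILLED,"
    "mapSlotSeconds=21,reduceSlotsSeconds=3",
]

map_share, reduce_share = slot_seconds_share(sample)
print(f"map {map_share:.0%}, reduce {reduce_share:.0%}")  # map 74%, reduce 26%
```

This sketch counts killed jobs as well, since they occupied slots too. Filter on the status field if you only care about successful jobs.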
