A couple of weeks ago, we got a JIRA ticket complaining that the JobTracker was super slow (while it used to be super snappy most of the time). Obviously, in such a situation developers and analysts are a bit annoyed, because it becomes difficult to submit and track the status of MapReduce jobs (on the other hand, a side effect is time for an unplanned coffee break, which should not be so bad ;)). Anyway, we were also a bit ashamed and sad, because we aim for a perfect Hadoop cluster and no unplanned coffee-break interruptions.
After a quick look at the unresponsive JobTracker web interface (yes, it took a while!), we discovered that very large jobs were running on the cluster. The largest one alone, at that time (a Hive query, by the way), ran 58,538 tasks and aimed to process 21.82 TB of data (1.5 years of data from one of our datasets). Apart from this one, there were a dozen or so jobs running tens of thousands of tasks, as well as tens of jobs running thousands of tasks. Most of them were ad-hoc Hive jobs – mostly because it is really easy to implement a large Hive job unintentionally, even if the query itself looks short and innocent.
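To illustrate (with a hypothetical query – though the table name comes from the job shown later in this post), even a one-liner like the following can fan out into tens of thousands of map tasks once it touches many months of daily partitions, since roughly one map task is created per input split:

-- Looks innocent, but scanning ~1.5 years of daily partitions
-- of a large table can easily create tens of thousands of tasks.
SELECT *
FROM end_song_cleaned
WHERE dt >= '2012-01-01';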
To attack this problem, we decided to limit the number of tasks per job using the mapred.jobtracker.maxtasks.per.job configuration property.
Casper: What value for mapred.jobtracker.maxtasks.per.job should we use?
Gustav: Maybe 50K
Casper: Why not 30K?
Gustav: Because we might be running large production jobs and it is safer to start with a high value and eventually decrease it later.
Casper: Hmmm? Sounds good…! But maybe 40K, not 50K? ;)
Gustav: OK. Let’s meet in the middle…
Yes, the dialogue above did happen, but quickly afterwards one more question was asked – should a real data-driven company make such a decision based on a guess, or based on data? Actually, there can be only one answer – data!
To find the data, we looked at the logs from the JobTracker. Based on 521,313 jobs from the last 70 days, we found that our largest production job creates 22,609 tasks (and this number will keep increasing, because its input dataset grows each day). In consequence, the jobs that create more than 22,609 tasks are ad-hoc jobs (in our case, mostly Hive queries or streaming jobs implemented using the Python MapReduce module of Luigi, an open-sourced scheduler – please watch the video and browse the code). What is really interesting, these ad-hoc jobs usually fail or are killed for some reason (most often due to a parsing error, an out-of-memory exception, or a kill command invoked by the user). To give just one example – a job created 56,024 tasks, ran for 4h 24min and … was killed by a user (probably because he/she expected it to fail soon, or could not wait any longer and simply started the same job on a smaller input ;)).
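As a side note, extracting this kind of statistic from the JobTracker history logs does not take much code. Below is a minimal sketch – the history location and the TOTAL_MAPS/TOTAL_REDUCES attributes are assumptions based on the Hadoop 1.x job history format, so double-check them against your distribution:

import glob
import re

# Hypothetical location of the Hadoop 1.x job history files.
HISTORY_GLOB = "/var/log/hadoop/history/done/*/*"

# Hadoop 1.x job history lines look roughly like:
#   Job JOBID="job_..." ... TOTAL_MAPS="1234" TOTAL_REDUCES="56" ...
MAPS_RE = re.compile(r'TOTAL_MAPS="(\d+)"')
REDUCES_RE = re.compile(r'TOTAL_REDUCES="(\d+)"')

max_tasks = 0
for path in glob.glob(HISTORY_GLOB):
    with open(path, errors="ignore") as f:
        for line in f:
            m, r = MAPS_RE.search(line), REDUCES_RE.search(line)
            if m and r:
                max_tasks = max(max_tasks, int(m.group(1)) + int(r.group(1)))

print("largest job so far: %d tasks" % max_tasks)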
Now, with mapred.jobtracker.maxtasks.per.job set to 23,000 (just above the 22,609 tasks of our largest production job), any job that requests more tasks fails fast at initialization time, as this example from our JobTracker shows:

Hadoop job_201305010824_140585 on jobtracker
User: kawaa
Job Name: select * from end_song_cleaned where dt...10(Stage-1)
Job File: hdfs://namenode:54310/user/kawaa/.staging/job_201305010824_140585/job.xml
Submit Host: gina
Submit Host Address: 10.255.3.21
Job-ACLs: All users are allowed
Job Setup: None
Status: Failed
Failure Info: Job initialization failed:
java.io.IOException: The number of tasks for this job 42987 exceeds the configured limit 23000
        at org.apache.hadoop.mapred.JobInProgress.initTasks(JobInProgress.java:717)
        at org.apache.hadoop.mapred.JobTracker.initJob(JobTracker.java:3750)
        at org.apache.hadoop.mapred.EagerTaskInitializationListener$InitJob.run(EagerTaskInitializationListener.java:79)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)
Started at: Sun May 19 03:20:59 UTC 2013
Failed at: Sun May 19 03:21:00 UTC 2013
Failed in: 1sec
Job Cleanup: None
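For reference, the limit itself is a single JobTracker-side property in mapred-site.xml. A minimal sketch (the value matches the log above; since the JobTracker reads its configuration at startup, changing it requires a JobTracker restart):

<!-- mapred-site.xml on the JobTracker -->
<property>
  <name>mapred.jobtracker.maxtasks.per.job</name>
  <!-- just above our largest production job (22,609 tasks);
       the default of -1 means no limit -->
  <value>23000</value>
</property>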
The possible future
Probably sooner rather than later, a very important and large-enough job will not be allowed to start due to such an exception. How will we respond to it? First, we will try to reduce the number of map and reduce tasks for this job by increasing the values of mapred.min.split.size and mapred.max.split.size (so that each map task processes more data) and, in the case of Hive queries, hive.exec.reducers.bytes.per.reducer (so that fewer reduce tasks are needed).
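For an ad-hoc Hive query, that might look like the sketch below (the property names are as above; the actual values are only an illustration and should be tuned to the input size):

-- Make each map task consume bigger input splits (512 MB here),
-- so that fewer map tasks are created for the same input.
SET mapred.min.split.size=536870912;
SET mapred.max.split.size=536870912;
-- Let each reduce task handle more data (4 GB here),
-- so that fewer reduce tasks are needed.
SET hive.exec.reducers.bytes.per.reducer=4294967296;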