Posted in Tutorials | Posted on 15-06-2014
I am happy to say that my blog post “Agile migration of a single-node cluster from MRv1 to YARN” has been published by IBM developerWorks. Please find the abstract of the article below:
Although Hadoop vendors such as Cloudera and Hortonworks provide excellent and detailed documentation for installing YARN, they follow an all-or-nothing approach. With this approach, you perform almost all of the migration steps first, then you start the cluster and verify that it is correctly migrated. If the migration fails, you review the migration steps to determine where the misconfiguration was made. Because the migration to YARN is a complex and error-prone process, it can be challenging to troubleshoot an almost-migrated cluster.
In contrast, this article describes how to use an agile approach with quick and frequent iterations. In the first iteration, you install only the necessary components and start the YARN cluster to verify whether it runs applications successfully. In the next iterations, you extend the cluster’s functionality and optimize the most important configuration settings. The goal is to have a working YARN cluster that can process users’ applications after each iteration. With this approach, administrators can pause the migration after any iteration and resume it later at a convenient time.
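As a flavor of what that first iteration involves, the bare minimum usually comes down to a few configuration properties. Below is a rough sketch using property names from stock Apache Hadoop 2.x; the hostname value is illustrative for a single-node setup, and exact names may differ slightly between distributions:

```xml
<!-- yarn-site.xml: minimal settings to bring up a single-node YARN cluster -->
<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>localhost</value> <!-- single node, so everything runs locally -->
</property>
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value> <!-- enables the shuffle phase for MapReduce jobs -->
</property>

<!-- mapred-site.xml: submit MapReduce jobs to YARN instead of MRv1 -->
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
```

With just these in place you can start the ResourceManager and NodeManager, run an example job, and confirm the cluster works before touching any further settings.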
Read more at IBM developerWorks.
At Spotify, we have a company-wide culture of celebrating successes and … failures. Because we want to iterate fast, we realize that failures can happen. On the other hand, we cannot afford to make the same mistake more than once. One way of preventing that is to share our failures, mistakes, and lessons learned across the company.
Today, however, I would like to share my failures … outside of the company ;) While my failures relate to my recent work with an Apache Hadoop cluster, I think the lessons I have learned are generic enough that many people can benefit from them.
Posted in Monitoring | Posted on 06-10-2013
A couple of months ago, we got an email from Chris:
The Hadoop cluster has been a bit slow the past few days and I noticed that the bottleneck seems to be coming from the map tasks. We have separate map and reduce task capacities and it continuously looks like the mapper slots are all taken while there’s a surplus of open reduce slots. Is there any reason that we can’t open any of the free reduce slots to map tasks?
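Chris had run into MRv1’s static slot model: each TaskTracker’s capacity is divided between map and reduce slots up front, so idle reduce slots cannot be lent to waiting map tasks. A sketch of the relevant settings, using the stock MRv1 property names (the slot counts below are made up for illustration):

```xml
<!-- mapred-site.xml (MRv1): slot capacities are fixed per TaskTracker -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>10</value> <!-- map slots per TaskTracker; illustrative value -->
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>6</value> <!-- reduce slots; free ones cannot serve map tasks -->
</property>
```

This rigidity is one of the motivations for YARN, where the fixed map/reduce split disappears: applications request generic containers, so any free capacity on a node can run either map or reduce tasks.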
A typical day of a data engineer at Spotify revolves around Hadoop and music. However, after some time of simultaneously developing MapReduce jobs, maintaining a large cluster, and listening to the perfect music for every moment, something surprising might happen…!
Well, after some time, a data engineer starts discovering Hadoop (and related concepts) in the lyrics of many popular songs. How can Coldplay, the Black Eyed Peas, Michael Jackson, or Justin Timberlake sing about Hadoop?
Maybe it is some kind of illness? Definitely! A doctor could call it “inlusio elephans” ;)