Posted by Adam Kawa | Posted in Presentations | Posted on 24-11-2013
I am very happy to present the slides from my presentation at Strata + Hadoop World 2013.
The presentation is titled "Hadoop adventures at Spotify", and in it I talk about five real-world Hadoop issues that either broke our cluster at Spotify or made it very unstable. Each story comes from our JIRA dashboard and is based on facts! ;) To make it even more engaging, I am showing real graphs, numbers, and even our emails and conversations. For each story, I share the mistakes that we made and describe the lessons that we learned.
This also includes a mistake that I made myself and do not like to talk about, but today I will share it as well ;)
Posted by Adam Kawa | Posted in Monitoring | Posted on 06-10-2013
A couple of months ago, we got an email from Chris:
The Hadoop cluster has been a bit slow the past few days and I noticed that the bottleneck seems to be coming from the map tasks. We have separate map and reduce task capacities and it continuously looks like the mapper slots are all taken while there’s a surplus of open reduce slots. Is there any reason that we can’t open any of the free reduce slots to map tasks?
Here is my video (15 minutes long, in Polish) that shows how to create a Hadoop cluster on Amazon Elastic MapReduce and use Karmasphere Studio for EMR (a plugin for Eclipse). It demonstrates how to rent 10 EC2 small instances to run an example computation that processes ~220GB of data in less than one hour, which costs $1.25.
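To make the $1.25 figure a bit more tangible, here is a minimal Python sketch of the arithmetic behind it. It assumes that each of the 10 instances was billed for exactly one full hour and that the total is a single blended charge; neither detail is stated above, so treat the resulting per-instance rate as an implied number, not as an official price. The function names are mine and purely illustrative.

```python
from decimal import Decimal

# Figures taken from the post.
INSTANCES = 10
TOTAL_COST_USD = Decimal("1.25")

# Assumption (not stated in the post): every instance is billed for one full hour.
BILLED_HOURS = 1


def implied_rate(total_cost, instances, hours):
    """Blended cost per instance-hour implied by a total bill."""
    return total_cost / (instances * hours)


def estimate_cost(instances, hours, rate_per_instance_hour):
    """Total cost of a cluster for a given size, duration and hourly rate."""
    return instances * hours * rate_per_instance_hour


rate = implied_rate(TOTAL_COST_USD, INSTANCES, BILLED_HOURS)
print("Implied rate per instance-hour: $%s" % rate)
print("Check, total cost: $%s" % estimate_cost(INSTANCES, BILLED_HOURS, rate))
```

With the same helper you could estimate what a larger cluster or a longer run would cost, as long as the one-hour billing assumption still holds for your case.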