Hakuna MapData! » mapreduce

Hadoop adventures at Spotify (slides from my talk at Strata + Hadoop World 2013)

| Posted in Presentations |


I am very happy to present the slides from my presentation at Strata + Hadoop World 2013.

The presentation is titled “Hadoop adventures at Spotify”, and in it I talk about five real-world Hadoop issues that either broke our cluster at Spotify or made it very unstable. Each story comes from our JIRA dashboard and is based on facts! ;) To make it more engaging, I show real graphs, numbers, and even our emails and conversations. For each story, I share the mistakes that we made and describe the lessons that we learned.

This also includes a mistake that I made and do not like to talk about, but today I will share it as well ;)

Be map slot or not to be: that is the question!

| Posted in Monitoring |


A couple of months ago, we got an email from Chris:


The Hadoop cluster has been a bit slow the past few days and I noticed that the bottleneck seems to be coming from the map tasks. We have separate map and reduce task capacities and it continuously looks like the mapper slots are all taken while there’s a surplus of open reduce slots. Is there any reason that we can’t open any of the free reduce slots to map tasks?
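
To make the situation in the email concrete, here is a minimal Python sketch of fixed, separate map and reduce slot pools. The function name and all numbers are hypothetical, and this is not Hadoop's scheduler code; it only illustrates why waiting map tasks cannot use idle reduce slots when the two capacities are configured separately.

```python
def slot_usage(map_slots, reduce_slots, pending_maps, pending_reduces):
    """Assign tasks to fixed, separate slot pools (illustrative model only)."""
    # Map tasks may only occupy map slots, reduce tasks only reduce slots.
    running_maps = min(map_slots, pending_maps)
    running_reduces = min(reduce_slots, pending_reduces)
    return {
        "running_maps": running_maps,
        "waiting_maps": pending_maps - running_maps,
        "running_reduces": running_reduces,
        "idle_reduce_slots": reduce_slots - running_reduces,
    }


# Hypothetical numbers: a map-heavy workload on a node pool with 8 map
# slots and 4 reduce slots.
usage = slot_usage(map_slots=8, reduce_slots=4, pending_maps=20, pending_reduces=1)
print(usage)
```

With these made-up numbers, 12 map tasks wait while 3 reduce slots sit idle, which is the pattern Chris describes.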


Quick Introduction To Apache Hadoop MapReduce Java API

| Posted in Presentations |


Here are the slides from the quick presentation that I gave at Spotify as part of a “knowledge sharing session”.

I hope you will find the slides useful!

How to build a Hadoop cluster on Amazon Elastic MapReduce using Karmasphere Studio for EMR

| Posted in Presentations, Software |


Here is my video (15 minutes long, in Polish) that shows how to create a Hadoop cluster on Amazon Elastic MapReduce and use Karmasphere Studio for EMR (a plugin for Eclipse). It demonstrates how to rent 10 small EC2 instances to run an example computation that processes ~220GB of data in less than one hour, which costs $1.25.
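
As a rough sanity check of those numbers, the sketch below back-computes the hourly price per instance and the cost per gigabyte that they imply. It assumes each instance is billed for exactly one full hour and that the $1.25 covers all 10 instances; the actual pricing breakdown is not stated here, so treat the result as an estimate.

```python
instances = 10         # small EC2 instances rented
billed_hours = 1       # assumption: a run under one hour is billed as one hour
total_cost_usd = 1.25  # total cost quoted for the run
data_gb = 220          # approximate input size in GB

# Implied price of one instance for one hour, under the assumptions above.
price_per_instance_hour = total_cost_usd / (instances * billed_hours)

# Implied cost of processing one gigabyte of input.
cost_per_gb = total_cost_usd / data_gb

print(f"Implied price per instance-hour: ${price_per_instance_hour:.3f}")
print(f"Implied cost per GB processed:   ${cost_per_gb:.4f}")
```

Under these assumptions the run works out to about $0.125 per instance-hour, or a bit more than half a cent per gigabyte.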