Hakuna MapData!

My Ignite Presentation, “Hadoop Playlist”, at Strata 2013.

| Posted in Presentations |

Last month, I had the pleasure of giving an Ignite presentation about “Hadoop Playlist” at NYC Strata 2013. Although I am not the greatest speaker, I hope you will enjoy my presentation!

If you like Hadoop and music, you can listen to Hadoop Playlist and read more about it in one of my previous blog posts.

Hadoop adventures at Spotify (slides from my talk at Strata + Hadoop World 2013)

| Posted in Presentations |

I am very happy to present the slides from my presentation at Strata + Hadoop World 2013.

The presentation is titled “Hadoop adventures at Spotify”, and in it I talk about five real-world Hadoop issues that either broke our cluster at Spotify or made it very unstable. Each story comes from our JIRA dashboard and is based on facts! ;) To make it even more engaging, I show real graphs, numbers, and even our emails and conversations. For each story, I share the mistakes that we made and describe the lessons that we learned.

This also includes a mistake that I made myself and do not like to talk about, but today I will share it as well ;)

Be map slot or not to be: that is the question!

| Posted in Monitoring |

A couple of months ago, we got an email from Chris:

Hi!

The Hadoop cluster has been a bit slow the past few days and I noticed that the bottleneck seems to be coming from the map tasks. We have separate map and reduce task capacities and it continuously looks like the mapper slots are all taken while there’s a surplus of open reduce slots. Is there any reason that we can’t open any of the free reduce slots to map tasks?

Regards,
Chris
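
The situation Chris describes can be illustrated with a toy model. This is only a sketch, assuming a classic MapReduce setup in which map slots and reduce slots are separate, fixed pools; the `SlotPool` class and all numbers are hypothetical and are not taken from our cluster.

```python
from dataclasses import dataclass


@dataclass
class SlotPool:
    """Toy model of a cluster with fixed, separate map and reduce slots."""
    map_slots: int
    reduce_slots: int

    def schedule(self, pending_maps: int, pending_reduces: int) -> dict:
        # Each task type can only use slots of its own kind.
        running_maps = min(pending_maps, self.map_slots)
        running_reduces = min(pending_reduces, self.reduce_slots)
        return {
            "running_maps": running_maps,
            "waiting_maps": pending_maps - running_maps,
            "running_reduces": running_reduces,
            "idle_reduce_slots": self.reduce_slots - running_reduces,
        }


# Hypothetical numbers, chosen only to show the effect.
state = SlotPool(map_slots=100, reduce_slots=50).schedule(
    pending_maps=400, pending_reduces=10
)
print(state)
```

In this model, map tasks queue up while reduce slots sit idle, because an idle reduce slot cannot be handed to a waiting map task.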

Hadoop Playlist at Spotify

| Posted in Tips, Troubleshooting |

A typical day of a data engineer at Spotify revolves around Hadoop and music. However, after some time spent simultaneously developing MapReduce jobs, maintaining a large cluster, and listening to the perfect music for every moment, something surprising might happen…!

What?

Well, after some time, a data engineer starts discovering Hadoop (and its related concepts) in the lyrics of many popular songs. How can Coldplay, Black Eyed Peas, Michael Jackson or Justin Timberlake sing about Hadoop?

Maybe it is some kind of illness? Definitely! A doctor could call it “inlusio elephans” ;)

Mysterious Mass Murder In The Hadoopland

| Posted in Monitoring, Troubleshooting |

This is one of the most bloodcurdling stories (and one of my favorites) that we have recently seen in our 190-square-meter Hadoopland. In a nutshell, some jobs were running surprisingly long, because thousands of their tasks were constantly being killed for unknown reasons by someone (or something).

For example, a photo taken by our detectives shows a job that had been running for 12 hours and 20 minutes and had spawned around 13,000 tasks up to that moment. However, (only) 4,118 map tasks had finished successfully, while 8,708 had been killed (!) and … surprisingly, only 1 task had failed (?), obviously spreading panic in the Hadoopland.
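
To put the numbers from the photo in proportion, here is a quick tally. It is a minimal sketch that uses only the three counts quoted above; the variable names are mine, and it assumes these three states are the only ones that matter for the total.

```python
# Task counts quoted in the text for the job running 12 hours and 20 minutes.
succeeded = 4_118
killed = 8_708
failed = 1

# Sum of the three quoted counts (the text rounds this to "around 13,000").
total = succeeded + killed + failed
killed_share = killed / total
succeeded_share = succeeded / total

print(f"tasks accounted for: {total}")
print(f"killed: {killed_share:.1%}, succeeded: {succeeded_share:.1%}")
```

In other words, roughly two out of every three tasks were killed, which is why the job kept running for so long.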