Posted in Community, Presentations, Troubleshooting | Posted on 27-02-2014|
I am extremely happy to say that my proposal was accepted for Hadoop Summit 2014 in Amsterdam ;) The title of my presentation is "Hadoop operations powered by … Hadoop". I will talk about the various metrics, logs and files that Hadoop generates, and how to analyze them … using Hadoop (plus open-source tools and simple scripts) to learn more about Hadoop and avoid guesstimates!
Posted in Programming, Testing, Troubleshooting | Posted on 22-12-2013|
Recently, I have been refactoring one of our Hive scripts. Because I introduced significant changes, the query is resource-intensive (it processes almost two terabytes of data) and … I wanted to iterate fast, I decided to test it locally.
To make it easier, I implemented Beetest – a super simple utility that helps you test your Apache Hive scripts locally without any Java knowledge.
Posted in Failures, Troubleshooting | Posted on 21-12-2013|
At Spotify, we have a company-wide culture of celebrating successes and … failures. Because we want to iterate fast, we realize that failures can happen. On the other hand, we cannot afford to make the same mistake more than once. One way of preventing that is sharing our failures, mistakes and lessons learned across the company.
Today, however, I would like to share my failures … outside of the company ;) While my failures relate to my recent work with an Apache Hadoop cluster, I think that the lessons I have learned are generic enough that many people can benefit from them.
Posted in Uncategorized | Posted on 03-12-2013|
A presentation that I gave at the Distributed Systems Seminar at the University of Warsaw (the university that I graduated from). I wanted to make this presentation academically interesting, but also to show a bit of how everything looks in practice on a large Hadoop cluster at Spotify. I hope you will like this combination! ;)