Recently, I have been refactoring one of our Hive scripts. Because I introduced significant changes, the query is resource-intensive (it processes almost two terabytes of data) and … I wanted to iterate fast, I decided to test it locally.
To make it easier, I implemented Beetest – a super simple utility that helps you test your Apache Hive scripts locally without any Java knowledge.
At Spotify, we have a company-wide culture of celebrating successes and … failures. Because we want to iterate fast, we realize that failures can happen. On the other hand, we cannot afford to make the same mistake more than once. One way of preventing that is sharing our failures, mistakes and lessons learned across the company.
Today, however, I would like to share my failures … outside of the company ;) While my failures relate to my recent work with an Apache Hadoop cluster, I think the lessons I have learned are generic enough that many people can benefit from them.
Posted in Uncategorized | Posted on 03-12-2013
A presentation that I gave at the Distributed Systems Seminar at the University of Warsaw (the university that I graduated from). I wanted to make this presentation academically interesting, but also show a bit of how everything looks in practice on a large Hadoop cluster at Spotify. I hope you will like this combination! ;)