
Two memory-related issues on the Apache Hadoop cluster (memory swapping and the OOM killer)

Posted in Monitoring, Troubleshooting


In this blog post, I describe two memory-related issues that we recently experienced on our 190-node Apache Hadoop cluster at Spotify.

Hadoop Unreachable Nodes Jira Ticket

We noticed that some nodes were suddenly marked dead by both the NameNode and the JobTracker. Although we could ping them, we were unable to ssh into them, which often suggests a very heavy load on the machines. Looking at the Ganglia graphs, we discovered that all the nodes marked dead had one thing in common: heavy swapping. (With Apache Hadoop, experience shows that heavy swapping of a JVM process usually means some kind of unresponsiveness, or even the “death” of the process.)

Servers swapping
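
We spotted the swapping in Ganglia, but the same signal can be checked directly on a single node. The snippet below is a minimal sketch, not the tooling we actually used: it assumes a Linux host where `/proc/vmstat` exposes the cumulative `pswpin` and `pswpout` page counters, and the helper names and the 1000 pages/s threshold are hypothetical values chosen only for illustration.

```python
import time


def parse_swap_counters(vmstat_text):
    """Return (pswpin, pswpout) cumulative page counts from /proc/vmstat text."""
    counters = {}
    for line in vmstat_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] in ("pswpin", "pswpout"):
            counters[parts[0]] = int(parts[1])
    # A missing counter is treated as 0 (for example, a kernel built without swap).
    return counters.get("pswpin", 0), counters.get("pswpout", 0)


def swap_rate(before, after, interval_s):
    """Pages swapped in and out per second between two samples."""
    return ((after[0] - before[0]) / interval_s,
            (after[1] - before[1]) / interval_s)


def is_swapping_heavily(rate, threshold_pages_per_s=1000.0):
    """True if combined swap traffic exceeds the threshold.

    The default threshold is an arbitrary illustrative value, not a tuned one.
    """
    return rate[0] + rate[1] > threshold_pages_per_s


def measure(interval_s=5.0, path="/proc/vmstat"):
    """Sample the counters twice and return the swap rate (Linux only)."""
    with open(path) as f:
        first = parse_swap_counters(f.read())
    time.sleep(interval_s)
    with open(path) as f:
        second = parse_swap_counters(f.read())
    return swap_rate(first, second, interval_s)
```

Calling `is_swapping_heavily(measure())` on a suspect node gives a quick yes/no answer. Because the counters are cumulative since boot, only the difference between two samples says anything about current swap activity.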