Hakuna MapData! » udf
rss

Pigitos in action – Reading HBase column family content in a real-world application

| Posted in Programming |

0

In this post I will demonstrate how you can use Pigitos (a library that contains tiny, but highly useful UDFs for Apache Pig) to implement “friends you may have” feature (inspired by “people you may know”) for a simple real-world application.

Problem definition

Assume that we are launching a social website called “CloseFriendBook” and we have to design a basic HBase table that stores information about the users and their close friends.

Our access pattern is either:

  • read a user profile (information like first name, email, age), or
  • read the full list of friends of a given user (theoretically, a user may have unlimited number of close friends, but in reality it has no more than tens or hundreds of them).

Pigitos – MapKeysToBag, MapSize and more UDFs to manipulate maps in Apache Pig

| Posted in Programming |

0

I have already created a project called Pigitos which is a set of tiny, but highly useful Java UDFs for Apache Pig.

Currently, Pigitos contains a couple of UDFs that support working with maps. It provides UDFs to calculate the size of the map and get map’s keys (or values, or key/value pairs) as a bag. Such UDFs are very useful when working with dynamically created column qualifiers (that hold some meaningful information that you want to process) in Apache HBase tables.