Posted in Programming | Posted on 30-07-2012
I have created a project called Pigitos, a set of tiny but highly useful Java UDFs for Apache Pig.
Currently, Pigitos contains a couple of UDFs for working with maps. They calculate the size of a map and return the map's keys (or values, or key/value pairs) as a bag. Such UDFs are very useful with Apache HBase tables whose column qualifiers are created dynamically and themselves hold meaningful information that you want to process.
It seems that there are no such UDFs in Apache Pig itself or in the Piggybank library. I have found only UDFs like
The ready-to-use jar can be downloaded from the Pigitos GitHub repo. At the time of writing this post, it contains the following UDFs:
MapSize – takes a map and returns the number of entries in the map
MapKeysToBag – takes a map and produces a bag that contains all keys from that map
MapValuesToBag – takes a map and produces a bag that contains all values from that map
MapEntriesToBag – takes a map and produces a bag of tuples, where each tuple consists of two fields, key and value, and corresponds to one key/value pair from the map
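To make the behaviour of these four UDFs concrete, here is a minimal plain-Java sketch of what each one computes. It is not the Pigitos source and it does not use the Pig API: the class name `MapUdfSemantics` and its method names are hypothetical, a `java.util.List` stands in for a Pig bag, and a `Map.Entry` stands in for a two-field tuple. A real UDF would instead extend `org.apache.pig.EvalFunc` and return Pig's own data types.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Plain-Java model of the four operations (hypothetical names, not the Pigitos code).
final class MapUdfSemantics {
    private MapUdfSemantics() {}

    // MapSize: the number of entries in the map.
    static int mapSize(Map<String, Object> map) {
        return map.size();
    }

    // MapKeysToBag: all keys of the map, collected into a "bag" (modelled as a List).
    static List<String> mapKeysToBag(Map<String, Object> map) {
        return new ArrayList<>(map.keySet());
    }

    // MapValuesToBag: all values of the map, collected into a "bag".
    static List<Object> mapValuesToBag(Map<String, Object> map) {
        return new ArrayList<>(map.values());
    }

    // MapEntriesToBag: one (key, value) "tuple" per map entry.
    static List<Map.Entry<String, Object>> mapEntriesToBag(Map<String, Object> map) {
        List<Map.Entry<String, Object>> bag = new ArrayList<>();
        for (Map.Entry<String, Object> e : map.entrySet()) {
            // Copy each entry so the result does not depend on the source map.
            bag.add(new AbstractMap.SimpleImmutableEntry<>(e.getKey(), e.getValue()));
        }
        return bag;
    }
}
```

A Pig bag is unordered, so the order of the elements in these lists simply follows the iteration order of the input map and should not be relied upon.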
Here is a quick example:
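-- Assumes the Pigitos jar has been registered and MapKeysToBag is resolvable (e.g. via DEFINE).
User = LOAD 'hbase://user' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('friend:*', '-loadKey true') AS (username:chararray, friendMap:map[]);
UserFriend = FOREACH User GENERATE username, FLATTEN(MapKeysToBag(friendMap)) AS friendUsername;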
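If it is not obvious what this query produces, the following plain-Java sketch models it under a few assumptions: each user row is a username plus a map whose keys are the friends' usernames, and FLATTEN over the bag of keys emits one (username, friendUsername) row per key. The class `FlattenSketch` and the method `userFriendRows` are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Models FOREACH ... GENERATE username, FLATTEN(MapKeysToBag(friendMap)) in plain Java.
final class FlattenSketch {
    private FlattenSketch() {}

    // Input: username -> friend map (keys are friend usernames, values are ignored here).
    // Output: one {username, friendUsername} row per key of each user's map.
    static List<String[]> userFriendRows(Map<String, Map<String, Object>> users) {
        List<String[]> rows = new ArrayList<>();
        for (Map.Entry<String, Map<String, Object>> user : users.entrySet()) {
            for (String friend : user.getValue().keySet()) {
                rows.add(new String[] {user.getKey(), friend});
            }
        }
        return rows;
    }
}
```

A user whose map is empty contributes no rows at all, which mirrors how FLATTEN treats an empty bag in Pig.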
I hope you will find it useful!