Pigitos – MapKeysToBag, MapSize and more UDFs to manipulate maps in Apache Pig

Posted in Programming

I have created a project called Pigitos, a set of tiny but highly useful Java UDFs (user-defined functions) for Apache Pig.

Currently, Pigitos contains a handful of UDFs for working with maps. They calculate the size of a map and return a map's keys, values, or key/value pairs as a bag. Such UDFs are very useful when working with Apache HBase tables whose dynamically created column qualifiers themselves hold meaningful information that you want to process.

There seem to be no such UDFs in Apache Pig itself or in the Piggybank library. I found only UDFs like TOBAG and TOTUPLE, and they do not take a map as an input parameter.

The ready-to-use jar can be downloaded from the Pigitos GitHub repo. At the time of writing this post, it contains the following UDFs:

  • MapSize – takes a map and returns the number of entries in the map
  • MapKeysToBag – takes a map and produces a bag that contains all keys from that map
  • MapValuesToBag – takes a map and produces a bag that contains all values from that map
  • MapEntriesToBag – takes a map and produces a bag of tuples, where each tuple consists of two fields, key and value, and corresponds to one key/value pair from the map
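To make the list above concrete, here is a minimal sketch of what each UDF returns. It is not the Pigitos source: plain `java.util` collections stand in for Pig's map, bag and tuple types, and the class and method names (`MapUdfSemantics`, `mapSize`, `keysToBag`, `valuesToBag`, `entriesToBag`) are hypothetical names chosen for this illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustration only: plain Java collections stand in for Pig's map, bag and tuple.
// Names are hypothetical and do not come from the Pigitos code base.
final class MapUdfSemantics {

    // MapSize: the number of entries in the map.
    static int mapSize(Map<String, Object> map) {
        return map.size();
    }

    // MapKeysToBag: a bag holding every key of the map.
    static List<String> keysToBag(Map<String, Object> map) {
        return new ArrayList<>(map.keySet());
    }

    // MapValuesToBag: a bag holding every value of the map.
    static List<Object> valuesToBag(Map<String, Object> map) {
        return new ArrayList<>(map.values());
    }

    // MapEntriesToBag: a bag of two-field tuples, one (key, value) tuple per entry.
    static List<List<Object>> entriesToBag(Map<String, Object> map) {
        List<List<Object>> bag = new ArrayList<>();
        for (Map.Entry<String, Object> entry : map.entrySet()) {
            List<Object> tuple = new ArrayList<>();
            tuple.add(entry.getKey());
            tuple.add(entry.getValue());
            bag.add(tuple);
        }
        return bag;
    }

    public static void main(String[] args) {
        // A LinkedHashMap keeps insertion order, which makes the demo deterministic.
        Map<String, Object> friendMap = new LinkedHashMap<>();
        friendMap.put("alice", 1L);
        friendMap.put("bob", 2L);

        System.out.println(mapSize(friendMap));
        System.out.println(keysToBag(friendMap));
        System.out.println(valuesToBag(friendMap));
        System.out.println(entriesToBag(friendMap));
    }
}
```

A real Pig UDF would extend Pig's `EvalFunc` and build its result from Pig's own bag and tuple classes; the sketch leaves that plumbing out so that it runs without Pig on the classpath.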

Here is a quick example:

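-- Assumes the Pigitos jar has been REGISTERed and MapKeysToBag is resolvable
-- (via DEFINE or its fully qualified class name).
User = LOAD 'hbase://user'
  USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('friend:*', '-loadKey true')
  AS (username:chararray, friendMap:map[]);
-- Emit one (username, friendUsername) row per key in friendMap.
UserFriend = FOREACH User
  GENERATE username, FLATTEN(MapKeysToBag(friendMap)) AS friendUsername;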
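
The effect of FLATTEN over the bag of keys can be sketched in plain Java: every user row is expanded into one (username, friendUsername) row per key in that user's map, and a user whose map is empty contributes no rows. This is only an illustration of the resulting relation, not of how Pig executes the script; the class and method names (`FlattenKeysSketch`, `userFriendRows`) and the sample usernames are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustration only: mimics the rows produced by
// FOREACH ... GENERATE username, FLATTEN(MapKeysToBag(friendMap)).
final class FlattenKeysSketch {

    // users maps each username to that user's friend map (column qualifier -> cell value).
    static List<String[]> userFriendRows(Map<String, Map<String, Object>> users) {
        List<String[]> rows = new ArrayList<>();
        for (Map.Entry<String, Map<String, Object>> user : users.entrySet()) {
            // One output row per key; an empty map yields no rows for this user.
            for (String friendUsername : user.getValue().keySet()) {
                rows.add(new String[] {user.getKey(), friendUsername});
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        Map<String, Object> tomFriends = new LinkedHashMap<>();
        tomFriends.put("alice", "");
        tomFriends.put("bob", "");

        Map<String, Map<String, Object>> users = new LinkedHashMap<>();
        users.put("tom", tomFriends);
        users.put("loner", new LinkedHashMap<String, Object>());

        for (String[] row : userFriendRows(users)) {
            System.out.println(row[0] + "\t" + row[1]);
        }
    }
}
```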

I hope you find it useful!
