Hakuna MapData!

A couple of basic, but useful tricks when working with Apache HBase shell

| Posted in Programming |

I would like to share some basic tricks for the Apache HBase shell that I learned from reading HBase: The Definitive Guide by L. George and HBase in Action by N. Dimiduk and A. Khurana, and from taking part in the Cloudera Training for Apache HBase.

In this post, I will create an HBase table, populate it with sample data, and scan it. Each step demonstrates a different technique.

Create the User table

You can pipe commands to the hbase shell and create the table with a single command, without explicitly launching the HBase shell first.

$ echo "create 'user', 'info'" | hbase shell
 
create 'user', 'info'
0 row(s) in 1.7610 seconds

Populate the User table with sample data

The HBase shell is JRuby IRB (the JRuby implementation of the Interactive Ruby Shell) with some HBase-specific commands added. This means you can mix JRuby code with HBase commands, which is handy for populating the user table with some sample data:

$ hbase shell
 
hbase(main):001:0> for i in '0'..'4' do \
hbase(main):002:1* put "user", "user_#{i}", "info:email", "user_#{i}@hakunamapdata.com" \
hbase(main):003:1* end
0 row(s) in 0.5520 seconds
 
0 row(s) in 0.0020 seconds
 
0 row(s) in 0.0030 seconds
 
0 row(s) in 0.0040 seconds
 
0 row(s) in 0.0030 seconds
 
=> "0".."4"
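To preview what the loop writes without touching a cluster, the same range and interpolation can be run in plain Ruby. This is only a minimal sketch: sample_rows is a hypothetical helper, not an HBase shell command, and it merely builds the arguments that put receives in the session above.

```ruby
# Hypothetical helper (not part of the HBase shell): builds the arguments
# that the loop above passes to `put`, without contacting HBase.
def sample_rows(first = '0', last = '4')
  # A range of strings steps through successive values ("0", "1", ... "4").
  (first..last).map do |i|
    ["user", "user_#{i}", "info:email", "user_#{i}@hakunamapdata.com"]
  end
end

# Print one line per would-be `put` call.
sample_rows.each { |args| puts args.join(", ") }
```

Each printed line corresponds to one put call, which is why the shell reports five separate "0 row(s)" results.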

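Make sure that you use double quotes around any string that interpolates the i variable (e.g. "user_#{i}"). Ruby does not interpolate inside single-quoted strings, so with single quotes every iteration writes to the same literal row key and you end up with a single row: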

hbase(main):008:0> scan 'user'
ROW             COLUMN+CELL                                                                                               
 user_#{i}      column=info:email, timestamp=1344763771988, value=user_#{i}@hakunamapdata.com
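The difference between the two quoting styles can be checked in plain Ruby, outside the HBase shell. The two method names below are hypothetical and exist only for this illustration.

```ruby
# Double quotes: #{i} is replaced by the value of i.
def double_quoted_key(i)
  "user_#{i}"
end

# Single quotes: the text #{i} is kept literally, so i is never used.
def single_quoted_key(i)
  'user_#{i}'
end

puts double_quoted_key('3')   # prints user_3
puts single_quoted_key('3')   # prints user_#{i}
```

With single quotes all five iterations produce the same row key, so each put overwrites the previous one.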

Scanning the User table

You can pass a script to the HBase shell in the following way:

hbase shell PATH_TO_SCRIPT

Let’s create a simple script to perform a partial scan of the user table…

$ cat hbase_user_part_scan.txt
 
scan 'user', {STARTROW => 'user_1', STOPROW => 'user_3'}
exit

… and run it.

$ hbase shell hbase_user_part_scan.txt 
 
ROW             COLUMN+CELL                                                                                               
 user_1         column=info:email, timestamp=1344763701019, value=user_1@hakunamapdata.com                                
 user_2         column=info:email, timestamp=1344763701023, value=user_2@hakunamapdata.com                                
2 row(s) in 0.6110 seconds

It works as expected, but it is definitely not written in a configurable way. Alternatively, you can create a generic bash script that provides the same functionality but lets you conveniently pass a few important parameters.

$ cat part_scan.sh 
 
#!/bin/bash
 
TABLE=$1
STARTROW=$2
STOPROW=$3
exec hbase shell <<EOF
     scan "${TABLE}", {STARTROW => "${STARTROW}", STOPROW => "${STOPROW}"}
EOF

Run this script with the table name (user) and start and stop row keys (user_1 and user_3 respectively).

$ ./part_scan.sh user user_1 user_3
 
     scan "user", {STARTROW => "user_1", STOPROW => "user_3"}
ROW             COLUMN+CELL                                                                                               
 user_1         column=info:email, timestamp=1344763701019, value=user_1@hakunamapdata.com                                
 user_2         column=info:email, timestamp=1344763701023, value=user_2@hakunamapdata.com                                
2 row(s) in 0.6170 seconds

The script requires at least one parameter, the table name; the other two are optional. If you omit the third parameter, STOPROW expands to an empty string and the scan runs from STARTROW to the end of the table. Similarly, if you omit both the second and third parameters, you run a full scan of the table.

$ ./part_scan.sh user user_1
 
     scan "user", {STARTROW => "user_1", STOPROW => ""}
ROW             COLUMN+CELL                                                                                               
 user_1         column=info:email, timestamp=1344763701019, value=user_1@hakunamapdata.com                                
 user_2         column=info:email, timestamp=1344763701023, value=user_2@hakunamapdata.com                                
 user_3         column=info:email, timestamp=1344763701026, value=user_3@hakunamapdata.com                                
 user_4         column=info:email, timestamp=1344763701030, value=user_4@hakunamapdata.com                                
4 row(s) in 0.6610 seconds
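The outputs above follow from how scan bounds behave: STARTROW is inclusive, STOPROW is exclusive, and an empty bound means no limit on that side. The plain-Ruby sketch below imitates this on an in-memory list of row keys. rows_in_range is a hypothetical helper that never contacts HBase, and it assumes ASCII row keys so that Ruby's string comparison matches HBase's byte-wise ordering.

```ruby
# Hypothetical stand-in for a partial scan over sorted row keys.
# startrow is inclusive, stoprow is exclusive, "" means "no bound".
def rows_in_range(keys, startrow = "", stoprow = "")
  keys.sort.select do |key|
    key >= startrow && (stoprow.empty? || key < stoprow)
  end
end

keys = ('0'..'4').map { |i| "user_#{i}" }

p rows_in_range(keys, "user_1", "user_3")  # both bounds given
p rows_in_range(keys, "user_1")            # no STOPROW
p rows_in_range(keys)                      # full scan
```

The first call returns user_1 and user_2 only, since user_3 is the exclusive stop row; the second returns user_1 through user_4, matching the scan without STOPROW.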

Other tricks

Discover when the cell was added to the table

hbase(main):004:0> get 'user', 'user_1'
COLUMN            CELL                                                                                                      
 info:email       timestamp=1344763701019, value=user_1@hakunamapdata.com                                                   
1 row(s) in 0.6400 seconds

Time.at(time) provides more human-readable information about when the cell was added to the table. HBase timestamps are in milliseconds, while Time.at expects seconds, hence the division by 1000, e.g.

hbase(main):005:0> Time.at(1344763701019/1000)
=> Sun Aug 12 11:28:21 +0200 2012
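The integer division above drops the millisecond part and prints the time in the local time zone. If you want a zone-independent string that keeps the milliseconds, something like the following sketch may help; format_cell_timestamp is a hypothetical helper, not an HBase shell command.

```ruby
# Hypothetical helper: formats an HBase cell timestamp (milliseconds since the
# Unix epoch) as a UTC string, keeping the millisecond part.
def format_cell_timestamp(millis)
  seconds, ms = millis.divmod(1000)
  date_part = Time.at(seconds).utc.strftime("%Y-%m-%d %H:%M:%S")
  format("%s.%03d UTC", date_part, ms)
end

puts format_cell_timestamp(1344763701019)  # prints 2012-08-12 09:28:21.019 UTC
```

The result is the same instant as the +0200 time shown above, expressed in UTC.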