Blog::: JvmNotFoundException: Apache Hive : Indexes

Indexes are special lookup tables that the database search engine can use to speed up data retrieval.an index is a pointer to data in a table.the Indexes are implemented by using various data structure e.g. BTree, BitMap etc.Hive has limited indexing capabilities. you can speed up some of the data retrival operation using the indexes, the index data is stored in the another table in the Hive

The other way of speed up your query other than the Indexing are partitioning and bucketing which we will cover in coming posts.
Indexing is also a good alternative to partitioning when the logical partitions would.Indexes in Hive, like those in relational databases, need to be evaluated carefully.

Maintaining an index requires extra disk space and building an index has a processing cost.

Creating an Index

hive> create index customer_index on table customers(custid) as 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler'
    > with deferred rebuild
    > in table customer_index_table
    > COMMENT 'Customer Index on custid';
OK
Time taken: 5.713 seconds

Creating an bitmap Index

hive> create index customer_index_bitmap on table customers(custid)
    > as 'BITMAP'                                                  
    > with deferred rebuild                                        
    > COMMENT 'Index on customer table on custid using bitmap';    
OK
Time taken: 0.603 seconds

View an Index

hive> show formatted index on customers;                       
OK
idx_name            	tab_name            	col_names           	idx_tab_name        	idx_type            	comment             
	 	 	 	 	 
	 	 	 	 	 
customer_index      	customers           	custid              	customer_index_table	compact             	Customer Index on custid
customer_index_bitmap	customers           	custid              	orderdb__customers_customer_index_bitmap__	bitmap              	Index on customer table on custid using bitmap
Time taken: 0.988 seconds, Fetched: 5 row(s)

Rebuild an Index

hive> alter index customer_index on customers rebuild;      
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
Starting Job = job_201501241609_0022, Tracking URL = http://RKS-PC:50030/jobdetails.jsp?jobid=job_201501241609_0022
Kill Command = /usr/local/hadoop/libexec/../bin/hadoop job  -kill job_201501241609_0022
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2015-01-24 18:32:39,926 Stage-1 map = 0%,  reduce = 0%
2015-01-24 18:32:41,975 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 0.81 sec
2015-01-24 18:32:42,978 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 0.81 sec
2015-01-24 18:32:43,982 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 0.81 sec
2015-01-24 18:32:45,011 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 0.81 sec
2015-01-24 18:32:46,071 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 0.81 sec
2015-01-24 18:32:47,078 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 0.81 sec
2015-01-24 18:32:48,093 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 0.81 sec
2015-01-24 18:32:49,159 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 0.81 sec
2015-01-24 18:32:50,245 Stage-1 map = 100%,  reduce = 33%, Cumulative CPU 0.81 sec
2015-01-24 18:32:51,305 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 2.24 sec
2015-01-24 18:32:52,309 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 2.24 sec
2015-01-24 18:32:53,385 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 2.24 sec
MapReduce Total cumulative CPU time: 2 seconds 240 msec
Ended Job = job_201501241609_0022
Loading data to table orderdb.customer_index_table
Deleted hdfs://RKS-PC:54310/user/hive/warehouse/orderdb.db/customer_index_table
Table orderdb.customer_index_table stats: [num_partitions: 0, num_files: 1, num_rows: 0, total_size: 8567, raw_data_size: 0]
MapReduce Jobs Launched: 
Job 0: Map: 1  Reduce: 1   Cumulative CPU: 2.24 sec   HDFS Read: 8569 HDFS Write: 8567 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 240 msec
OK
Time taken: 42.379 seconds

Drop an Index

hive> drop index if exists customer_index on customers;
OK
Time taken: 3.267 seconds

Blog::: JvmNotFoundException

Saturday, January 24, 2015

Apache Hive : Indexes

No comments: