Sunday, January 11, 2015

Apache Hive : Getting started with HiveQL

HiveQL is the Hive query language.it does not conform to ANSI SQL like any other databases query languages. Hive do not support row level insert,update and delete and also do not support the transactions.HiveQL supports creation, alteration of databases and do support the drop DDL too. create,alter and drop DDL can be applied to the other HIVE database objects e.g. Table, views, Indexes and function.
In this post we will try to run some of the DDL statement on the HIVE Database.

Create Database: (Click Image to Enlarge)



Managed Tables Vs External Tables

Hive controls the life cycle of Managed table's data,hive store the data of the managed table under the directory /user/hive/warehouse by default.as soon as we drop the manged table,hive deletes the data inside of the table. theoretically hive has ownership of the data in case of manged table.
Hive provides you the flexibility to the user to define an External table that points to the data but do not take the ownership of the data. its a handy way to share data among various tools to do analytic over it.the External table can be defined using the 'EXTERNAL' keyword and LOCATION keyword to locate the table.

Create Table : Managed Table (Click Image to Enlarge)




Create Table : External Table (Click Image to Enlarge)



Create Table : Partitioned Table (Click Image to Enlarge)
To tune up database query Hive uses the concept of Partitioning in which database is partitioned among multiple part horizontally. Partitions are essentially horizontal slices of data which allow larger sets of data to be separated into more manageable chunks so that user select predicate can only look into the target partition only.

No comments: