Configuring Apache Hive Connections for Big Data
Prerequisites:
Apache Hive 0.14.0 is installed and configured to connect to the big data source.
You can use master data in OneData with big data sources such as Apache Hadoop. OneData establishes a JDBC connection to Apache Hive, which acts as the intermediate system that fetches data from the big data source. The master data in your big data source can then be sourced for your organization's data analytics. Currently, OneData supports Apache Hive as the connection through which you can connect to Apache Hadoop as your big data source.
Use this procedure to configure an Apache Hive connection.
To configure an Apache Hive connection
1. On the Menu toolbar, select Administer > System > Connection Manager.
2. In the Connection Type list, select JDBC.
3. Do one of the following:
- Click Add Connection to add a new connection.
- Click the Edit icon to edit an existing connection.
4. Configure the connection details, using the following table as a guide:
| Property | Description |
| --- | --- |
| Connection Name | Mandatory. Enter a unique connection name of 100 characters or fewer. The name can include spaces. |
| Description | Optional. Description of how the connection is used. |
| Connection Type | Displays the connection type selected on the previous screen. |
5. Configure the connection parameters, using the following table as a guide:
| Property | Description |
| --- | --- |
| Database | Mandatory. Select Hive. |
| Database Version | Optional. Database version. |
| Connection Type | Mandatory. Select Direct Connection. |
| Application Server | Mandatory. Select Other. |
| Application Server Version | Optional. Application server version. |
| Connection String/Data Source Name | Specify the connection string in the following format: jdbc:hive2://Server Name:Port Number/ For example: jdbc:hive2://10.60.2.37:10000/ |
| Driver Class | Enter the following driver class: org.apache.hive.jdbc.HiveDriver |
| User-ID | Mandatory. User ID to connect to Hive. |
| Password | Mandatory. Password of the Hive user ID. |
| Schema Name (if different from User-ID) | Not required. |
| Target Server Name | Optional; for information purposes only. Type the name of the target server. |
| Associated Hook | Optional. The hook to execute at connection logon. |
6. Click Save to save the new connection. Click Test Connection to verify the connection details.
Note: OneData automatically tests the connection when you save the connection details. You can also test the connection from the main Connection Manager screen.
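The connection string and driver class configured above can also be exercised outside OneData with a short JDBC sketch. This is an illustrative example, not part of the product: the host, port, and credentials are placeholders, and a live connection additionally requires a reachable HiveServer2 instance and the Hive JDBC driver on the classpath.

```java
public class HiveConnectionCheck {

    // Build the HiveServer2 JDBC URL in the format OneData expects:
    // jdbc:hive2://Server Name:Port Number/
    static String buildHiveUrl(String host, int port) {
        return "jdbc:hive2://" + host + ":" + port + "/";
    }

    public static void main(String[] args) throws Exception {
        // Placeholder host and port, matching the documentation example.
        String url = buildHiveUrl("10.60.2.37", 10000);
        System.out.println(url);

        // Uncomment to attempt a live connection (requires HiveServer2 and
        // the org.apache.hive:hive-jdbc driver on the classpath):
        // Class.forName("org.apache.hive.jdbc.HiveDriver");
        // try (java.sql.Connection conn =
        //         java.sql.DriverManager.getConnection(url, "hiveuser", "password")) {
        //     System.out.println("Connected: " + !conn.isClosed());
        // }
    }
}
```

If the URL, user ID, or password is wrong, the driver throws a SQLException at getConnection, which corresponds to a failed Test Connection in the Connection Manager screen.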
7. In Hive, do the following:
a. Navigate to the location of the hive-site.xml file and add the following parameters:
<property>
<name>hive.support.concurrency</name>
<value>true</value>
</property>
<property>
<name>hive.enforce.bucketing</name>
<value>true</value>
</property>
<property>
<name>hive.exec.dynamic.partition.mode</name>
<value>nonstrict</value>
</property>
<property>
<name>hive.txn.manager</name>
<value>org.apache.hadoop.hive.ql.lockmgr.DbTxnManager</value>
</property>
<property>
<name>hive.compactor.initiator.on</name>
<value>false</value>
</property>
<property>
<name>hive.compactor.worker.threads</name>
<value>10</value>
</property>
b. If you want OneData to perform ACID (atomicity, consistency, isolation, durability) transactions (insert, update, and delete) on a table, set the property transactional=true on that table.
Example of Table Creation Script
create table student(id decimal(10,0), initials char(3), name varchar(100),
    valid boolean, dob date, regndate timestamp, totalscore decimal(5,2))
clustered by (id) into 2 buckets
stored as orc
TBLPROPERTIES('transactional'='true');
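Once the transactional property is set, standard DML statements run as ACID transactions on the table. The following HiveQL statements are illustrative only; the values are placeholders:

insert into student values (1, 'AS', 'Alice Smith', true,
    '1998-04-12', '2016-07-01 09:30:00', 87.50);

update student set totalscore = 91.25 where id = 1;

delete from student where id = 1;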
Next Steps
Create a database update export job with the remote connection set to Apache Hive.
For details on how to create an export job, see Implementing webMethods OneData.