Configuring Apache Hive Connections for Big Data
Prerequisites:
Apache Hive 0.14.0 is installed and configured to connect to the big data source.
You can use master data in OneData with big data sources such as Apache Hadoop. OneData establishes a JDBC connection to Apache Hive, which acts as an intermediate system that fetches data from the big data source. The master data in your big data source can then be used for your organization's data analytics. Currently, OneData supports connecting to Apache Hadoop as a big data source through Apache Hive.
Use this procedure to configure an Apache Hive connection.
To configure an Apache Hive connection
1. On the Menu toolbar, select Administer > System > Connection Manager.
2. In the Connection Type list, select JDBC.
3. Do one of the following:
Click Add Connection to add a new connection.
Click the Edit icon to edit an existing connection.
4. Configure the connection details, using the following table as a guide:
Property | Description |
Connection Name | Mandatory. Enter a unique connection name of up to 100 characters. The name can include spaces. |
Description | Optional. Description of how the connection is used. |
Connection Type | Displays the connection type selected in the previous screen. |
5. Configure the connection parameters, using the following table as a guide:
Property | Description |
Database | Mandatory. Select Hive. |
Database Version | Optional. Database version. |
Connection Type | Mandatory. Select Direct Connection. |
Application Server | Mandatory. Select Other. |
Application Server Version | Optional. Application server version. |
Connection String/ Data Source Name | Specify the connection string in the format jdbc:hive2://<server name>:<port number>/ (for example, jdbc:hive2://10.60.2.37:10000/). HiveServer2 listens on port 10000 by default. |
Driver Class | Enter the following driver class: org.apache.hive.jdbc.HiveDriver |
User-ID | Mandatory. User ID to connect to Hive. |
Password | Mandatory. Password of Hive user ID. |
Schema Name (If different from User-ID) | Not required. |
Target Server Name | Optional. For information purposes only. Type the name of the target server. |
Associated Hook | Optional. The hook to be executed at connection logon. |
6. Click Save to save the connection. To verify the connection details, click Test Connection.
Note: OneData automatically tests the connection when you save the connection details. You can also test the connection from the main Connection Manager screen.
7. In Hive, do the following:
a. Open the hive-site.xml file (typically in the Hive conf directory) and add the following properties inside its <configuration> element:
<property>
  <name>hive.support.concurrency</name>
  <value>true</value>
</property>
<property>
  <name>hive.enforce.bucketing</name>
  <value>true</value>
</property>
<property>
  <name>hive.exec.dynamic.partition.mode</name>
  <value>nonstrict</value>
</property>
<property>
  <name>hive.txn.manager</name>
  <value>org.apache.hadoop.hive.ql.lockmgr.DbTxnManager</value>
</property>
<property>
  <name>hive.compactor.initiator.on</name>
  <value>false</value>
</property>
<property>
  <name>hive.compactor.worker.threads</name>
  <value>10</value>
</property>
b. If you want OneData to perform ACID (atomicity, consistency, isolation, durability) transactions (insert, update, and delete) on a table, set the table property transactional=true on that table. In Hive 0.14, such a table must also be bucketed (CLUSTERED BY ... INTO n BUCKETS) and stored in ORC format.
Example of Table Creation Script
create table student(id decimal(10,0), initials char(3), name varchar(100), valid boolean,
dob date, regndate timestamp, totalscore decimal(5,2) ) clustered by (id) into 2 buckets stored as orc
TBLPROPERTIES('transactional'='true');
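The connection string in step 5 follows a fixed pattern, so you can assemble and sanity-check it before you enter it in Connection Manager. The following Java sketch shows one way to do that. HiveJdbcUrl and its methods are hypothetical helpers. They are not part of OneData or the Hive driver. The check accepts only the exact form shown in the table, with no database name or extra URL parameters.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper: builds and checks connection strings of the form
// jdbc:hive2://<server name>:<port number>/ (the format shown in step 5).
final class HiveJdbcUrl {
    static final String PREFIX = "jdbc:hive2://";

    // Assembles the connection string from a server name and port number.
    static String build(String server, int port) {
        if (server == null || server.isBlank()) {
            throw new IllegalArgumentException("server name is required");
        }
        if (port < 1 || port > 65535) {
            throw new IllegalArgumentException("port out of range: " + port);
        }
        return PREFIX + server + ":" + port + "/";
    }

    // True only for the documented form: a host, an explicit port, and a trailing "/".
    static boolean isValid(String url) {
        if (url == null || !url.startsWith(PREFIX) || !url.endsWith("/")) {
            return false;
        }
        try {
            // Strip "jdbc:" so java.net.URI can parse the host and port.
            URI uri = new URI(url.substring("jdbc:".length()));
            return uri.getHost() != null && uri.getPort() > 0 && "/".equals(uri.getPath());
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String url = build("10.60.2.37", 10000);
        System.out.println(url + " valid=" + isValid(url));
    }
}
```

With the example address from the table, main prints jdbc:hive2://10.60.2.37:10000/ valid=true.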
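If Test Connection in step 6 fails, try the same connection string, user ID, and password from a small stand-alone JDBC client. This can separate network or HiveServer2 problems from OneData configuration problems. The sketch below assumes that the Hive JDBC driver JAR and its dependencies are on the classpath. The class name, the 10-second login timeout, and the SHOW DATABASES probe query are illustrative choices. This is not how OneData runs its own test.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical stand-alone check using the Driver Class and connection string from step 5.
// Assumes the Hive JDBC driver and its dependencies are on the classpath.
final class HiveConnectionCheck {
    static final String DRIVER_CLASS = "org.apache.hive.jdbc.HiveDriver";

    // Returns true if a connection opens and a trivial query runs; false otherwise.
    static boolean canConnect(String url, String user, String password) {
        try {
            Class.forName(DRIVER_CLASS); // load the driver named in step 5
        } catch (ClassNotFoundException e) {
            System.err.println("Driver not found on classpath: " + DRIVER_CLASS);
            return false;
        }
        DriverManager.setLoginTimeout(10); // seconds; illustrative value
        try (Connection conn = DriverManager.getConnection(url, user, password);
             Statement stmt = conn.createStatement()) {
            stmt.execute("SHOW DATABASES"); // lightweight probe query
            return true;
        } catch (SQLException e) {
            System.err.println("Connection failed: " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) {
        if (args.length != 3) {
            System.err.println("usage: java HiveConnectionCheck <jdbc-url> <user> <password>");
            System.exit(2);
        }
        System.out.println(canConnect(args[0], args[1], args[2]) ? "OK" : "FAILED");
    }
}
```

The password is passed as a command-line argument only to keep the sketch short. In practice, read it from a safer source.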
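After you edit hive-site.xml in step 7a, you can catch typos in property names or values by parsing the file and comparing it with the six properties listed there. The following Java sketch does that with the JDK's built-in DOM parser. HiveSiteCheck is a hypothetical helper. It checks for the exact values shown in step 7a, so adjust REQUIRED if your site deliberately uses other values, for example a different number of compactor worker threads.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Hypothetical helper that compares hive-site.xml with the properties from step 7a.
final class HiveSiteCheck {
    static final Map<String, String> REQUIRED = new LinkedHashMap<>();
    static {
        REQUIRED.put("hive.support.concurrency", "true");
        REQUIRED.put("hive.enforce.bucketing", "true");
        REQUIRED.put("hive.exec.dynamic.partition.mode", "nonstrict");
        REQUIRED.put("hive.txn.manager", "org.apache.hadoop.hive.ql.lockmgr.DbTxnManager");
        REQUIRED.put("hive.compactor.initiator.on", "false");
        REQUIRED.put("hive.compactor.worker.threads", "10");
    }

    // Reads every <property> element into a name -> value map.
    static Map<String, String> readProperties(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Map<String, String> props = new LinkedHashMap<>();
            NodeList nodes = doc.getElementsByTagName("property");
            for (int i = 0; i < nodes.getLength(); i++) {
                Element property = (Element) nodes.item(i);
                String name = childText(property, "name");
                if (name != null) {
                    props.put(name, childText(property, "value"));
                }
            }
            return props;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("cannot parse hive-site.xml", e);
        }
    }

    // Trimmed text of the first child element with the given tag, or null if absent.
    private static String childText(Element parent, String tag) {
        NodeList list = parent.getElementsByTagName(tag);
        return list.getLength() == 0 ? null : list.item(0).getTextContent().trim();
    }

    // One message per missing or mismatched property; an empty list means all match.
    static List<String> problems(String xml) {
        Map<String, String> actual = readProperties(xml);
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, String> e : REQUIRED.entrySet()) {
            String value = actual.get(e.getKey());
            if (value == null) {
                out.add("missing " + e.getKey());
            } else if (!value.equals(e.getValue())) {
                out.add(e.getKey() + " is " + value + ", expected " + e.getValue());
            }
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java HiveSiteCheck <path-to-hive-site.xml>");
            System.exit(2);
        }
        List<String> found = problems(Files.readString(Path.of(args[0])));
        if (found.isEmpty()) {
            System.out.println("All step 7a properties match.");
        } else {
            found.forEach(System.out::println);
        }
    }
}
```

For example, a file that sets hive.support.concurrency to false produces the line hive.support.concurrency is false, expected true.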
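For step 7b, Hive 0.14 supports ACID writes only on tables that are bucketed, stored as ORC, and marked 'transactional'='true', as in the student example. The sketch below is a rough, regex-based pre-flight check of a CREATE TABLE script for those three clauses. AcidDdlCheck is a hypothetical helper, not a HiveQL parser, and comments or unusual quoting in the script can fool it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical, regex-based check that a CREATE TABLE script has the clauses
// Hive 0.14 needs for ACID tables. Not a HiveQL parser: comments and
// unusual quoting are not handled.
final class AcidDdlCheck {
    private static final int FLAGS = Pattern.CASE_INSENSITIVE;

    // CLUSTERED BY (cols) [SORTED BY (cols)] INTO n BUCKETS
    private static final Pattern BUCKETED = Pattern.compile(
            "\\bclustered\\s+by\\s*\\([^)]*\\)(\\s*sorted\\s+by\\s*\\([^)]*\\))?\\s*into\\s+\\d+\\s+buckets\\b", FLAGS);
    // STORED AS ORC
    private static final Pattern ORC = Pattern.compile("\\bstored\\s+as\\s+orc\\b", FLAGS);
    // TBLPROPERTIES (... 'transactional'='true' ...), with single or double quotes
    private static final Pattern TRANSACTIONAL = Pattern.compile(
            "\\btblproperties\\s*\\([^)]*['\"]transactional['\"]\\s*=\\s*['\"]true['\"]", FLAGS);

    // Returns the required clauses that were not found; an empty list means all are present.
    static List<String> missingClauses(String ddl) {
        List<String> missing = new ArrayList<>();
        if (!BUCKETED.matcher(ddl).find()) {
            missing.add("CLUSTERED BY (...) INTO n BUCKETS");
        }
        if (!ORC.matcher(ddl).find()) {
            missing.add("STORED AS ORC");
        }
        if (!TRANSACTIONAL.matcher(ddl).find()) {
            missing.add("TBLPROPERTIES('transactional'='true')");
        }
        return missing;
    }

    public static void main(String[] args) {
        String ddl = "create table student(id decimal(10,0), initials char(3), name varchar(100), valid boolean,\n"
                + "dob date, regndate timestamp, totalscore decimal(5,2) ) clustered by (id) into 2 buckets stored as orc\n"
                + "TBLPROPERTIES('transactional'='true');";
        System.out.println(missingClauses(ddl)); // prints [] for the student example
    }
}
```

A regular expression keeps the sketch dependency-free. For a definitive answer, create the table in Hive and inspect the result of SHOW CREATE TABLE.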
Next Steps
Create a database update export job with the remote connection set to Apache Hive.
For details on how to create an export job, see Implementing webMethods OneData.