Configuring Apache Hive Connections for Big Data
Prerequisites:
*Apache Hive 0.14.0 is installed and configured to connect to the big data source.
You can use master data in OneData with big data sources such as Apache Hadoop. OneData establishes a JDBC connection to Apache Hive, which acts as the intermediate system that fetches data from the big data source. The master data held in the big data source is then available for your organization's data analytics. Currently, Apache Hive is the only supported connection through which OneData can connect to Apache Hadoop as a big data source.
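OneData handles this JDBC connection internally; the following standalone Java sketch is only an illustration of how the driver class and connection string configured in the procedure below are used to reach HiveServer2. The host, port, user ID, and password are placeholder values, and the Hive JDBC driver and its dependencies are assumed to be on the classpath.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveConnectionCheck {
    public static void main(String[] args) throws Exception {
        // Same driver class that is entered in the OneData connection settings.
        Class.forName("org.apache.hive.jdbc.HiveDriver");

        // Connection string format: jdbc:hive2://Server Name:Port Number/
        // Placeholder host, port, and credentials; replace with your HiveServer2 details.
        String url = "jdbc:hive2://10.60.2.37:10000/";
        try (Connection conn = DriverManager.getConnection(url, "hiveuser", "hivepassword");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SHOW TABLES")) {
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        }
    }
}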
Use this procedure to configure an Apache Hive connection.
*To configure an Apache Hive connection
1. On the Menu toolbar, select Administer > System > Connection Manager.
2. In the Connection Type list, select JDBC.
3. Do one of the following:
*Click Add Connection to add a new connection.
*Click the Edit icon to edit an existing connection.
4. Configure the connection details, using the following descriptions as a guide:
*Connection Name: Mandatory. A unique connection name of 100 characters or less. The name can include spaces.
*Description: Optional. A description of how the connection is used.
*Connection Type: Displays the connection type selected in the previous screen.
5. Configure the connection parameters, using the following descriptions as a guide:
*Database: Mandatory. Select Hive.
*Database Version: Optional. The database version.
*Connection Type: Mandatory. Select Direct Connection.
*Application Server: Mandatory. Select Other.
*Application Server Version: Optional. The application server version.
*Connection String/Data Source Name: Specify the connection string in the format jdbc:hive2://Server Name:Port Number/ (for example, jdbc:hive2://10.60.2.37:10000/).
*Driver Class: Enter the following driver class: org.apache.hive.jdbc.HiveDriver
*User-ID: Mandatory. The user ID used to connect to Hive.
*Password: Mandatory. The password of the Hive user ID.
*Schema Name (If different from User-ID): Not required.
*Target Server Name: Optional, for information purposes only. The name of the target server.
*Associated Hook: Optional. The hook to execute at connection logon.
6. Click Save to save the new connection. Click Test Connection to verify the connection details.
Note: OneData automatically tests the connection when you save the connection details. You can also test the connection from the main Connection Manager screen.
7. In Hive, do the following:
a. Navigate to the location of the hive-site.xml file and add the following parameters:
<!-- Client settings required for Hive transaction (ACID) support -->
<property>
    <name>hive.support.concurrency</name>
    <value>true</value>
</property>
<property>
    <name>hive.enforce.bucketing</name>
    <value>true</value>
</property>
<property>
    <name>hive.exec.dynamic.partition.mode</name>
    <value>nonstrict</value>
</property>
<property>
    <name>hive.txn.manager</name>
    <value>org.apache.hadoop.hive.ql.lockmgr.DbTxnManager</value>
</property>
<!-- Compaction settings for ACID tables -->
<property>
    <name>hive.compactor.initiator.on</name>
    <value>false</value>
</property>
<property>
    <name>hive.compactor.worker.threads</name>
    <value>10</value>
</property>
b. If you want OneData to perform ACID (atomicity, consistency, isolation, durability) transactions (insert, update, and delete) on a table, set the property transactional=true on that table, as shown in the example table creation script and the JDBC sketch that follow.
For more details on the Hive parameters, see https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions.
Example of Table Creation Script
create table student (id decimal(10,0), initials char(3), name varchar(100), valid boolean,
    dob date, regndate timestamp, totalscore decimal(5,2))
clustered by (id) into 2 buckets
stored as orc
TBLPROPERTIES ('transactional'='true');
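The following Java sketch shows the kind of ACID operations (insert, update, delete) that become possible on the student table above once it is created with 'transactional'='true' and the hive-site.xml transaction settings are in place. It is an illustration only; in OneData these operations are carried out by the database update export job described under Next Steps. The connection details and sample values are placeholders.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class HiveAcidDemo {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");

        // Placeholder connection details; replace with your HiveServer2 host, port, and credentials.
        String url = "jdbc:hive2://10.60.2.37:10000/";
        try (Connection conn = DriverManager.getConnection(url, "hiveuser", "hivepassword");
             Statement stmt = conn.createStatement()) {
            // Insert, update, and delete succeed only because the table is transactional
            // and the hive-site.xml transaction settings above are configured.
            stmt.execute("insert into table student values (1, 'ABC', 'Alice', true, "
                    + "cast('1990-01-01' as date), cast('2015-06-01 10:00:00' as timestamp), 95.50)");
            stmt.execute("update student set totalscore = 97.25 where id = 1");
            stmt.execute("delete from student where id = 1");
        }
    }
}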
Next Steps
Create a database update export job with the remote connection set to Apache Hive.
For details on how to create an export job, see Implementing webMethods OneData.