Using Other Data Quality Servers
Customers can use data quality servers other than the recommended webMethods Locate (for address cleansing) and webMethods OneData Matching (for matching) as part of the Data Quality process in OneData. To do so, use onedataDQPlugin.jar, located in the OneData directory under <Software AG_directory>. The onedataDQPlugin.jar file contains the interfaces and helper classes for implementing cleansing and matching services for the plugged-in data quality servers.
The following sections provide details about the classes included in the .jar.
Connectivity
The class used to execute the connectivity operations (attach and release) must implement the DataQualityConnector interface, for example:
package dataqualityserverinterfacetest;

import com.datafoundations.onedata.dataquality.DataQualityConfig;
import com.datafoundations.onedata.dataquality.plugin.DataQualityConnector;
import com.datafoundations.onedata.dataquality.plugin.helper.DataQualityConnection;

public class DFConnector implements DataQualityConnector {
    /**
     * Connects to the plugged-in DQ server 'DF' from OneData.
     *
     * @param dataQualityconfig
     *            A DataQualityConfig object with parameters specific to
     *            connecting to the data quality server. Parameters defined in
     *            the Connectivity Information while creating the DQ server in
     *            OneData can be retrieved using getters.
     * @return DataQualityConnection object with the connection to the data
     *         quality server.
     */
    public DataQualityConnection connect(
            final DataQualityConfig dataQualityconfig) {
        DataQualityConnection connection = new DataQualityConnection();
        // Initialize the connection to the DQ server and add it to the
        // DataQualityConnection using its setter.
        return connection;
    }

    /**
     * Releases the data quality server connection from OneData.
     *
     * @param dataQualityConnection
     *            A DataQualityConnection object containing the connection to
     *            the plugged-in DQ server 'DF'.
     */
    public void release(final DataQualityConnection dataQualityConnection) {
        // Use the getter of DataQualityConnection to get the initialized
        // connection to the DQ server.
        // Release the connection.
    }
}
Cleansing
The class used to execute the cleansing operation must implement the DataQualityCleanser interface as in the following example.
package dataqualityserverinterfacetest;

import com.datafoundations.onedata.dataquality.DataQualityConstants;
import com.datafoundations.onedata.dataquality.plugin.DataQualityCleanser;
import com.datafoundations.onedata.dataquality.plugin.helper.DataQualityCleanserExecutorParams;
// DataQualityCleanserStatus must also be imported; its package is not shown in this sample.

public class DFCleanser implements DataQualityCleanser {
    /**
     * Cleanses the data using the plugged-in DQ server DF.
     *
     * @param executorParams
     *            A DataQualityCleanserExecutorParams object containing
     *            parameters specific to running the cleanse action through
     *            the data quality server.
     * @param uncleansedData
     *            A String array containing uncleansed data.
     * @return A String array containing cleansed data.
     * @throws Exception if an error occurs in the cleansing process.
     */
    public String[] doDataQualityCleansing(
            final DataQualityCleanserExecutorParams executorParams,
            final String[] uncleansedData) throws Exception {
        String[] cleansedData = null;
        try {
            final String[] cleanserOUTAttributes =
                    (String[]) executorParams.get(DataQualityConstants.OUTPUT_ATTRIBUTE);
            cleansedData = new String[cleanserOUTAttributes.length];
            for (int count = 0; count < cleanserOUTAttributes.length; count++) {
                // Placeholder value; modify this to use the data returned by
                // the plugged-in DQ server DF.
                cleansedData[count] = "Cleansed_" + cleanserOUTAttributes[count];
            }
        } catch (Exception e) {
            throw new RuntimeException(e.getMessage(), e);
        }
        return cleansedData;
    }

    /**
     * Fetches the cleanser status of records.
     *
     * @param executorParams
     *            A DataQualityCleanserExecutorParams object containing
     *            parameters specific to running the cleanse action through
     *            the data quality server.
     * @param cleansedData
     *            A String array containing cleansed data.
     * @return A DataQualityCleanserStatus object containing information about
     *         the data cleansing status.
     */
    public DataQualityCleanserStatus getCleanserStatus(
            final DataQualityCleanserExecutorParams executorParams,
            final String[] cleansedData) {
        String datacleansedMessage = "Data Cleansed";
        boolean isDataCleansed = true;
        // Add logic here that tells the user whether the data is cleansed,
        // based on the output received from the cleanser server.
        DataQualityCleanserStatus status =
                DataQualityCleanserStatus.getInstance(isDataCleansed, datacleansedMessage);
        return status;
    }
}
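The status logic left open in getCleanserStatus could take many forms. The following is a minimal, self-contained sketch of one possible rule: a record counts as cleansed only if every output value is present and non-blank. The class CleanserStatusRule and its evaluate method are hypothetical and not part of onedataDQPlugin.jar; in a real cleanser, the resulting flag and message would be passed to DataQualityCleanserStatus.getInstance as in the sample above.

```java
/** Hypothetical helper (not part of onedataDQPlugin.jar). */
final class CleanserStatusRule {
    final boolean cleansed;
    final String message;

    private CleanserStatusRule(boolean cleansed, String message) {
        this.cleansed = cleansed;
        this.message = message;
    }

    /**
     * Assumed rule: the data is cleansed only when the array is non-empty
     * and every element is non-null and non-blank.
     */
    static CleanserStatusRule evaluate(String[] cleansedData) {
        if (cleansedData == null || cleansedData.length == 0) {
            return new CleanserStatusRule(false, "No cleansed data returned");
        }
        for (int i = 0; i < cleansedData.length; i++) {
            String value = cleansedData[i];
            if (value == null || value.trim().isEmpty()) {
                return new CleanserStatusRule(false,
                        "Output value at index " + i + " is empty");
            }
        }
        return new CleanserStatusRule(true, "Data Cleansed");
    }

    public static void main(String[] args) {
        CleanserStatusRule ok =
                evaluate(new String[] {"Cleansed_STREET", "Cleansed_CITY"});
        System.out.println(ok.cleansed + ": " + ok.message);

        CleanserStatusRule bad =
                evaluate(new String[] {"Cleansed_STREET", " "});
        System.out.println(bad.cleansed + ": " + bad.message);
    }
}
```

The actual criteria depend on what the plugged-in server returns, so treat the rule above only as a placeholder for server-specific checks.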
Matching
The class used to execute the matching operation must implement the DataQualityMatcher interface. The following mandatory columns must be returned through the matcher function:
DQ_MATCH_SCORE
DQ_MATCH_KEY
DQ_MATCH_PATTERN (For multiple Gold Model)
Optionally, return the DQ_MATCH_HINT column.
The columns mentioned above are described in The Matcher Service in Integration Server.
The following code sample is an example of a matcher service and includes the mandatory columns.
package dataqualityserverinterfacetest;

import com.datafoundations.onedata.dataquality.DataQualityConfig;
import com.datafoundations.onedata.dataquality.plugin.DataQualityMatcher;
import com.datafoundations.onedata.dataquality.plugin.helper.DataQualityMatcherExecutorParams;
import java.util.ArrayList;
import java.util.List;
import nova.virtual.helper.param.DataColumn;
import nova.virtual.helper.param.DataRow;
import nova.virtual.helper.param.DataSet;

public class DFMatcher implements DataQualityMatcher {

    public DataSet doDataQualityMatching(final DataRow dataRow,
            final DataSet matchWindowDataSet,
            final DataQualityMatcherExecutorParams executorParams)
            throws Exception {
        final DataSet matchedWindowDataSetWithPattern =
                DataSet.createStringTypeDataSet();
        // Logic: create a copy of matchWindowDataSet and add
        // matcher-specific information to the rows.
        try {
            // Copy every row of the match window, column by column.
            final List matchedWindowRows = new ArrayList();
            final List matchWindowRows = matchWindowDataSet.getRows();
            final int matchWindowRowSize = matchWindowRows.size();
            for (int iMatchWindowRowCount = 0;
                    iMatchWindowRowCount < matchWindowRowSize;
                    iMatchWindowRowCount++) {
                final DataRow matchWindowRow =
                        (DataRow) matchWindowRows.get(iMatchWindowRowCount);
                final DataRow matchedWindowRow = new DataRow();
                final List matchWindowColumns = matchWindowRow.getColumnList();
                final int matchWindowColumnSize = matchWindowColumns.size();
                for (int iColCount = 0;
                        iColCount < matchWindowColumnSize;
                        iColCount++) {
                    final DataColumn column =
                            (DataColumn) matchWindowColumns.get(iColCount);
                    matchedWindowRow.addColumn(column);
                }
                matchedWindowRows.add(matchedWindowRow);
            }

            // Add the match columns to every copied row.
            final List patternAddedRows = new ArrayList();
            final int matchedWindowRowSize = matchedWindowRows.size();
            for (int iRowCount = 0;
                    iRowCount < matchedWindowRowSize;
                    iRowCount++) {
                final DataRow matchWindowRow =
                        (DataRow) matchedWindowRows.get(iRowCount);

                // -------- MANDATORY COLUMNS --------
                final DataColumn scoreColumn = new DataColumn("DQ_MATCH_SCORE");
                scoreColumn.setStringValue("89");
                matchWindowRow.addColumn(scoreColumn);
                final DataColumn keyColumn = new DataColumn("DQ_MATCH_KEY");
                keyColumn.setStringValue("10000");
                matchWindowRow.addColumn(keyColumn);

                // -------- MANDATORY COLUMNS (for multiple gold model) --------
                final DataColumn patternColumn = new DataColumn("DQ_MATCH_PATTERN");
                patternColumn.setStringValue("BOTH");
                matchWindowRow.addColumn(patternColumn);

                // -------- OPTIONAL COLUMNS --------
                final DataColumn hintColumn = new DataColumn("DQ_MATCH_HINT");
                hintColumn.setStringValue("Match on address information.");
                matchWindowRow.addColumn(hintColumn);

                patternAddedRows.add(iRowCount, matchWindowRow);
            }
            matchedWindowDataSetWithPattern.setRows(patternAddedRows);
        } catch (Exception e) {
            throw new Exception(e.getMessage(), e);
        }
        return matchedWindowDataSetWithPattern;
    }

    public void initializeDataQualityServerConnection(
            final DataQualityConfig dataQualityConfig)
            throws Exception {
        // Use this method if a new connection needs to be initialized for
        // the matching process. Preferable only for small data volumes.
    }

    public void releaseDataQualityServerConnection()
            throws Exception {
        // Release the newly opened connection.
    }
}
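Because the matcher must return the mandatory columns in every row, it can be useful to check rows before returning them. The following is a minimal, self-contained sketch of such a check. It uses a plain Map as a stand-in for DataRow so that it runs without onedataDQPlugin.jar; the class MatchColumnCheck and its method are hypothetical, and only the column names come from the list above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper; a Map stands in for a DataRow (column name to value). */
final class MatchColumnCheck {
    private static final String[] MANDATORY = {"DQ_MATCH_SCORE", "DQ_MATCH_KEY"};
    private static final String PATTERN = "DQ_MATCH_PATTERN";

    private MatchColumnCheck() {
    }

    /**
     * Returns the names of the mandatory columns that are absent from the row.
     * DQ_MATCH_PATTERN is required only when multipleGoldModel is true.
     */
    static List<String> missingColumns(Map<String, String> row,
            boolean multipleGoldModel) {
        List<String> missing = new ArrayList<>();
        for (String name : MANDATORY) {
            if (!row.containsKey(name)) {
                missing.add(name);
            }
        }
        if (multipleGoldModel && !row.containsKey(PATTERN)) {
            missing.add(PATTERN);
        }
        return missing;
    }

    public static void main(String[] args) {
        Map<String, String> row = new LinkedHashMap<>();
        row.put("DQ_MATCH_SCORE", "89");
        row.put("DQ_MATCH_KEY", "10000");
        System.out.println(missingColumns(row, false)); // prints []
        System.out.println(missingColumns(row, true));  // prints [DQ_MATCH_PATTERN]
    }
}
```

DQ_MATCH_HINT is optional, so the check deliberately ignores it.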
Notes
Connect all the implemented classes for connectivity, cleansing, and matching to the corresponding data quality server in OneData. To do so, manually update the ACPT_VAL column in the OD_MD_MSTPROP table of the OneData metadata schema, for the record with PROP_ID value 30001, using the format shown below. This enables the OneData data quality process to refer to these classes during processing. The value in the ACPT_VAL column already contains information about the existing data quality servers, so append the information about the new data quality server at the end. After you complete this change and restart OneData, the newly added data quality server appears in the list of available data quality servers in the OneData user interface when you create a data quality cleansing or matching project.
<id>#<server name>#<full name of connector class>#<full name of cleanser class>#<full name of matcher class>;
For example:
1#Trillium Software#com.dfi.od.dataquality.impl.trillium.TrilliumConnector#com.dfi.od.dataquality.impl.trillium.TrilliumCleanser#com.dfi.od.dataquality.impl.trillium.TrilliumMatcher;
100#webMethods#com.dfi.od.dataquality.impl.integrationserver.ISConnector#com.dfi.od.dataquality.impl.integrationserver.ISCleanser#com.dfi.od.dataquality.impl.integrationserver.ISMatcher;
300#DF DQ Server#dataqualityserverinterfacetest.DFConnector#dataqualityserverinterfacetest.DFCleanser#dataqualityserverinterfacetest.DFMatcher;
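An entry in this format can be assembled and appended to the existing value before the column is updated. The following is a minimal sketch in plain Java. The class AcptValEntry and its methods are hypothetical and not part of OneData, and the sketch assumes that the existing ACPT_VAL value is a sequence of entries that each end with a semicolon, as in the example above.

```java
/** Hypothetical helper for building ACPT_VAL entries (not part of OneData). */
final class AcptValEntry {
    private AcptValEntry() {
    }

    /** Builds one entry: id#server name#connector#cleanser#matcher; */
    static String format(int id, String serverName, String connectorClass,
            String cleanserClass, String matcherClass) {
        String[] parts = {serverName, connectorClass, cleanserClass, matcherClass};
        for (String part : parts) {
            // '#' and ';' are the separators, so they cannot appear in a part.
            if (part == null || part.isEmpty()
                    || part.contains("#") || part.contains(";")) {
                throw new IllegalArgumentException("Invalid entry part: " + part);
            }
        }
        return id + "#" + String.join("#", parts) + ";";
    }

    /** Appends the new entry after the existing entries, which stay unchanged. */
    static String append(String existingValue, String newEntry) {
        String existing = existingValue == null ? "" : existingValue.trim();
        if (!existing.isEmpty() && !existing.endsWith(";")) {
            existing = existing + ";";
        }
        return existing + newEntry;
    }

    public static void main(String[] args) {
        String entry = format(300, "DF DQ Server",
                "dataqualityserverinterfacetest.DFConnector",
                "dataqualityserverinterfacetest.DFCleanser",
                "dataqualityserverinterfacetest.DFMatcher");
        // The existing value here is shortened for illustration only.
        String existing = "100#webMethods#a.ISConnector#a.ISCleanser#a.ISMatcher;";
        System.out.println(append(existing, entry));
    }
}
```

The resulting string would then be written to ACPT_VAL with whatever database tool is used for the metadata schema; that step is outside this sketch.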