Key Data Consolidation Concepts
The key data consolidation concepts that you must familiarize yourself with are:
Data Quality Steward Group
The Data Steward group is a regular OneData user group. Its members have the expertise to review data and take the required actions in the consolidation process through the OneData user interface. The data stewards for the cleansing and matching processes can be different. If the data cleansing result is not satisfactory, you can configure the record to move to a Data Steward, who can review the data, modify it for a better cleansing result, send it again for cleansing or review, or mark it as Cleansed. If the automated matching process cannot identify a record as a match or a no match according to the defined configuration, it places the record in the queue for manual match. The data stewards can then analyze the score, hint, and pattern for matching, and decide either to create a new Gold record or to link the record to an existing Gold record. Data stewards see the consolidation objects listed in the DQ inbox (data quality inbox), along with the count of records awaiting their manual intervention.
Cleansers
Cleanser servers take uncleansed input for a data record and return cleansed, standardized, and complete details. This makes the subsequent matching process more effective.
Software AG offers webMethods Locate that works with webMethods OneData for address cleansing and validation.
If you opt for webMethods Integration Server as the cleanser, you must first implement a service in Integration Server that cleanses the input. OneData provides the connectivity to the Integration Server service. You must implement the cleansing and standardization logic in the server according to your business requirements.
The cleansing service is also responsible for generating windowkeys, which limit the number of match candidate records and so speed up the matching process in OneData. If you create a cleansing project of type webMethods Locate, it defines seven built-in windowkeys that cover the basic use cases. You can define more to suit your needs.
Consolidation Gold object
The Consolidation Gold object is the table in which the Gold records resulting from the data consolidation process are maintained and then deployed to downstream systems.
OneData creates and maintains the Consolidation Gold object as follows:
1. The matching process takes the records in the Cleansed status from the Consolidation table and checks whether they already exist in the Consolidation Gold table. For records that do not exist in the Consolidation Gold table, it creates new Gold records.
2. If the matching process detects pre-existing records in the Consolidation Gold table, OneData does not create duplicate records. Instead, OneData applies the survivorship rules and updates the relevant attributes of the Gold records from the incoming records.
3. Based on whether a new Gold record is created and the process through which it was created, the status of the record in the Consolidation table is changed from Cleansed to Created Auto, Created Manual, Linked Auto, or Linked Manual.
For more details on the transition stages of a record, see Transition of a Record in the Data Consolidation Process.
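The create-or-link steps above can be sketched in a few lines of Python. This is an illustration only, not OneData code: the function `consolidate`, the dictionary used as a Gold table, and the "non-empty incoming value wins" survivorship rule are all assumptions made for the example. Only the four status names come from the text.

```python
def consolidate(record, gold_table, matched_gold_id=None, manual=False):
    """Create or update a Gold record and return (gold_id, new_status).

    record          -- dict of cleansed attributes from the Consolidation table
    gold_table      -- dict mapping Gold record id -> attribute dict (hypothetical)
    matched_gold_id -- id of the pre-existing Gold record, or None if no match
    manual          -- True when a data steward made the decision
    """
    how = "Manual" if manual else "Auto"
    if matched_gold_id is None:
        # No Gold record exists yet, so create one.
        gold_id = max(gold_table, default=0) + 1
        gold_table[gold_id] = dict(record)
        return gold_id, f"Created {how}"
    # A Gold record already exists, so do not duplicate it. Apply a placeholder
    # survivorship rule (assumption): non-empty incoming values win.
    gold = gold_table[matched_gold_id]
    for attribute, value in record.items():
        if value:
            gold[attribute] = value
    return matched_gold_id, f"Linked {how}"
```

Calling `consolidate` without a `matched_gold_id` adds a row to the Gold table, while passing one updates the existing row and leaves the row count unchanged.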
DQ Inbox
The DQ inbox (data quality inbox) is the inbox for Data Steward users. It lists all the consolidation objects along with the records awaiting manual intervention.
Matching candidates
Matching on a Consolidation object runs against a subset of the Consolidation Gold object records, together with the data records from the incoming load, that have the same windowkey. The set of records that OneData picks up from the Consolidation Gold object for matching is referred to as the match candidates. Match candidates exist because not all records of the Consolidation Gold object, which can number in the millions, are needed for matching. Sending every Consolidation Gold object record for matching would require a lot of processing time, so OneData uses windowkeys to narrow the records down to a set of match candidates. A broader windowkey increases the number of match candidates.
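The narrowing effect of a windowkey can be illustrated with a small Python sketch. It assumes that every record already carries a precomputed `windowkey` value. The field names, the sample key values, and the helper functions are hypothetical, and the sketch covers only the Gold side of the selection, not records from the incoming load.

```python
from collections import defaultdict

def build_windowkey_index(gold_records):
    """Group Gold records by windowkey so a lookup touches one group only."""
    index = defaultdict(list)
    for record in gold_records:
        index[record["windowkey"]].append(record)
    return index

def match_candidates(incoming, index):
    """Return only the Gold records that share the incoming record's windowkey."""
    return index.get(incoming["windowkey"], [])

gold_records = [
    {"id": 1, "name": "Ann Lee", "windowkey": "LEE12345"},
    {"id": 2, "name": "Anne Lee", "windowkey": "LEE12345"},
    {"id": 3, "name": "Bob Ray", "windowkey": "RAY54321"},
]
index = build_windowkey_index(gold_records)
incoming = {"name": "A. Lee", "windowkey": "LEE12345"}
candidate_ids = [c["id"] for c in match_candidates(incoming, index)]
```

Here only records 1 and 2 go to the matching step. Record 3 is never compared, which is where the saving in processing time comes from.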
Rules
To determine if two records are duplicates, that is, they match each other, OneData uses match rules.
Before you create match rules, review and understand the data. You can use more than one rule, in a given sequence, to identify a perfect match on different attributes. You can choose whether the data is evaluated against all the defined rules to identify whether a match exists (Evaluate All Rules option), or whether execution of the pending rules stops once a rule identifies a perfect match (Exit on first match option).
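The difference between the two options can be shown with a minimal Python sketch. Exact equality on selected attributes stands in for real match rules here, and the names `evaluate_rules`, `same`, and `exit_on_first_match` are assumptions for the illustration, not OneData settings.

```python
def evaluate_rules(incoming, gold, rules, exit_on_first_match=False):
    """Run match rules in sequence; return (rule_name, matched) per executed rule."""
    results = []
    for name, rule in rules:
        matched = rule(incoming, gold)
        results.append((name, matched))
        if matched and exit_on_first_match:
            break  # skip the pending rules
    return results

def same(*attributes):
    """Build a stand-in rule that matches when all given attributes are equal."""
    return lambda a, b: all(a[x] == b[x] for x in attributes)

rules = [
    ("phone", same("phone")),
    ("name_and_postal_code", same("last_name", "postal_code")),
]
```

With `exit_on_first_match=False` every rule is executed and reported, which corresponds to Evaluate All Rules. With `exit_on_first_match=True` the loop stops at the first rule that matches, so later rules are never run.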
Single and multi table gold models
OneData supports single table and multi table Gold models. The single table Gold model contains only a single OneData object as the Gold. For example, in the person-address model, both the person information (first name, last name, phone) and the address information (address, house number, postal code) reside in the same table. In this model, create the match rules under the BOTH pattern, that is, on the person (master) and address (detail) information.
The multi table Gold model is used in cases where, for example, one customer has multiple addresses, so the customer-address relationship spans two tables. With a single table Gold model, the person details would have to be repeated in two rows, each holding a different address. In such cases, the multi table Gold model is preferred. It requires two Gold data objects, one holding the person details and the other holding the address details, and a foreign key relation from the address detail to the person.
In the earlier customer-address example, where a customer has multiple addresses, BOTH means that the person (MASTER) and address (DETAIL) information of the incoming record both matched a corresponding Gold record. The matching process has found that the address already exists for the same customer, and only the survivorship rules need to be applied. MASTER means that only the person details match, so this is a new address for an existing person. In this case, the survivorship rules apply to the person record, and a new address is created and associated with the person.
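The two outcomes in the multi table model can be sketched as follows. The tables are plain Python dictionaries, `person_id` plays the role of the foreign key, and the survivorship rule ("non-empty incoming value wins") is a placeholder. All names are hypothetical and chosen only for this illustration.

```python
def survive(gold_row, incoming_row):
    """Placeholder survivorship rule (assumption): non-empty incoming values win."""
    for attribute, value in incoming_row.items():
        if value:
            gold_row[attribute] = value

def apply_pattern(pattern, persons, addresses, person, address,
                  person_id, address_id=None):
    """Apply a BOTH or MASTER match result; return the id of the address row used."""
    if pattern not in ("BOTH", "MASTER"):
        raise ValueError(f"pattern not covered by this sketch: {pattern}")
    # In both cases the person already exists, so survivorship applies to it.
    survive(persons[person_id], person)
    if pattern == "BOTH":
        # The address already exists for this person: survivorship only.
        survive(addresses[address_id], address)
        return address_id
    # MASTER: a new address for an existing person, linked by the foreign key.
    new_id = max(addresses, default=0) + 1
    addresses[new_id] = {"person_id": person_id, **address}
    return new_id
```

A MASTER result therefore grows the address table by one row, while a BOTH result leaves the row counts of both tables unchanged.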
Windowkeys
In OneData, the matching process picks up only cleansed data, because matching cannot be accurate without standardized data. Not all Gold records are considered match candidates. Instead, OneData uses the concept of a windowkey to narrow down all the available Gold records to a set of match candidates. A windowkey comprises a combination of data from multiple fields, in whole or in part.
The process of generating windowkeys is specific to the cleanser or data quality cleansing server. A cleanser generates windowkeys, usually defined on cleansed attributes. A few data quality servers, like Trillium, offer windowkeys as output of the data cleansing process. For cleansers that do not accommodate the concept of windowkeys, OneData provides the flexibility to define windowkeys using regular expressions. OneData allows the use of multiple windowkeys. The more precise the windowkey, the better the match candidate selection.
There is no hard and fast rule about which columns must be considered as windowkeys or how to generate windowkeys. These decisions depend entirely on the data the model handles. The rule to keep in mind while generating windowkeys is that a Gold record that could be a possible match for a consolidation record must not be missed as a match candidate because it does not satisfy the windowkey criteria. When a Gold record is not picked up as a possible match, duplicate Gold records can result.
Consolidation can deal with different domains: product, person, address, company, and so on. If you are dealing with a company domain, the Data Universal Numbering System (DUNS) number, if available, can serve as the windowkey. The DUNS number helps the process fetch the required record from the Gold object if it matches the consolidation record. If this information is not available, other attributes such as the website or the phone number can serve as the windowkey. Additionally, you can create the windowkey with a regular expression, for example as the first two characters of the website combined with a few characters from the company name.
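A company windowkey built from the website and the company name could look like the following Python sketch. It uses Python's `re` module rather than OneData's own windowkey configuration. The function name, the choice of three characters from the name, and the stripping of the URL scheme and the `www.` prefix are assumptions made for the example.

```python
import re

def company_windowkey(website, company_name, name_chars=3):
    """Combine the first two characters of the website host with the first
    few alphanumeric characters of the company name."""
    # Assumption: drop "http://", "https://", and "www." so that different
    # spellings of the same site yield the same two characters.
    host = re.sub(r"^(?:https?://)?(?:www\.)?", "", website.strip().lower())
    # Keep only letters and digits of the name before taking the prefix.
    name = re.sub(r"[^a-z0-9]", "", company_name.lower())
    return (host[:2] + name[:name_chars]).upper()
```

Under these assumptions, `company_windowkey("https://www.example.com", "Example Corp.")` and `company_windowkey("WWW.Example.com/", "Example Corporation")` both yield `EXEXA`, so the two spellings of the same company fall into the same window.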
In the case of a person, the social security number (SSN) could serve as the windowkey. In the absence of SSN, a combination of the first name and the last name, phone number, and address attributes can also help fetch the match candidates.
Example:
The consolidation model deals with a mobile phone connection provider. The consolidation Gold object would have the data of customers using their mobile connection. The attributes maintained in the model are: first name, last name, mobile phone number, house number, street, postal code, and country.
If we choose country as the windowkey attribute, it might fetch almost all the data from the consolidation Gold table as Country = ?. So one option is to choose the phone number as the windowkey. However, a person might deactivate their present mobile phone number and start using a new one. The incoming record then finds no match among the Gold records, because the phone numbers do not match. If we turn to the address attributes, we can take the house number, the street, or the postal code as the windowkey, but each of these alone fetches a high number of probable match candidates. Another option is to combine them in chunks and use the first few characters of the street name, the house number, and the postal code, which narrows the result down to a reasonable set of match candidates. But what if the customer changes their address? This again poses a problem. OneData supports OR and AND operations on windowkeys. Define more than one windowkey, and then use the OR operator to bring in all the possible match candidates or the AND operator to narrow them down.
Therefore, it is the data which determines how windowkeys must be generated. First, analyze the data and determine how you can limit the match candidates and which attributes would best help you do so.
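The mobile phone provider example can be sketched in Python to show how OR and AND behave with two windowkeys. The key functions, the three-character street prefix, and the set-based selection are assumptions for the illustration and do not reflect how OneData evaluates windowkeys internally.

```python
def phone_key(record):
    """Windowkey 1: the mobile phone number."""
    return record["phone"]

def address_key(record):
    """Windowkey 2: first three characters of the street, the house number,
    and the postal code, combined in chunks."""
    return (record["street"][:3] + record["house_number"]
            + record["postal_code"]).upper()

def select_candidates(incoming, gold_records, key_functions, operator="OR"):
    """Return the ids of Gold records whose windowkeys match the incoming record."""
    hits = [
        {g["id"] for g in gold_records if key(g) == key(incoming)}
        for key in key_functions
    ]
    if not hits:
        return set()
    if operator == "OR":
        return set.union(*hits)         # a match on any windowkey is enough
    if operator == "AND":
        return set.intersection(*hits)  # every windowkey must match
    raise ValueError(f"unsupported operator: {operator}")

gold_records = [
    {"id": 1, "phone": "5550100", "street": "Main Street",
     "house_number": "5", "postal_code": "12345"},
    {"id": 2, "phone": "5550177", "street": "Oak Avenue",
     "house_number": "9", "postal_code": "54321"},
]
# The customer of Gold record 1 has a new phone number but the same address.
new_phone = {"phone": "5550999", "street": "Main St.",
             "house_number": "5", "postal_code": "12345"}
```

With the OR operator, `select_candidates(new_phone, gold_records, [phone_key, address_key])` still returns Gold record 1 through the address windowkey. With the AND operator the same call returns an empty set, because the phone windowkey no longer matches.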