Hybrid Similarity Functions
Hybrid similarity functions combine token-based and character-based similarity functions. The input strings are first split into smaller tokens, usually words. For more information on token-based similarity functions that apply splitting of strings, see Smith-Waterman and Smith-Waterman-Gotoh.
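As a hybrid similarity function, Monge-Elkan first splits two strings s1 and s2 into tokens t_i and t_j, respectively. It then matches every token t_i from s1 with the token t_j in s2 that has the maximum similarity according to the secondary similarity function TokenSim. The arithmetic mean of these maximum similarity scores gives the final similarity measure:
\[ \mathrm{MongeElkan}(s_1, s_2) = \frac{1}{|s_1|} \sum_{i=1}^{|s_1|} \max_{j=1,\dots,|s_2|} \mathrm{TokenSim}(t_i, t_j) \]
Here |s1| and |s2| denote the number of tokens in s1 and s2.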
The Monge-Elkan similarity is not symmetric: because the mean is taken over the tokens of the first string only, MongeElkan(s1,s2) = MongeElkan(s2,s1) does not hold in general.
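The definition and its asymmetry can be illustrated with a minimal Python sketch. It is not OneData code: the names monge_elkan and exact_match are hypothetical, tokens are assumed to be whitespace-separated words, and the secondary function is deliberately reduced to an exact-match test so that the result is easy to check by hand.

```python
from typing import Callable


def monge_elkan(s1: str, s2: str, token_sim: Callable[[str, str], float]) -> float:
    """Mean, over the tokens of s1, of the best token_sim score found in s2."""
    tokens1 = s1.split()
    tokens2 = s2.split()
    if not tokens1 or not tokens2:
        return 0.0  # assumption: an empty side yields no similarity
    best = [max(token_sim(a, b) for b in tokens2) for a in tokens1]
    return sum(best) / len(best)


def exact_match(a: str, b: str) -> float:
    """Hypothetical secondary function: 1.0 for identical tokens, else 0.0."""
    return 1.0 if a == b else 0.0


forward = monge_elkan("Acme", "Acme Soft", exact_match)   # only "Acme" is scored
backward = monge_elkan("Acme Soft", "Acme", exact_match)  # "Soft" finds no match
print(forward, backward)  # 1.0 0.5
```

Swapping the arguments changes which string's tokens are averaged over, which is why the two calls can return different scores.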
Monge-Elkan Example
Consider the input strings “Acme Soft” and “Acm Software”.
Applying the Monge-Elkan similarity function with Levenshtein as the secondary function:
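Maximum similarity for “Acme” and “Acm” is: 1 - 1/4 = 75% (one edit, divided by the longer token length of 4)
Maximum similarity for “Soft” and “Software” is: 1 - 4/8 = 50% (four edits, divided by the longer token length of 8)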
Therefore, the Monge-Elkan similarity is computed as: (75% + 50%) / 2 = 62.5%
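The arithmetic in this example can be reproduced with a short Python sketch. It assumes that the Levenshtein similarity is normalized as 1 minus the edit distance divided by the length of the longer token, which matches the figures in the example, and that tokens are whitespace-separated words. The function names are illustrative and are not part of OneData.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insert, delete, and substitute."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, free if equal
            ))
        prev = curr
    return prev[-1]


def levenshtein_sim(a: str, b: str) -> float:
    """1 - distance / length of the longer token (assumed normalization)."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def monge_elkan(s1: str, s2: str) -> float:
    """Mean of the best Levenshtein similarity for each token of s1."""
    tokens1, tokens2 = s1.split(), s2.split()
    if not tokens1 or not tokens2:
        return 0.0  # assumption: an empty side yields no similarity
    best = [max(levenshtein_sim(a, b) for b in tokens2) for a in tokens1]
    return sum(best) / len(best)


print(levenshtein_sim("Acme", "Acm"))            # 0.75
print(levenshtein_sim("Soft", "Software"))       # 0.5
print(monge_elkan("Acme Soft", "Acm Software"))  # 0.625
```

The comparison is case-sensitive in this sketch; a production matcher would typically normalize case and punctuation before tokenizing.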
Copyright © 2011-2016 Software AG, Darmstadt, Germany.
