tf:phonetic

Search text based on phonetic similarities.

Syntax

tf:phonetic(string $searchString) => unspecified

Description

The function tf:phonetic is specific to Tamino. It takes a search string as argument and returns all strings that are "phonetically equivalent". It can only be used within the scope of the following functions:

tf:containsAdjacentText
tf:containsNearText
tf:containsText
tf:createAdjacentTextReference
tf:createNearTextReference
tf:createTextReference

Tamino performs this search using a set of rules modeled on the widely known Soundex algorithm. The rules are based on English pronunciation but also include checks for character combinations that occur in German. Accuracy is therefore highest for English and German, although the function can also be used with other languages. The algorithm is not exact: it sometimes fails to identify homophones, and it sometimes reports a match for words whose pronunciation is quite distinct.

The algorithm works by reducing letters or combinations of letters to their phonetic equivalents according to the following rules:

Letters                                                                       Phonetic Equivalent
A, E, I, O, U, Y (initial position)                                           A
P, B                                                                          B
F, V, W, (P + H)                                                              F
G, K, Q, (C + [A, E, H, I, J, K, L, O, Q, R, U, X, Y])                        G
L                                                                             L
M                                                                             M
N                                                                             N
R                                                                             R
C, S, Z, (D + [C, S, Z]), (X + [C, K, Q]), (T + [C, S, Z]), (S + C), (Z + C)  S
D, T                                                                          D
(G + G + S)                                                                   GS
(X - [C, K, Q], *)                                                            GS
H                                                                             (ignored)

Here, "+" denotes letters appearing consecutively in the order shown, "[…]" denotes alternative letters, and "-" denotes exclusion, i.e., two letters appearing together of which the second letter is not one of the letters listed. Finally, "*" denotes any letter.

So "(X + [C, K, Q])" means a sequence of letters consisting of "X" followed by one of the letters "C", "K" or "Q", whereas "(X-[C, K, Q], *)" means a sequence of three letters consisting of "X" followed by any letter other than "C" or "K" or "Q", followed by any letter.
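
The notation can be made concrete by expanding it into the letter pairs it covers. The following Python sketch is for illustration only and is not part of Tamino; the helper names followed_by and not_followed_by are hypothetical, and the "*" position (any letter) is not expanded.

```python
import string

def followed_by(first, alternatives):
    """Expand "(first + [alternatives])" into the letter pairs it covers."""
    return [first + letter for letter in alternatives]

def not_followed_by(first, excluded):
    """Expand "(first - [excluded])" into pairs whose second letter is not excluded."""
    return [first + letter
            for letter in string.ascii_uppercase
            if letter not in excluded]

print(followed_by("X", "CKQ"))           # ['XC', 'XK', 'XQ']
print(len(not_followed_by("X", "CKQ")))  # 23 (26 letters minus C, K, Q)
```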

More elaborate rules take precedence over simpler ones. For example, if a word contains the adjacent letters "P" and "H", the rule reducing the combination "(P + H)" to "F" takes precedence over the two simple rules that reduce "P" to "B" and ignore "H".

Example: "PHONETIC" is interpreted as "(P + H), (O), (N), (E), (T), (I), (C)" and reduced to "FNDS". The vowels "O", "E" and "I" are not in initial position and contribute nothing to the result.
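
The reduction and the precedence of combination rules can be sketched in Python as follows. This is an illustrative approximation under stated assumptions, not Tamino's implementation: the function name phonetic_key is hypothetical; combinations are matched leftmost and longest first and consume all of their letters; vowels outside the initial position are dropped, as the "PHONETIC" example suggests; letters without a rule in the table are ignored; and the "(X - [C, K, Q], *)" rule is left out because its exact matching behaviour is not clear from the table alone.

```python
VOWELS = set("AEIOUY")

# Combination rules from the table (assumption: a match consumes all its letters).
COMBINATIONS = {"PH": "F", "GGS": "GS", "SC": "S", "ZC": "S"}
for second in "AEHIJKLOQRUXY":
    COMBINATIONS["C" + second] = "G"
for first in "DT":
    for second in "CSZ":
        COMBINATIONS[first + second] = "S"
for second in "CKQ":
    COMBINATIONS["X" + second] = "S"

# Single-letter rules from the table; H and unlisted letters map to nothing.
SINGLE = {}
for letters, code in [("PB", "B"), ("FVW", "F"), ("GKQ", "G"), ("L", "L"),
                      ("M", "M"), ("N", "N"), ("R", "R"), ("CSZ", "S"),
                      ("DT", "D")]:
    for letter in letters:
        SINGLE[letter] = code

def phonetic_key(word):
    """Reduce a word to a phonetic key (simplified sketch of the rule table)."""
    word = word.upper()
    key = []
    i = 0
    while i < len(word):
        # Combination rules take precedence; try the longest one first.
        for length in (3, 2):
            chunk = word[i:i + length]
            if chunk in COMBINATIONS:
                key.append(COMBINATIONS[chunk])
                i += len(chunk)
                break
        else:
            letter = word[i]
            if letter in VOWELS:
                if i == 0:
                    key.append("A")  # vowels count only in initial position
            else:
                key.append(SINGLE.get(letter, ""))
            i += 1
    return "".join(key)

print(phonetic_key("PHONETIC"))  # FNDS
```

Keeping the rules in lookup tables rather than in branching code makes the precedence explicit: a single-letter rule is consulted only when no combination matches at the current position.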

Note:
The value of the server parameter "markup as delimiter" is respected when determining the word tokens. See the documentation of the Tamino Manager for details.

Arguments

$searchString

string value

Example

  • Retrieve the names of all patients whose surname sounds like "Meier".

    for $a in input()/patient
    where tf:containsText($a/name/surname, tf:phonetic("Meier"))
    return $a/name

    This query effectively retrieves all patient names that are written as "Meier", "Maier", "Mayer", or "Meyer" as they all sound alike.
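
Why these four spellings match can be checked outside Tamino with a small Python sketch. It applies only the single-letter rules from the table (which is all these surnames need) and assumes that vowels outside the initial position are dropped; the name simple_key and the sample list, including the non-matching "Miller", are hypothetical.

```python
# Single-letter rules only; H and vowels have no entry and are dropped.
CODES = {"P": "B", "B": "B", "F": "F", "V": "F", "W": "F",
         "G": "G", "K": "G", "Q": "G", "L": "L", "M": "M",
         "N": "N", "R": "R", "C": "S", "S": "S", "Z": "S",
         "D": "D", "T": "D"}

def simple_key(name):
    """Reduce a name using single-letter rules only (illustrative sketch)."""
    letters = name.upper()
    # A vowel in initial position is reduced to "A".
    prefix = "A" if letters[:1] and letters[0] in "AEIOUY" else ""
    return prefix + "".join(CODES.get(ch, "") for ch in letters)

surnames = ["Meier", "Maier", "Mayer", "Meyer", "Miller"]
matches = [s for s in surnames if simple_key(s) == simple_key("Meier")]
print(matches)  # ['Meier', 'Maier', 'Mayer', 'Meyer']
```

All four spellings reduce to "MR", whereas "Miller" reduces to "MLLR" and is not matched.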