tf:stem

Search text based on word stems.

Syntax

tf:stem(string $searchString) => unspecified

Description

The function tf:stem is specific to Tamino. It takes a search string as its argument and returns all strings that share the same word stem as the search string. It can only be used within the scope of the following functions:

tf:containsAdjacentText
tf:containsNearText
tf:containsText
tf:createAdjacentTextReference
tf:createNearTextReference
tf:createTextReference

Determining the word tokens that have the same word stem as the search string requires language-specific information. Currently, the pre-defined stemming information is only suitable for German.
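
As a minimal sketch of the simplest case, the query below uses tf:stem inside tf:containsText to select every paragraph that contains some word form sharing a stem with "Bank". The sample chapter and its sentences are invented for illustration. Whether an inflected form such as "Banken" is actually matched depends on the stemming information available in the database, so no result is asserted here.

```xquery
(: Hypothetical sample data, not taken from a real collection :)
let $text :=
 <chapter>
  <para>Die Banken senkten im letzten Quartal die Zinsen.</para>
  <para>Der Stadtpark war an diesem Morgen fast leer.</para>
 </chapter>
for    $a in $text//para
(: tf:stem is only valid as an argument of a text search function such as tf:containsText :)
where  tf:containsText($a, tf:stem("Bank"))
return $a
```

Without tf:stem, the same query would look only for the literal word token "Bank" and would skip paragraphs that contain nothing but inflected forms.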

Notes:

  1. For better performance, Tamino uses a special stemming index that must be activated for the current database. To activate it, set the database server parameter "stemming index" to "yes". See the documentation of the Tamino Manager for details.
  2. The value of the server parameter "markup as delimiter" is respected when determining the word tokens. See the documentation of the Tamino Manager for details.

Argument

$searchString

a string value

Example

  • In the paragraphs of some chapter, retrieve all occurrences of the German word "Bank" in the sense of a bank dealing with money:

    let $text :=
     <chapter>
      <para>Die Bank eröffnete drei neue Filialen im Verlauf der letzten fünf Jahre.</para>
      <para>Ermüdet von dem Spaziergang setzte sich die alte Dame erleichtert auf die gepflegt
        wirkende Bank mitten im Stadtpark.</para>
      <para>Die aktuelle Bilanz der Bank zeigt einen Anstieg der liquiden Mittel im Vergleich
        zum Vorjahresquartal.</para>
     </chapter>
    for    $a in $text//para
    let    $check :=
           for    $value in ("Geld", "Bilanz", "Filiale", "monetär", "Aktie")
           return tf:containsNearText($a, 10, tf:stem($value), tf:stem("Bank"))
    where  count($check[. eq true()]) > 0
    return $a

    The sequence in the inner for clause lists a word family that belongs to one of the two readings of the German word "Bank". For each of these related words, the query checks whether the current paragraph contains an inflected form that is no more than ten unmatched word tokens away from an inflected form of the word "Bank". The second let clause binds $check to a sequence of five Boolean values. If at least one of them is true, which is what the where clause tests, the corresponding para element is returned as part of the result.