Using the Tamino Non-XML Indexer

Once you have completed the steps listed in the section Setting Up the Tamino Non-XML Indexer, the product is ready for use. You can now search the content of legacy non-XML files, their metadata, or both.

The typical sequence of steps when using the Tamino Non-XML Indexer is as follows:

  1. Store a non-XML document (this explanation assumes that the document is a PDF file).

    The Tamino Non-XML Indexer intercepts the store operation. Based on the file's MIME type (application/pdf), the dispatcher activates the PDF plugin. The PDF plugin reads the PDF file and creates an XML shadow file containing:

    • metadata that describes the PDF file: date, title, author, creator and producer;

    • the text contents of the PDF file.

    The Tamino Non-XML Indexer passes the shadow file to Tamino, which stores it in the database.

    Unless the option "storeShadowOnly" is activated, the Tamino Non-XML Indexer also passes the intercepted original document (the PDF file in this example) back to Tamino, which stores it in the database as well.

  2. You can now issue queries, for example to retrieve PDF files with author="John Smith" or to retrieve PDF files that contain the text string "the importance of being earnest".

    Note that the types of queries that you can issue depend on the contents of the shadow file. See the section Mapped Properties or the documentation of user-written plugins for more details.
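The sequence above can be illustrated with a small Python model. This is only a sketch and not the product's API: the function names (pdf_plugin, store, find_by_author, find_by_text), the shadow element names (shadow, metadata, content) and the dictionary that stands in for the Tamino database are all assumptions made for illustration. The actual shadow file layout is described in the section Mapped Properties. The model also assumes that a document without a matching plugin is stored unchanged, which the text above does not state.

```python
import xml.etree.ElementTree as ET


def pdf_plugin(extracted):
    """Hypothetical PDF plugin: builds a shadow document from values that a
    real plugin would extract from the PDF file itself."""
    shadow = ET.Element("shadow")  # element names are assumptions
    metadata = ET.SubElement(shadow, "metadata")
    for field in ("date", "title", "author", "creator", "producer"):
        ET.SubElement(metadata, field).text = extracted.get(field, "")
    ET.SubElement(shadow, "content").text = extracted.get("text", "")
    return shadow


# Dispatcher table (hypothetical): MIME type -> plugin.
PLUGINS = {"application/pdf": pdf_plugin}


def store(database, name, mime_type, original, extracted, store_shadow_only=False):
    """Model of an intercepted store operation; 'database' is a plain dict."""
    plugin = PLUGINS.get(mime_type)
    if plugin is not None:
        database["shadows"][name] = plugin(extracted)
    # Assumption: documents without a matching plugin are stored unchanged.
    if plugin is None or not store_shadow_only:
        database["originals"][name] = original


def find_by_author(database, author):
    """Names of documents whose shadow metadata lists the given author."""
    return sorted(name for name, shadow in database["shadows"].items()
                  if shadow.findtext("metadata/author") == author)


def find_by_text(database, phrase):
    """Names of documents whose shadow text content contains the phrase."""
    return sorted(name for name, shadow in database["shadows"].items()
                  if phrase in (shadow.findtext("content") or ""))


# Step 1: store a PDF file; step 2: query by metadata or by content.
db = {"originals": {}, "shadows": {}}
store(db, "earnest.pdf", "application/pdf", b"%PDF-1.4 (file bytes)",
      {"author": "John Smith", "text": "on the importance of being earnest"})
print(find_by_author(db, "John Smith"))
print(find_by_text(db, "the importance of being earnest"))
```

Both queries run against the shadow documents only, which mirrors the note above: a property that the plugin does not write to the shadow file cannot be queried. Passing store_shadow_only=True in this model skips storing the original bytes, analogous to activating the "storeShadowOnly" option.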