Introduction

The Tamino Non-XML Indexer integrates non-XML files, for example StarOffice documents and Microsoft Office documents, into your Tamino environment. You can then run meaningful searches on the content and/or metadata of legacy non-XML files. The metadata typically includes information such as the author and the date when the document was last changed. Note that the amount and type of metadata depend on the application program that created the file. Older versions of software, for example Microsoft Word Version 2.0, often generate little or no metadata.

When a non-XML file is processed (stored or updated) in a Tamino database collection in which the Tamino Non-XML Indexer is active, Tamino stores two objects:

  • the non-XML file itself;

  • a so-called shadow file, which is indexed XML data comprising:

    • the raw data contained in the file, for example the plain text in a Microsoft Word file;

    • metadata extracted from the file.
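To make the shadow file more concrete, the following Python sketch assembles an XML document from extracted plain text and metadata. The function build_shadow_document, the element names (shadow, metadata, content) and the sample values are hypothetical; the actual shadow format is defined by the Tamino Non-XML Indexer and may differ.

```python
import xml.etree.ElementTree as ET


def build_shadow_document(raw_text: str, metadata: dict[str, str]) -> str:
    """Assemble a shadow document holding extracted text and metadata.

    Hypothetical layout: the element names do not reproduce Tamino's
    actual shadow format.
    """
    root = ET.Element("shadow")
    meta = ET.SubElement(root, "metadata")
    for name, value in metadata.items():
        # One child element per metadata field, e.g. <author>...</author>
        ET.SubElement(meta, name).text = value
    # The raw data, e.g. the plain text of a Word file
    ET.SubElement(root, "content").text = raw_text
    return ET.tostring(root, encoding="unicode")


# A Word file whose plain text and two metadata fields were extracted
shadow = build_shadow_document(
    "Quarterly report draft",
    {"author": "J. Smith", "lastModified": "2004-05-17"},
)
print(shadow)
```

Because both the raw text and the metadata end up as XML elements, ordinary XML indexing and querying can be applied to them.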

Note that it is possible to suppress the storing of the non-XML file; this is useful if, for example, the file is already stored elsewhere. To do this, use the element tsd:storeShadowOnly in the document's schema. When this option is active, a BLOB of size zero is stored as a pseudo non-XML file in place of the original file.
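The effect of tsd:storeShadowOnly can be modeled as follows. This is a minimal sketch, assuming a hypothetical store function whose store_shadow_only flag stands in for the schema setting; it is not part of any Tamino API and only shows which two objects are kept in each case.

```python
from dataclasses import dataclass


@dataclass
class StoredObjects:
    blob: bytes        # the non-XML file, or a zero-size placeholder
    shadow_xml: str    # the indexed shadow document


def store(data: bytes, shadow_xml: str, store_shadow_only: bool) -> StoredObjects:
    """Return the two objects kept for one processed non-XML file.

    Hypothetical model: store_shadow_only stands in for the
    tsd:storeShadowOnly schema setting; this is not a Tamino API.
    """
    if store_shadow_only:
        # The original file is kept elsewhere, so store a BLOB of size zero
        return StoredObjects(blob=b"", shadow_xml=shadow_xml)
    return StoredObjects(blob=data, shadow_xml=shadow_xml)


pdf_bytes = b"%PDF-1.4 sample"
full = store(pdf_bytes, "<shadow/>", store_shadow_only=False)
lean = store(pdf_bytes, "<shadow/>", store_shadow_only=True)
print(len(full.blob), len(lean.blob))  # prints: 15 0
```

In both cases the shadow document is stored, so searches on content and metadata behave the same; only the stored copy of the original file differs.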

The Tamino Non-XML Indexer processes each document based on its MIME type; this information is submitted along with the document when the document is stored in Tamino. The MIME types that the Tamino Non-XML Indexer supports "out of the box" are listed in the document Supported MIME Types. You can add support for further MIME types by following the instructions in the section Adding Support for Further MIME Types.

Some MIME types, for example "text/rtf", are generic, i.e. files with these MIME types can be produced by many different applications, including freeware and shareware programs. The following is an informal, incomplete list of applications that produce documents that can be processed by the Tamino Non-XML Indexer:

  • Microsoft Office files:

    • Microsoft Word

    • Microsoft Excel

  • OpenOffice files:

    • OpenOffice Writer

    • OpenOffice Calc

  • StarOffice files:

    • StarOffice Writer

    • StarOffice Calc

  • Adobe PDF files

  • Plain text files (UTF-8)

  • Plain text files

  • MPEG audio files (often known as MP3 files)

  • RTF (Rich Text Format) files

  • Zip files
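Processing by MIME type amounts to a lookup from the type submitted with the document to an extraction routine, and adding support for a further type amounts to registering a new entry. The Python sketch below models this idea under stated assumptions: the registry, register_mime_type, extract_text and the "text/csv" handler are hypothetical and do not reflect how the Tamino Non-XML Indexer is actually configured (the section Adding Support for Further MIME Types describes that). Matching case-insensitively and ignoring parameters such as "charset" follows general MIME conventions (RFC 2045) and is an assumption here, not documented Tamino behavior.

```python
from typing import Callable

# Hypothetical extractor signature: raw file bytes in, plain text out
Extractor = Callable[[bytes], str]

# Illustrative registry with a single built-in entry
_extractors: dict[str, Extractor] = {
    "text/plain": lambda data: data.decode("utf-8", errors="replace"),
}


def register_mime_type(mime_type: str, extractor: Extractor) -> None:
    """Add support for a further MIME type (illustrative only)."""
    _extractors[mime_type.lower()] = extractor


def extract_text(data: bytes, mime_type: str) -> str:
    """Choose the extractor from the MIME type submitted with the document."""
    # Drop parameters such as "; charset=utf-8"; compare case-insensitively
    base_type = mime_type.split(";", 1)[0].strip().lower()
    try:
        extractor = _extractors[base_type]
    except KeyError:
        raise ValueError(f"no extractor registered for {base_type!r}") from None
    return extractor(data)


# Register a stand-in handler for a further type, then dispatch on it
register_mime_type("text/csv", lambda data: data.decode("utf-8").replace(",", " "))
print(extract_text(b"name,city", "text/csv"))                # prints: name city
print(extract_text(b"hello", "text/plain; charset=utf-8"))   # prints: hello
```

Keeping the dispatch table separate from the extractors means a new file format needs only a new entry, not a change to the dispatch logic.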