Live Indexing - AIF - Technical Notes

AIF Live Indexing

Technical Notes
Target Audience

Situation: I have a corpus, such as a news feed, that is continuously updated (new documents appear at irregular intervals). I need a live Pipes and Filters (PaF) process that discovers new documents and processes them automatically and immediately. New documents must then be queryable within 5 seconds of appearing.

Unlike classical indexing, a live indexing PaF runs continuously. It watches a folder and processes each new file, or group of files, that appears in it. It then generates a reply database for those files and deploys it to the Update Manager (UM). Once the UM has synchronized the reply servers, the new reply database can be queried immediately.

Process of a live indexing PaF:

  1. Once launched, the PaF is idle, waiting for new documents to appear in the watched folder (detection is based on the document modification date).
  2. When one or more new documents appear, they are automatically and immediately sent to the second filter of the pipe.
  3. The documents are processed in the same way as in a classical PaF.
  4. A reply database is generated and sent to the Update Manager, which synchronizes the reply servers. From this point, the documents that appeared in step 2 can be queried.
  5. At scheduled times, and only when necessary, reply databases are defragmented to improve reply performance. The reply databases stored by the Update Manager are updated on the fly.
  6. When new documents appear in the watched folder, the PaF goes back to step 1.
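The detection part of steps 1, 2 and 6 can be sketched as a polling loop. This is a minimal Python illustration, not the AFS implementation: it assumes the watched folder is polled at a fixed interval and that a file counts as new when its modification time differs from the one recorded at the previous scan. The names `scan_new_documents`, `watch` and `process_batch` are hypothetical. Files found in the same scan are handed on as one batch, matching the "group of files" behaviour described above.

```python
import time
from pathlib import Path
from typing import Callable, Dict, List, Optional


def scan_new_documents(folder: Path, seen: Dict[str, float]) -> List[Path]:
    """Return files that are new or modified since the previous scan.

    `seen` maps each file path to the modification time recorded at the last
    scan; it is updated in place so the next call only reports changes.
    """
    batch: List[Path] = []
    for entry in sorted(folder.iterdir()):
        if not entry.is_file():
            continue  # ignore sub-folders
        mtime = entry.stat().st_mtime
        key = str(entry)
        if seen.get(key) != mtime:  # unseen file, or modification date changed
            seen[key] = mtime
            batch.append(entry)
    return batch


def watch(
    folder: Path,
    process_batch: Callable[[List[Path]], None],
    poll_seconds: float = 1.0,
    max_cycles: Optional[int] = None,
) -> None:
    """Poll `folder` forever (or `max_cycles` times) and hand each batch on.

    `process_batch` is a hypothetical stand-in for the rest of the pipe:
    processing, reply database generation and deployment to the Update Manager.
    """
    seen: Dict[str, float] = {}
    cycles = 0
    while max_cycles is None or cycles < max_cycles:
        batch = scan_new_documents(folder, seen)
        if batch:  # step 2: one or more new documents appeared
            process_batch(batch)
        cycles += 1
        time.sleep(poll_seconds)  # step 1: idle until the next scan
```

Since detection delay adds to processing and synchronization time, `poll_seconds` would need to stay well below the 5-second target. A real watcher would also have to skip files that are still being written (for example, by waiting until a file's modification time stops changing); this sketch does not handle that.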

The purpose of this Technical Note is to describe how to set up a live indexing PaF. Because no special action is needed on the reply side, only the PaF side is described below.

All the data needed for this use case is available in AntidotForge, including paf.xml, feed.xml, configuration and script examples, and this documentation.

See AntidotForge Live Indexing for more information.

AFS version must be greater than v7.4.3.1 to support Live Indexing.
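A deployment script that checks this requirement should compare version components numerically, because dotted version strings compare incorrectly as plain text (for example, `"7.4.10" < "7.4.3.1"` is true lexicographically). A minimal Python sketch, assuming versions are purely numeric dotted strings with an optional leading `v`, and reading "greater than" as a strict comparison; `parse_version` and `supports_live_indexing` are hypothetical helpers, not part of AFS.

```python
from typing import Tuple

# Strict lower bound taken from the requirement above.
MIN_EXCLUSIVE_AFS_VERSION = "v7.4.3.1"


def parse_version(version: str) -> Tuple[int, ...]:
    """Convert 'v7.4.3.1' into (7, 4, 3, 1) so versions compare numerically."""
    return tuple(int(part) for part in version.strip().lstrip("vV").split("."))


def supports_live_indexing(afs_version: str) -> bool:
    """True if afs_version is strictly greater than v7.4.3.1."""
    return parse_version(afs_version) > parse_version(MIN_EXCLUSIVE_AFS_VERSION)
```

Python compares tuples element by element, so `(7, 4, 10)` correctly ranks above `(7, 4, 3, 1)`. If the requirement is meant as "v7.4.3.1 or later", replace `>` with `>=`.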