Situation: I have a corpus, such as a news feed, that is continuously updated, with new documents appearing at irregular intervals. I need a live Pipes and Filters (PaF) process that discovers new documents and processes them automatically and immediately, so that each new document can be queried within 5 seconds of its appearance.
Unlike classical indexing, a live indexing PaF runs continuously. It watches a folder and processes each new file, or group of files, that appears in this folder. It then generates a reply database for those files and deploys it to the Update Manager (UM). Once the UM has synchronized the reply servers, the new reply database can be queried immediately.
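To make the watch, process, deploy and synchronize loop concrete, here is a minimal Python sketch. It is not the product's API: `LivePipeline`, `UpdateManager`, `ReplyDatabase` and the simple term-to-file index are hypothetical stand-ins, and the Update Manager's synchronization is reduced to copying each new reply database to every in-memory reply server.

```python
import os
import tempfile
import time
from dataclasses import dataclass, field


@dataclass
class ReplyDatabase:
    """Hypothetical reply database: maps each lowercase term to the files containing it."""
    postings: dict[str, set[str]] = field(default_factory=dict)


class UpdateManager:
    """Hypothetical Update Manager that pushes reply databases to reply servers."""

    def __init__(self, n_servers: int = 2) -> None:
        # Each reply server is modeled as a list of the reply databases it serves.
        self.servers: list[list[ReplyDatabase]] = [[] for _ in range(n_servers)]

    def deploy(self, db: ReplyDatabase) -> None:
        # "Synchronize": every reply server receives the new database.
        for server in self.servers:
            server.append(db)

    def query(self, term: str) -> set[str]:
        # After synchronization any server can answer; ask the first one.
        hits: set[str] = set()
        for db in self.servers[0]:
            hits |= db.postings.get(term.lower(), set())
        return hits


class LivePipeline:
    """Hypothetical live indexing loop: watch a folder, index new files, deploy."""

    def __init__(self, watch_dir: str, um: UpdateManager) -> None:
        self.watch_dir = watch_dir
        self.um = um
        self.seen: set[str] = set()

    def poll_once(self) -> list[str]:
        # Discover regular files that appeared since the previous poll.
        current = {
            name for name in os.listdir(self.watch_dir)
            if os.path.isfile(os.path.join(self.watch_dir, name))
        }
        new_files = sorted(current - self.seen)
        self.seen |= current
        return new_files

    def process(self, names: list[str]) -> ReplyDatabase:
        # Build one reply database for the whole group of new files.
        db = ReplyDatabase()
        for name in names:
            with open(os.path.join(self.watch_dir, name), encoding="utf-8") as f:
                for term in f.read().lower().split():
                    db.postings.setdefault(term, set()).add(name)
        return db

    def step(self) -> list[str]:
        # One iteration: discover, process, deploy. Returns the files handled.
        new_files = self.poll_once()
        if new_files:
            self.um.deploy(self.process(new_files))
        return new_files

    def run(self, poll_interval: float = 1.0) -> None:
        # Always running. Poll interval + processing + sync must stay under 5 s.
        while True:
            self.step()
            time.sleep(poll_interval)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        um = UpdateManager()
        pipeline = LivePipeline(folder, um)
        with open(os.path.join(folder, "story1.txt"), "w", encoding="utf-8") as f:
            f.write("Markets rally on news")
        pipeline.step()
        print(um.query("news"))  # {'story1.txt'}
```

Polling is used here only because it is the simplest portable way to watch a folder; an OS change-notification mechanism would reduce discovery latency. In either case, whatever drops documents into the folder should make them appear complete (for example, write to a temporary name and then rename), so the pipeline never reads a half-written file.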
Process of a live indexing PaF: