afs_datalake_diff - AFS - Reference Guides

AFS Filters Description

The afs_datalake_diff filter can be used in a PaF that is part of a Data Lake to trigger processing of the documents that have been altered since the previous run.

The filter is declared with the afs_datalake_diff type. It is in the antidot-paf-misc package. It is a generator filter.

Only one instance of this filter can exist at any given moment. The filter does not read the "instances" parameter in the configuration.

When several PaFs share an AIF Data Lake, a common pattern is to have some PaFs that "fetch" data and other PaFs that "process" this data. To enable efficient incremental processing, each "process" PaF must retrieve the list of documents that have been updated since its previous run. When placed at the beginning of the first pipe of a PaF, this filter retrieves these documents automatically and passes them to its successors for processing.
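
The incremental pattern described above can be illustrated with a small conceptual sketch in Python. It is not the AFS API and does not claim to show how afs_datalake_diff is implemented: the ToyDataLake and DiffGenerator classes, their methods, and the revision counter are all hypothetical names introduced for illustration. The sketch only assumes that the shared store can tell which documents changed after a remembered point, and that the "process" side remembers that point between runs.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDataLake:
    """Hypothetical in-memory stand-in for a shared document store."""

    revisions: dict = field(default_factory=dict)  # document URI -> revision number
    clock: int = 0  # monotonically increasing revision counter

    def put(self, uri: str) -> None:
        """Simulate a "fetch" PaF adding or updating a document."""
        self.clock += 1
        self.revisions[uri] = self.clock

    def changed_since(self, marker: int) -> list:
        """Return the URIs whose revision is newer than the given marker."""
        return sorted(u for u, rev in self.revisions.items() if rev > marker)


class DiffGenerator:
    """Hypothetical generator step placed first in a "process" pipeline."""

    def __init__(self, lake: ToyDataLake) -> None:
        self.lake = lake
        self.marker = 0  # revision reached by the previous run (0 = never ran)

    def run(self) -> list:
        """Emit the documents altered since the previous run, then advance the marker."""
        changed = self.lake.changed_since(self.marker)
        self.marker = self.lake.clock
        return changed


lake = ToyDataLake()
lake.put("doc/a")
lake.put("doc/b")

differ = DiffGenerator(lake)
print(differ.run())  # first run sees everything: ['doc/a', 'doc/b']
print(differ.run())  # nothing changed since the previous run: []

lake.put("doc/b")    # an existing document is updated
lake.put("doc/c")    # a new document is fetched
print(differ.run())  # only the altered documents: ['doc/b', 'doc/c']
```

In this sketch the successors of the generator would receive only the returned documents, which is what keeps each "process" run proportional to the amount of change rather than to the size of the store.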