afs_oai_id - AFS - Reference Guides

AFS Filters Description

Product: AFS | Platform: 7.11 | Category: Reference Guides | Language: English

The OAI ID and REC filters crawl an OAI-PMH archive.

The filter is declared with the afs_oai_id type. It is in the antidot-paf-misc package. It is a generator filter.

This filter only works if it is instantiated before the afs_oai_rec filter.

The parameters of the OAI-PMH ID filter are described in the following table:

Parameter name | Mandatory | Type | Default | Description
oaipmh_url | Yes | string | N/A | URL of the OAI-PMH repository.
except_sets | No | list | N/A | List of sets to exclude from the crawl.
sets | No | list | N/A | List of sets to crawl explicitly.
output_layer | No | layer | CONTENTS | Layer filled for each output document.
metadataPrefix | No | string | oai_dc | The metadataPrefix parameter for ListIdentifiers and ListRecords.
max | No | integer | Infinite | The crawler produces at most max new documents. Intended for development purposes.
from_date | No | string | Crawl all | Crawl only identifiers from the given date onward. The date must be in ISO 8601 format. OAI granularity is handled automatically by the filter. An empty string is accepted and is equivalent to omitting the parameter (this allows the value to come from an environment variable that may be empty).
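
To make the role of these parameters more concrete, the following Python sketch builds a standard OAI-PMH ListIdentifiers request URL from values similar to oaipmh_url, metadataPrefix and from_date. It is not the filter's implementation: the function name build_list_identifiers_url, its arguments and the example.org repository URL are hypothetical, and only the query arguments (verb, metadataPrefix, from, set) come from the OAI-PMH protocol itself.

```python
from urllib.parse import urlencode


def build_list_identifiers_url(oaipmh_url, metadata_prefix="oai_dc",
                               from_date="", set_spec=None):
    """Build an OAI-PMH ListIdentifiers URL (hypothetical helper, not AFS code)."""
    params = {"verb": "ListIdentifiers", "metadataPrefix": metadata_prefix}
    # An empty from_date is treated like an absent parameter,
    # mirroring the behaviour described for from_date in the table.
    if from_date:
        params["from"] = from_date
    # Restrict the request to one set when a set name is given.
    if set_spec:
        params["set"] = set_spec
    return oaipmh_url + "?" + urlencode(params)


# Placeholder repository URL, used for illustration only.
print(build_list_identifiers_url("https://example.org/oai", from_date="2020-01-01"))
print(build_list_identifiers_url("https://example.org/oai", from_date=""))
```

The second call shows that an empty from_date produces a request without a from argument, which is the case an empty environment variable would lead to.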

Note: When sets and except_sets are empty, the connector crawls the whole repository.
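
The interaction between sets and except_sets can be pictured with a small Python sketch. The function select_sets and the set names are hypothetical. How the filter behaves when both lists are given is not stated here, so the sketch assumes that exclusions are applied after the explicit selection.

```python
def select_sets(available, sets=(), except_sets=()):
    """Return the sets to crawl (illustrative logic, not the AFS implementation)."""
    # With no explicit selection, start from every set in the repository.
    chosen = list(sets) if sets else list(available)
    # Assumption: exclusions are applied after the explicit selection.
    excluded = set(except_sets)
    return [name for name in chosen if name not in excluded]


repository_sets = ["maps", "music", "physics"]
print(select_sets(repository_sets))                          # whole repository
print(select_sets(repository_sets, except_sets=["music"]))   # everything except one set
print(select_sets(repository_sets, sets=["maps", "music"]))  # explicit selection only
```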
Crawling an OAI archive requires two connectors working together:
  • The identifiers crawling connector takes into account the given list of OAI sets to crawl, sorts the identifiers, and so on. It produces a clean list of OAI identifiers to be crawled.
  • The records crawling connector takes the set of identifiers produced by the identifiers crawling connector and crawls the corresponding metadata.
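
The two-stage flow described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the connectors' actual code. The sample response, the function names extract_identifiers and record_urls, and the example.org URL are hypothetical. The second stage uses the standard OAI-PMH GetRecord verb for clarity; which verb the afs_oai_rec filter actually issues is not stated here.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespace used by OAI-PMH 2.0 responses.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Hand-written sample of a ListIdentifiers response (illustrative data only).
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListIdentifiers>
    <header><identifier>oai:example.org:2</identifier><datestamp>2020-01-03</datestamp></header>
    <header><identifier>oai:example.org:1</identifier><datestamp>2020-01-02</datestamp></header>
    <header><identifier>oai:example.org:2</identifier><datestamp>2020-01-03</datestamp></header>
  </ListIdentifiers>
</OAI-PMH>"""


def extract_identifiers(xml_text, max_docs=None):
    """Stage 1: return a sorted, de-duplicated list of identifiers."""
    root = ET.fromstring(xml_text)
    found = root.findall(".//oai:header/oai:identifier", OAI_NS)
    identifiers = sorted({element.text.strip() for element in found})
    # Optional cap, comparable in spirit to the max parameter.
    return identifiers if max_docs is None else identifiers[:max_docs]


def record_urls(oaipmh_url, identifiers, metadata_prefix="oai_dc"):
    """Stage 2: build one GetRecord request URL per identifier."""
    return [
        oaipmh_url + "?" + urlencode({
            "verb": "GetRecord",
            "identifier": identifier,
            "metadataPrefix": metadata_prefix,
        })
        for identifier in identifiers
    ]


identifiers = extract_identifiers(SAMPLE_RESPONSE)
for url in record_urls("https://example.org/oai", identifiers):
    print(url)
```

Keeping the identifier stage separate means the list can be cleaned (sorted, de-duplicated, capped) before any record is fetched.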
Note: The current connector only crawls the oai_dc metadata prefix.
Attention: This filter must be the first filter in a Pipe. This filter will never process input documents.