afs_rss_crawl - AFS - Reference Guides

AFS Filters Description

Product: AFS
Platform: 7.11
Category: Reference Guides
Language: English

The RSS crawler crawls RSS and Atom feeds. It auto-detects the type of syndication feed (RSS 1.0, RSS 2.0 or Atom) by analyzing the feed's XML. It then splits the feed into its individual entries and creates a new PaF document for each entry.
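The detection and splitting steps can be illustrated with a minimal Python sketch. It is not the filter's actual implementation: the function names `detect_feed_type` and `split_feed` are hypothetical, and the sketch assumes the feed type can be recognized from the XML root element alone (an `rss` root is reported as RSS 2.0 without checking its `version` attribute).

```python
import xml.etree.ElementTree as ET

# Standard namespaces used by the three feed formats.
ATOM_NS = "http://www.w3.org/2005/Atom"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RSS1_NS = "http://purl.org/rss/1.0/"


def detect_feed_type(root):
    """Guess the feed type from the root element (hypothetical helper)."""
    if root.tag == f"{{{ATOM_NS}}}feed":
        return "atom"
    if root.tag == f"{{{RDF_NS}}}RDF":
        return "rss1.0"
    if root.tag == "rss":
        # Simplification: the version attribute is not inspected.
        return "rss2.0"
    raise ValueError(f"unrecognized feed root element: {root.tag}")


def split_feed(xml_text):
    """Return (feed_type, entries), one serialized XML string per entry."""
    root = ET.fromstring(xml_text)
    feed_type = detect_feed_type(root)
    if feed_type == "atom":
        entries = root.findall(f"{{{ATOM_NS}}}entry")
    elif feed_type == "rss1.0":
        # In RSS 1.0, items are direct children of rdf:RDF.
        entries = root.findall(f"{{{RSS1_NS}}}item")
    else:
        # In RSS 2.0, items are nested inside the channel element.
        entries = root.findall("channel/item")
    return feed_type, [ET.tostring(e, encoding="unicode") for e in entries]


if __name__ == "__main__":
    sample = (
        '<rss version="2.0"><channel><title>News</title>'
        "<item><title>First</title></item>"
        "<item><title>Second</title></item>"
        "</channel></rss>"
    )
    feed_type, entries = split_feed(sample)
    print(feed_type, len(entries))
```

In this sketch each returned string stands in for the content of one output document; how the real filter builds PaF documents is not described here.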

The filter is declared with the afs_rss_crawl type and is provided by the antidot-paf-misc package. It is a generator filter.

The parameters of the RSS and Atom feed crawl filter are described in the following table:

| Parameter name | Mandatory | Type | Default | Description |
|---|---|---|---|---|
| urls | Yes | map | N/A | List of key/value pairs where each key is a URL to crawl and each value is an output directory for persistent storage |
| output_layer | No | layer | CONTENTS | Layer filled for each output document |
| user_agent | No | string | Python-urllib/3.2 | Controls the User-Agent header sent in HTTP requests |
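As an illustration of how the `urls` and `user_agent` parameters relate to the HTTP requests, the following Python sketch builds one request per feed URL with the chosen User-Agent header. It is only a sketch under assumptions: the `params` dictionary mirrors the table above but is not AFS configuration syntax, the URL, directory and `MyCrawler/1.0` values are made up, `build_requests` is a hypothetical helper, and the use of `urllib` is inferred from the default value rather than stated by this guide.

```python
import urllib.request

# Hypothetical parameter values mirroring the table; not AFS syntax.
params = {
    "urls": {"http://www.example.com/feed.rss": "/tmp/rss/example"},
    "output_layer": "CONTENTS",
    "user_agent": "MyCrawler/1.0",
}

DEFAULT_USER_AGENT = "Python-urllib/3.2"  # default listed in the table


def build_requests(params):
    """Return (request, output_dir) pairs, one per feed URL.

    No network access happens here; the requests are only prepared.
    """
    user_agent = params.get("user_agent", DEFAULT_USER_AGENT)
    return [
        (urllib.request.Request(url, headers={"User-Agent": user_agent}), out_dir)
        for url, out_dir in params["urls"].items()
    ]


if __name__ == "__main__":
    for request, out_dir in build_requests(params):
        # urllib stores header names capitalized as "User-agent".
        print(request.full_url, request.get_header("User-agent"), out_dir)
```

Setting an explicit user agent can be useful when a feed server rejects or throttles requests carrying a generic library identifier.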

Attention: This filter must be the first filter in a Pipe. As a generator filter, it never processes input documents.