afs_regex_extract - AFS - Reference Guides

AFS Filters Description

Product: AFS
Platform: 7.11
Category: Reference Guides
Language: English

The afs_regex_extract filter extracts data using regular expressions.

The filter is declared with the afs_regex_extract type. It is provided by the antidot-paf-misc package and is a processor filter.

This filter has no ordering constraint: it does not need to be instanced before or after any other specific filter.

Only one instance of this filter can run at any given moment; the "instances" parameter in the configuration is ignored.

The Regex Extract filter specifications are described in the following table:

Parameter name | Mandatory | Type   | Default  | Description
---------------+-----------+--------+----------+------------------------------------------------------------
output_layer   | No        | layer  | CONTENTS | The layer in which the extracted data is exposed.
input_layer    | No        | layer  | CONTENTS | The layer from which the data is read.
regex          | Yes       | map    | N/A      | Keys are rule names; values are regular expressions containing one or more capturing groups (parenthesized subexpressions). Several keys with the same name are allowed; their results are merged in the output feed.
output_format  | No        | string | XML      | The serialization format of the output layer: XML, JSON or SERIALIZED_PROTOBUF.
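The semantics of the regex parameter can be sketched in Python. This is a minimal sketch under stated assumptions, not the AFS implementation: it uses Python's re module instead of the ICU engine, and it models the regex map as a list of (name, pattern) pairs because a Python dict cannot hold duplicate keys. The function name extract and the sample rules and text are hypothetical. Rules that share a name have their occurrences merged under that name, and each occurrence keeps only the values of its capturing groups.

```python
import re
from collections import defaultdict


def extract(rules, text):
    """Apply each (name, pattern) rule to text and merge occurrences by rule name.

    Hypothetical helper: returns a dict mapping each rule name to a list of
    occurrences, where an occurrence is the tuple of capturing-group values
    of one match (the first group is at position 0).
    """
    results = defaultdict(list)
    for name, pattern in rules:
        for match in re.finditer(pattern, text):
            results[name].append(match.groups())
    return dict(results)


# Hypothetical rules: the two "links" rules share a name, so their matches are merged.
rules = [
    ("links", r'<a href="([.:/a-z]+)">(\w+)</a>'),
    ("links", r'<link href="([.:/a-z]+)"/>'),
    ("emails", r'([a-z]+)@([a-z.]+)'),
]
text = '<a href="./toto.fr">liens</a> <link href="http://wikipedia.fr"/> contact@example.org'
print(extract(rules, text))
# → {'links': [('./toto.fr', 'liens'), ('http://wikipedia.fr',)], 'emails': [('contact', 'example.org')]}
```

Both "links" rules contribute to a single entry, mirroring the merge behavior described for duplicate keys; occurrences from different rules may carry a different number of groups.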

When output_format is XML (the default), the extracted values are produced as an XML feed. Here is an example for a rule that extracts hyperlinks, using the regular expression <a href="([.:/a-z]+)">(\w+)</a>:

    <afs:Results xmlns:afs="http://ref.antidot.net/v7/afs#">
      <afs:result name="liens">
        <afs:occurrence>
          <afs:groups index="0" match="./toto.fr"/>
          <afs:groups index="1" match="liens"/>
        </afs:occurrence>
        <afs:occurrence>
          <afs:groups index="0" match="http://wikipedia.fr"/>
          <afs:groups index="1" match="Wikipedia"/>
        </afs:occurrence>
      </afs:result>
    </afs:Results>

Each afs:occurrence element corresponds to one match, and each afs:groups element holds the value of one capturing group, the first group having index 0.
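To make the feed layout concrete, the following Python sketch builds the same afs:Results structure with xml.etree.ElementTree. It only illustrates the documented output shape and is not AFS code: the helper to_afs_results and the input HTML string are hypothetical, and Python's re module stands in for the ICU engine that AFS actually uses.

```python
import re
import xml.etree.ElementTree as ET

AFS_NS = "http://ref.antidot.net/v7/afs#"
ET.register_namespace("afs", AFS_NS)  # serialize the namespace with the "afs" prefix


def to_afs_results(results):
    """Hypothetical helper: serialize {rule name: [tuple of group values, ...]}
    into an afs:Results element shaped like the example feed."""
    root = ET.Element(f"{{{AFS_NS}}}Results")
    for name, occurrences in results.items():
        result = ET.SubElement(root, f"{{{AFS_NS}}}result", name=name)
        for groups in occurrences:
            occurrence = ET.SubElement(result, f"{{{AFS_NS}}}occurrence")
            # One afs:groups element per capturing group; index 0 is the first group.
            for index, value in enumerate(groups):
                ET.SubElement(occurrence, f"{{{AFS_NS}}}groups",
                              index=str(index), match=value)
    return root


# Hypothetical input containing the two links of the example above.
html = '<a href="./toto.fr">liens</a> <a href="http://wikipedia.fr">Wikipedia</a>'
pattern = r'<a href="([.:/a-z]+)">(\w+)</a>'
occurrences = [m.groups() for m in re.finditer(pattern, html)]
root = to_afs_results({"liens": occurrences})
print(ET.tostring(root, encoding="unicode"))
```

Note that the character class [.:/a-z] only accepts lowercase letters, so a link whose URL contains uppercase letters would not be matched by this rule.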

Attention: The regular expression engine is based on ICU (International Components for Unicode). Its syntax and limitations must be respected, as documented at http://userguide.icu-project.org/strings/regexp. It is advised to test regular expressions with the ICU Regular Expression Demonstration (https://ssl.icu-project.org/icu-bin/redemo) before using them with AFS.