afs_layer_annotate - AFS - Reference Guides

AFS Filters Description

Product: AFS
Platform: 7.11
Category: Reference Guides
Language: English

The afs_layer_annotate filter enriches a document by tagging the labels found in one of its layers with the URIs of the corresponding concepts.

The filter is declared with the afs_layer_annotate type. It is in the antidot-paf-misc package. It is a generator filter.

The parameters of the afs_layer_annotate filter are described in the following table:

| Parameter name | Mandatory | Type | Default | Description |
|---|---|---|---|---|
| db_dir | Yes | directory | N/A | Directory containing the localized annotation databases to load: one subdirectory per language, each containing an XML file. Each file describes a set of concepts and the labels associated with these concepts. When a localized document is annotated, only the annotation database for its language is used. |
| output_layer | Yes | layer | N/A | Layer of the document where the results are stored. |
| input_layer | No | layer | CONTENTS | Input layer of the document (must be plain text). |
| output_format | No | string | XML | Serialization format used in the output layer. Possible values are XML and JSON. |

The deprecated parameters of the afs_layer_annotate filter are described in the following table:

| Parameter name | Deprecated since | Replaced by | Description |
|---|---|---|---|
| db_path | 7.7 | db_dir | An XML file describing a set of concepts and the labels associated with these concepts. |

In the same way that the Concept agent tags a query containing known labels with the corresponding URIs, the afs_layer_annotate filter tags a document. For each label in the input XML file, afs_layer_annotate adds the corresponding URI to the output_layer layer when the label is found in the input_layer layer. The filter can be configured to be insensitive to case, accents, and/or inflection (for inflection, only the AFS dictionaries are supported; user dictionaries are not allowed).
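The matching behaviour described above can be illustrated with a minimal Python sketch. It is not the filter's implementation: the normalize and annotate functions, their parameters, and the whole-word matching rule are assumptions made for illustration, and inflection handling is left out entirely.

```python
import re
import unicodedata


def normalize(text, lowercase=False, remove_accents=False):
    """Apply optional accent and case folding to a string (illustrative only)."""
    if remove_accents:
        # Decompose characters, then drop the combining marks (accents).
        decomposed = unicodedata.normalize("NFD", text)
        text = "".join(c for c in decomposed if not unicodedata.combining(c))
    if lowercase:
        text = text.lower()
    return text


def annotate(text, labels_by_uri, lowercase=False, remove_accents=False):
    """Return (uri, matched, offset, size) tuples sorted by offset.

    Offsets and sizes are counted in characters of the folded text.
    Assumption: labels are matched as whole words.
    """
    folded = normalize(text, lowercase, remove_accents)
    matches = []
    for uri, labels in labels_by_uri.items():
        for label in labels:
            needle = normalize(label, lowercase, remove_accents)
            # Reject matches glued to a neighbouring word character.
            pattern = r"(?<!\w)" + re.escape(needle) + r"(?!\w)"
            for found in re.finditer(pattern, folded):
                matches.append((uri, needle, found.start(), len(needle)))
    return sorted(matches, key=lambda item: item[2])
```

Both the document text and the labels go through the same folding before comparison, which is why the reported form is the normalized one rather than the original spelling.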

PaF configuration example:
<afs:filter uri="#expand" type="afs_layer_annotate" instances="1">
  <afs:args>
    <afs:arg name="db_path" value="$AFS7/toto.xml"/>
    <afs:arg name="input_layer" value="USER_1"/>
    <afs:arg name="output_layer" value="USER_2"/>
    <afs:arg name="output_format" value="XML"/>
  </afs:args>
</afs:filter>

XML input file example:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<afs:annotate xmlns:afs="http://ref.antidot.net/v7/afs#">

  <afs:normalize xml:lang="fr"
                 flexions="false"
                 lowercase="false"
                 removeAccents="false"/>

  <afs:annotations label="Sizes">

    <afs:annotation uri="#large">
      <afs:labels xml:lang="fr">
        <afs:label>X</afs:label>
      </afs:labels>
    </afs:annotation>

    <afs:annotation uri="#medium">
      <afs:labels xml:lang="fr">
        <afs:label>M</afs:label>
      </afs:labels>
    </afs:annotation>

  </afs:annotations>

</afs:annotate>
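When the concepts come from another system, a file with this shape can be generated rather than written by hand. The following Python sketch uses only the element and attribute names visible in the example above; the build_annotate_db function and its arguments are hypothetical, and the result has not been validated against an official schema.

```python
import xml.etree.ElementTree as ET

AFS_URI = "http://ref.antidot.net/v7/afs#"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def afs(name):
    """Return the namespace-qualified tag for an afs: element."""
    return "{%s}%s" % (AFS_URI, name)


def build_annotate_db(lang, group_label, concepts):
    """Serialize a {uri: [labels]} mapping as an afs:annotate document."""
    ET.register_namespace("afs", AFS_URI)
    root = ET.Element(afs("annotate"))
    # Normalization switches, all disabled as in the example above.
    ET.SubElement(root, afs("normalize"), {
        XML_LANG: lang,
        "flexions": "false",
        "lowercase": "false",
        "removeAccents": "false",
    })
    group = ET.SubElement(root, afs("annotations"), {"label": group_label})
    for uri, labels in concepts.items():
        annotation = ET.SubElement(group, afs("annotation"), {"uri": uri})
        holder = ET.SubElement(annotation, afs("labels"), {XML_LANG: lang})
        for text in labels:
            ET.SubElement(holder, afs("label")).text = text
    return ET.tostring(root, encoding="unicode")
```

For instance, build_annotate_db("fr", "Sizes", {"#large": ["X"], "#medium": ["M"]}) produces a document with the same structure as the example, without the XML declaration.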

XML output example in the layer USER_2:
<?xml version="1.0" standalone="yes"?>
<afs:Expand xmlns:afs="http://ref.antidot.net/v7/afs#">
  <afs:match uri="#large" matched="X" offset="42" size="1"/>
  <afs:match uri="#medium" matched="M" offset="69" size="1"/>
  <afs:match uri="#medium" matched="M" offset="666" size="1"/>
</afs:Expand>

Tip: offset and size are expressed as numbers of UTF-8 characters. They can be used to retrieve the original matched string from the input layer.
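As an illustration of this tip, the Python sketch below reads the afs:match elements of an XML output and slices the input text with offset and size. It assumes that offsets are zero-based and count characters (code points) rather than bytes; the source does not state this explicitly, so it should be checked against real output. The matched_spans function is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

AFS_NS = {"afs": "http://ref.antidot.net/v7/afs#"}


def matched_spans(output_xml, input_text):
    """Return (uri, original_string) pairs for each afs:match element."""
    root = ET.fromstring(output_xml)
    spans = []
    for match in root.findall("afs:match", AFS_NS):
        offset = int(match.get("offset"))
        size = int(match.get("size"))
        # Assumption: offset is zero-based and counted in characters.
        spans.append((match.get("uri"), input_text[offset:offset + size]))
    return spans
```

If offsets turn out to be byte positions, the same slicing would have to be applied to input_text.encode("utf-8") and the result decoded afterwards.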

Tip: The attribute src is the original string that matched.

Tip: The matched attribute gives the normalized form of what matched, according to the normalization defined in the annotation XML file (afs:normalize element). As with the Concept agent, both the text and the labels are normalized, and it is these normalized forms that are compared to find a match.

More details about the db_dir parameter: the afs_layer_annotate filter uses db_dir to locate the localized annotation databases to load. The target directory should contain one subdirectory per language (fr, en, de, ...). The old db_path parameter is deprecated since AFS v7.6.2 and replaced by db_dir. As a consequence:
  • If db_path is set and db_dir is not, the annotation database located at db_path is used for all languages.
  • In an annotation database, xml:lang attributes are still accepted but are not used.

Example with db_dir set to $AFS7/annotate:
$AFS7/annotate
├── en-GB
│   └── geonames.xml
├── fr
│   └── pactols-fr.xml
└── fr-FR
    └── pactols-fr-fr.xml
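Before deploying such a directory, it can be useful to check that each language subdirectory actually contains an XML file. The Python sketch below only inspects the file system; list_annotation_databases is a hypothetical helper and does not reproduce how the filter itself loads the databases.

```python
from pathlib import Path


def list_annotation_databases(db_dir):
    """Map each language subdirectory of db_dir to the XML files it contains."""
    databases = {}
    for entry in sorted(Path(db_dir).iterdir()):
        if entry.is_dir():
            # The subdirectory name is taken as the language tag (fr, fr-FR, ...).
            databases[entry.name] = sorted(p.name for p in entry.glob("*.xml"))
    return databases
```

A subdirectory mapped to an empty list signals a language for which no database file was found.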

When a document is processed by afs_layer_annotate, the appropriate annotation database is selected according to the document language. The document language can be either set by afs_doc_index or forced by afs_lang_set. The most specific annotation database available is used. With the previous example, the behavior is as follows:
  • When an fr document is processed, the fr annotation database is used.
  • When an fr-FR document is processed, the fr-FR annotation database is used.
  • When an fr-BE document is processed, the fr annotation database is used.
  • When an en document is processed, the document is set to FAILED.
  • When an en-US document is processed, the document is set to FAILED.
If stemming is activated, the bundled Antidot stemming dictionary of the corresponding language is loaded.
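The database selection rule described above can be sketched in Python as a lookup that drops the last subtag of the document language until a matching subdirectory is found. The resolve_database function is hypothetical, and exact, case-sensitive comparison of language tags is an assumption.

```python
def resolve_database(doc_lang, available):
    """Return the most specific available language tag for doc_lang, or None.

    None corresponds to the case where the document is set to FAILED.
    """
    parts = doc_lang.split("-")
    while parts:
        candidate = "-".join(parts)
        if candidate in available:
            return candidate
        # Fall back to the less specific tag (fr-BE -> fr).
        parts.pop()
    return None


available = {"en-GB", "fr", "fr-FR"}
for lang in ("fr", "fr-FR", "fr-BE", "en", "en-US"):
    print(lang, "->", resolve_database(lang, available))
```

Note that the lookup never moves sideways: en-US does not fall back to en-GB, which is consistent with the FAILED status listed above for en and en-US documents.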