Feed Language Configuration - AIF

AIF Language Management

Product: AIF
AFS_Version: 7.7
Category: Technical Notes

The afs:lang element defines the language of every indexed document. It takes exactly one of the following attributes:

  • value
  • XPath (or JSONPath)
  • auto

The value attribute takes a language code in ISO 639-1 format, optionally followed by a dash and a region code in ISO 3166-1 format. In this case, every indexed document is associated with this language, and the association never fails.

Example

<?xml version="1.0"?>
<afs:xmlFeed xmlns:afs="http://ref.antidot.net/v7/afs#">
<afs:lang value="fr-BE"/>
....
</afs:xmlFeed>

The XPath or JSONPath attribute takes, respectively, an XPath or a JSONPath expression pointing to the language of the indexed document. The expression is resolved for every document in order to associate a language with it.

The following language formats are recognized:

  • ISO 639-1, optionally followed by ISO 3166-1 (preferred syntax, per IETF BCP 47: en-GB; alternate syntaxes: en_GB, enGB),
  • ISO 639-2,
  • ISO 639-3,
  • the language name expressed in English,
  • the language name expressed in the language itself.

For example, the following values are recognized and equivalent: fre, francais, Français, french, fra, french language.
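
The three ISO 639-1 syntaxes listed above (en-GB, en_GB, enGB) can be reduced to the preferred form before a feed is produced. The following is a minimal sketch, not the AFS implementation: normalize_language_tag is a hypothetical helper, and it only checks the shape of the value (two letters, optionally followed by two more), not whether the codes actually exist. ISO 639-2, ISO 639-3 and language names are deliberately left out.

```python
import re

# Hypothetical helper, not part of AFS. Accepts the three syntaxes named in
# this note: "en-GB" (preferred), "en_GB" and "enGB", plus a bare "en".
_TAG_SHAPE = re.compile(r"([A-Za-z]{2})(?:[-_]?([A-Za-z]{2}))?")


def normalize_language_tag(raw):
    """Return the value in the preferred 'll-RR' form, or None if the shape is not recognized."""
    match = _TAG_SHAPE.fullmatch(raw.strip())
    if match is None:
        return None
    language, region = match.groups()
    if region is None:
        return language.lower()
    # Only the shape is validated: any two letters are accepted as a region.
    return f"{language.lower()}-{region.upper()}"
```

With this sketch, en_GB and enGB both normalize to en-GB, while a three-letter value such as fre is rejected because it is not in ISO 639-1 form.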

If the XPath or JSONPath is invalid, does not match any node, or matches an empty node, the corresponding document is invalid (set to KO).

Example

<?xml version="1.0"?>
<afs:xmlFeed xmlns:afs="http://ref.antidot.net/v7/afs#">
<afs:lang XPath="/xpath/to/node/containing/appropriate/value"/>
....
</afs:xmlFeed>
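
Before indexing, it can be useful to check which source documents would be set to KO because the language path matches no node or an empty node. The following is a minimal sketch using Python's standard xml.etree.ElementTree, not the AFS resolver: resolve_language is a hypothetical helper, the path is relative to the document root (ElementTree does not accept absolute paths on an element), and the invalid-path case is not modelled.

```python
import xml.etree.ElementTree as ET


def resolve_language(document_xml, path):
    """Return the text found at `path`, or None when the document would be KO.

    Hypothetical pre-check, not part of AFS. `path` is relative to the root
    element, e.g. "lang" or "meta/lang".
    """
    root = ET.fromstring(document_xml)
    node = root.find(path)
    if node is None:
        # The path does not match any node.
        return None
    text = (node.text or "").strip()
    # An empty (or whitespace-only) node is treated like a missing one.
    return text or None
```

For instance, resolve_language("<doc><lang>fr-BE</lang></doc>", "lang") yields the language value, whereas a document with an empty lang node or without any lang node yields None.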

The auto attribute takes the value true. In this case, the language of the document is detected automatically by the AFS language detection engine.

An additional <afs:lang/> node must be added to one or several <afs:item> nodes in order to define the XPath or JSONPath of the text extracts used to detect the document language. If all of these XPath or JSONPath expressions are invalid or empty, the document is invalid (set to KO).

It is advised to give the language detection engine enough text, and to avoid reference fields and titles containing brand names.

A language is always associated with the document. A language dictionary repository is required; otherwise, all documents are set to KO. The repository must contain one file per language.

The repository bundled with AFS is located in /usr/local/afs7/share/samples. It can be overridden by files located in $AFS7/conf/samples. The file name is essential: it must comply with the ISO 639-1 standard. For example, français is fr, English is en, Dutch is nl, and so on.
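
Since a misnamed dictionary file is easy to overlook, the naming rule can be checked before deploying an override directory. The following is a minimal sketch, not an AFS tool: misnamed_dictionaries is a hypothetical helper, and it assumes that the part of the file name before any extension must be a two-letter lowercase code. This note does not say whether the files carry an extension, nor whether case matters, so both points are assumptions.

```python
import re
from pathlib import PurePath

# Shape of an ISO 639-1 code: exactly two lowercase letters (assumption:
# the repository expects lowercase names such as "fr", "en", "nl").
_ISO_639_1_SHAPE = re.compile(r"[a-z]{2}")


def misnamed_dictionaries(file_names):
    """Return the file names whose stem does not look like an ISO 639-1 code.

    Hypothetical pre-deployment check, not part of AFS. Only the shape is
    verified, not whether the code is actually assigned to a language.
    """
    return [
        name
        for name in file_names
        if not _ISO_639_1_SHAPE.fullmatch(PurePath(name).stem)
    ]
```

Called with the names fr, en, nl, french and FR, the sketch reports french and FR as misnamed. In practice the list could come from a directory listing of $AFS7/conf/samples.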

Example

<?xml version="1.0"?>
<afs:xmlFeed xmlns:afs="http://ref.antidot.net/v7/afs#">
<afs:lang auto="true"/>
<afs:items>
<afs:item XPath="/doc/title">
<afs:store uri="true" title="true"/>
</afs:item>
<afs:item XPath="/doc/content">
<afs:index weight="80"/>
<afs:store abstract="true"/>
<afs:lang/>
</afs:item>
<afs:item XPath="/doc/reference">
<afs:store clientData="true"/>
</afs:item>
</afs:items>
</afs:xmlFeed>