afs_acc_build - AFS - Reference Guides

AFS Filters Description

Product: AFS
Platform: 7.11
Category: Reference Guides
Language: English

The afs_acc_build filter produces a human-readable report (in XML format) from the Automatic Cross Content (ACC) data generated by the afs_doc_acc filter. It can also produce an ACC reply database; for more information on how to use this reply database, see the ACC part of the AFS Integration Guide. Both the report and the reply database contain associations between each document and its closest neighbors (that is, semantically similar documents). This content can be filtered with user-defined rules.

The filter is declared with the afs_acc_build type and is provided by the antidot-paf-misc package. It is a generator filter.

This filter only works if it is instantiated after the afs_doc_acc filter.

Only one instance of this filter can run at any given time. The filter ignores the "instances" parameter in the configuration.
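The following is a minimal sketch of the ordering constraint. It assumes that filters are instantiated in the order in which they are declared in the PaF configuration; this is an assumption about the PaF, not something stated on this page. The arguments of afs_doc_acc are omitted, and the "#doc_acc" URI is hypothetical.

```xml
<!-- Sketch only: afs_doc_acc is declared first so that it is instantiated
     before afs_acc_build (assumes declaration order = instantiation order).
     afs_doc_acc arguments are omitted; "#doc_acc" is a hypothetical URI. -->
<afs:filter uri="#doc_acc" type="afs_doc_acc" comment="Compute ACC data"/>

<!-- Single afs_acc_build instance; an "instances" setting would be ignored -->
<afs:filter uri="#acc_build" type="afs_acc_build" comment="Generate ACC">
    <afs:args>
        <afs:arg name="generate_report" value="$AFS7/acc.xml"/>
    </afs:args>
</afs:filter>
```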

The afs_acc_build filter parameters are described in the following table:

| Parameter name | Mandatory | Type | Default | Description |
|---|---|---|---|---|
| target_base_dir | No | directory | N/A | If set, exports the computed ACC database to the given PaaS PaF. Required to use afs_live_acc. |
| n | No | integer | 10 | Maximum number of closest documents (neighbors) kept for each document. |
| export_items | No | integer | 2n | Number of neighbors sent to the Back Office debugging interface. Defaults to twice the value of n. |
| generate_reply | No | boolean | false | If set to true, produces an ACC reply database. See the ACC part of the AFS Integration Guide for more information. |
| generate_report | No | string | N/A | If specified, dumps the ACC report (XML format) to the given file. |
| min_proximity | No | float | 0 | Neighbors whose proximity is below this value are rejected. |
| min_words | No | integer | 0 | Documents with a word count below this value are rejected. |
| nsmap | No | map | Empty map | Namespaces used to interpret the XPath expressions given in the variables parameter. |
| user_rule | No | string | N/A | Lua script used to filter ACC data. |
| var_layer | No | layer | CONTENTS | Layer in which the variables declared by the variables parameter are located. |
| variables | No | map | N/A | User-declared variables to be used by the user_rule script (for each variable, its name and its XPath). |
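The sketch below shows a configuration that exercises several parameters from the table that the later usage example does not use. It produces a reply database and exports it for afs_live_acc. The target_base_dir path is a hypothetical placeholder. The boolean is assumed to be written as "true". The numeric values are illustrative only.

```xml
<!-- Sketch: build an ACC reply database and export it (hypothetical path) -->
<afs:filter uri="#acc_build" type="afs_acc_build" comment="Generate ACC reply database">
    <afs:args>
        <!-- Keep at most 10 neighbors per document (same as the default) -->
        <afs:arg name="n" value="10"/>
        <!-- Send 20 neighbors to the Back Office debugging interface (default: 2n) -->
        <afs:arg name="export_items" value="20"/>
        <!-- Reject documents with a word count below 50 -->
        <afs:arg name="min_words" value="50"/>
        <!-- Produce the reply database (boolean syntax assumed) -->
        <afs:arg name="generate_reply" value="true"/>
        <!-- Hypothetical destination; required to use afs_live_acc -->
        <afs:arg name="target_base_dir" value="/path/to/paas/paf"/>
    </afs:args>
</afs:filter>
```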

Attention: Only one afs_acc_build filter can be instantiated in a PaF.
Regarding incremental indexing:
  • The XML report contains data about the documents processed by the current PaF.
  • Documents processed by the current PaF are compared to every document from the current and previous PaF executions, but not the other way around. Documents from previous executions are not updated by comparing them to the new documents of the current PaF.

Note: The user_rule parameter is a Lua script that must return a boolean value (true or false). It filters the list of closest neighbors of a document. The script is applied to each pair formed by a document and one of its neighbors, and if it returns false, that neighbor is discarded. For more information about the Lua programming language, see http://www.lua.org/ and the Lua tutorials (http://lua-users.org/wiki/TutorialDirectory).

The following variables are available to the script:
  • doc1 is the reference document.
  • doc2 is the neighbor of doc1 that is kept or discarded, depending on the result of the script.
  • afs.proximity is the proximity (between 0 and 1) between doc1 and doc2. If the script tests this variable and the min_proximity parameter is also set, the more restrictive threshold applies.
For each user-declared variable v (see the variables parameter), its value is available as a string in doc1.v and doc2.v. In addition to user-declared variables, two variables are available natively:
  • doc1.word and doc2.word: the number of indexed words in the document,
  • doc1.date and doc2.date: the date when the document was processed by the filter, in ISO 8601 format.
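Here is a hypothetical user_rule that combines afs.proximity with the native word counts. The script is passed as an XML attribute value, so a Lua `<` comparison must be written as `&lt;`; `>` may be left as is. The thresholds (0.7 and a factor of 10) are arbitrary illustration values. tonumber() is used defensively, in case the word counts are exposed as strings.

```xml
<!-- Hypothetical rule: keep a neighbor only if its proximity is at least 0.7
     and its indexed word count is less than ten times that of doc1.
     "<" must be escaped as "&lt;" inside the XML attribute. -->
<afs:arg name="user_rule"
         value="return afs.proximity >= 0.7
                and tonumber(doc2.word) &lt; 10 * tonumber(doc1.word)"/>
```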
Tip: The date variable can be overridden through the variables parameter, for example to use the publication date of a document instead.
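The sketch below illustrates this tip under several assumptions:
  • the publication date is stored in a Dublin Core dc:date element under a hypothetical /record root,
  • nsmap entries map a prefix (key) to a namespace URI (value), using the same afs:map syntax as variables,
  • the overriding values use a consistent ISO 8601 form, which the string comparison in the example rule relies on.

```xml
<afs:args>
    <!-- Assumed nsmap syntax: key = namespace prefix, value = namespace URI -->
    <afs:arg name="nsmap">
        <afs:map>
            <afs:param key="dc" value="http://purl.org/dc/elements/1.1/"/>
        </afs:map>
    </afs:arg>
    <!-- A variable named "date" overrides the native date variable;
         /record/dc:date is a hypothetical XPath -->
    <afs:arg name="variables">
        <afs:map>
            <afs:param key="date" value="/record/dc:date"/>
        </afs:map>
    </afs:arg>
    <!-- Hypothetical rule: keep neighbors published on or after doc1
         (string comparison, valid only for uniformly formatted ISO 8601 dates) -->
    <afs:arg name="user_rule" value="return doc2.date >= doc1.date"/>
</afs:args>
```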

Usage example:
<afs:filter uri="#acc_build" type="afs_acc_build" comment="Generate ACC">
    <afs:args>
        <afs:arg name="n" value="5"/>
        <afs:arg name="min_proximity" value="0.60"/>
        <afs:arg name="generate_report" value="$AFS7/acc.xml"/>
        <afs:arg name="variables">
            <afs:map>
                <afs:param key="category" value="/product/category/@id"/>
                <afs:param key="price" value="/product/price"/>
            </afs:map>
        </afs:arg>
        <afs:arg name="var_layer" value="USER_3"/>
        <!-- "<" must be escaped as "&lt;" inside an XML attribute value -->
        <afs:arg name="user_rule"
                 value="return doc1.category == doc2.category
                        and (tonumber(doc1.price) / tonumber(doc2.price)) &lt; 2"/>
    </afs:args>
</afs:filter>
In this example, each document keeps at most 5 neighbors (n is set to 5). A neighbor is kept only if it fulfills the following conditions:
  • it is in the same category as the reference document,
  • its proximity to the reference document is at least 0.60 (60%),
  • the reference document's price is less than twice the neighbor's price.
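The rule above bounds the price ratio in one direction only. It would also fail with a Lua error if a price is missing or not numeric, because tonumber() then returns nil and arithmetic on nil is an error. The following hypothetical variant keeps the same-category condition, bounds the ratio in both directions, and discards neighbors without usable prices. It is a sketch, not a rule taken from the product documentation.

```xml
<!-- Hypothetical variant: same category, and prices within a factor of two
     of each other in either direction. Neighbors whose price is missing or
     non-numeric are discarded instead of raising a Lua error. -->
<afs:arg name="user_rule"
         value="local p1, p2 = tonumber(doc1.price), tonumber(doc2.price);
                if p1 == nil or p2 == nil then return false end;
                return doc1.category == doc2.category
                       and p1 &lt; 2 * p2 and p2 &lt; 2 * p1"/>
```

Newlines inside the attribute value are normalized to spaces by the XML parser. The semicolons and Lua keywords still separate the statements.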

Output report example:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<afs:AccReport xmlns:afs="http://ref.antidot.net/v7/afs#" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://ref.antidot.net/v7/afs# http://ref.antidot.net/v7.5/acc.xsd">
    <afs:PaFUri serviceId="0" status="stable" name="Test"/>
    <afs:acc docId="1" uri="243725">
        <afs:doc docId="22" proximity="0.733" uri="2437.8"/>
        <afs:doc docId="31" proximity="0.726" uri="243823"/>
        <afs:doc docId="25" proximity="0.614" uri="243802"/>
    </afs:acc>
    <afs:acc docId="2" uri="243730"/>
    <afs:acc docId="3" uri="243752">
        <afs:doc docId="31" proximity="0.530" uri="243823"/>
        <afs:doc docId="22" proximity="0.530" uri="2437.8"/>
        <afs:doc docId="1" proximity="0.516" uri="243725"/>
    </afs:acc>
</afs:AccReport>
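The report is plain XML in the afs namespace, so it can be post-processed with standard XML tools. As an illustration (not a tool shipped with AFS), the following XSLT 1.0 stylesheet flattens the report into CSV lines of reference URI, neighbor URI and proximity. Documents without neighbors, such as docId 2 above, produce no line. It could be run, for instance, with `xsltproc acc_to_csv.xsl acc.xml`, where both file names are placeholders.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Illustrative XSLT 1.0 stylesheet (not part of AFS): flattens an ACC
     report into CSV lines "reference_uri,neighbor_uri,proximity". -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:afs="http://ref.antidot.net/v7/afs#">
    <xsl:output method="text" encoding="UTF-8"/>

    <xsl:template match="/">
        <!-- Header line -->
        <xsl:text>reference_uri,neighbor_uri,proximity&#10;</xsl:text>
        <!-- One line per neighbor; afs:acc elements without afs:doc children yield nothing -->
        <xsl:for-each select="/afs:AccReport/afs:acc/afs:doc">
            <xsl:value-of select="../@uri"/>
            <xsl:text>,</xsl:text>
            <xsl:value-of select="@uri"/>
            <xsl:text>,</xsl:text>
            <xsl:value-of select="@proximity"/>
            <xsl:text>&#10;</xsl:text>
        </xsl:for-each>
    </xsl:template>
</xsl:stylesheet>
```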