Configure Indexing Environment - ABO

Concept Agent Managed From BO

Product: ABO
AFS_Version: 7.7
Category: Technical Notes

Create a Launch Script

The script used to launch PaF contains:

#!/bin/bash

export AFS7="/usr/local/afs7/PaF/Concept"
export VOCA_NAME="$AFS7/conf/myVoca.rdf"

/usr/local/afs7/bin/afs_paf -p ${AFS7}/conf/paf.xml

The key point is that PaF automatically downloads its vocabularies when it starts. These vocabularies are usually used for semantic expansion of indexed data.
Here, the vocabulary itself is the file being processed.

The $VOCA_NAME variable is used later as a filter argument. It points to the vocabulary managed from the Back Office (BO).

The vocabulary is generated in SKOS format, stored in $AFS7/conf, and named as in the BO, with the .rdf extension.
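
Because every filter described in this note depends on $VOCA_NAME, it can help to confirm that the file is actually present once PaF has started and downloaded its vocabularies. The following is only a minimal diagnostic sketch: the function name check_voca is hypothetical and is not part of AFS.

```shell
#!/bin/bash
# Hypothetical helper (not an AFS command): checks that a vocabulary
# file exists, is not empty, and carries the expected .rdf extension.
check_voca() {
    local voca="$1"
    if [ ! -s "$voca" ]; then
        echo "Vocabulary file missing or empty: $voca" >&2
        return 1
    fi
    case "$voca" in
        *.rdf) return 0 ;;
        *) echo "Unexpected extension (expected .rdf): $voca" >&2
           return 1 ;;
    esac
}
```

It could be called as check_voca "$VOCA_NAME" from any script that has the variables of the launch script in its environment.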

Add PreScript

The following line can be added as a prescript to clean the environment before each new indexing run:

rm -rf $AFS7/db $AFS7/rdf $AFS7/repository $AFS7/var $AFS7/tmp
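
Note that if $AFS7 were ever unset or empty, the cleanup line above would expand to rm -rf /db /rdf /repository /var /tmp. A more defensive variant can be sketched as follows; the function name clean_indexing_env is hypothetical, and the sketch assumes the prescript is run by bash with $AFS7 exported as in the launch script.

```shell
#!/bin/bash
# Hypothetical guarded cleanup: refuses to run unless the given root
# is an existing directory, then removes only the working
# sub-directories listed in this note (conf is left untouched).
clean_indexing_env() {
    local root="$1"
    if [ -z "$root" ] || [ ! -d "$root" ]; then
        echo "Refusing to clean: '$root' is not a directory" >&2
        return 1
    fi
    rm -rf "$root/db" "$root/rdf" "$root/repository" "$root/var" "$root/tmp"
}
```

It would be invoked as clean_indexing_env "$AFS7".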

Configure filters

The following filters are necessary, in this order (installing antidot-aif and antidot-paf-rdf packages makes them available):

  • afs_concept_build
  • afs_rdfx_serialize
  • afs_xml_xslt
  • afs_dbm_store
  • afs_concept_deploy

  • afs_concept_build

This filter generates the necessary .dict file.

The following parameters must be set:

  • input_files, the Vocabulary generated using BO: $VOCA_NAME.
  • output_file, the .dict file to generate.

  • afs_rdfx_serialize

This filter converts desired data from the vocabulary into RDFx.

The following parameters must be set:

  • split_mode must be set to true.
  • rdf_files, the Vocabulary generated using BO: $VOCA_NAME.
  • predicates_to_follow lists the predicates that the filter follows:

<afs:param value="http://www.w3.org/2004/02/skos/core#broader"/>
<afs:param value="http://www.w3.org/2004/02/skos/core#narrower"/>
<afs:param value="http://www.w3.org/2004/02/skos/core#inScheme"/>
<afs:param value="http://www.w3.org/2004/02/skos/core#closeMatch"/>
<afs:param value="http://www.w3.org/2004/02/skos/core#exactMatch"/>

  • sparql_directories is the folder containing the SPARQL query (such as $AFS7/sparql). The query selects every extracted concept.

SELECT ?x WHERE { ?x a <http://www.w3.org/2004/02/skos/core#Concept>.}

  • output_layer is set to CONTENTS.
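
The SPARQL directory and its query file can be created from the shell. This is a sketch under assumptions: the function name write_concept_query and the file name concepts.sparql are hypothetical, since this note does not state which file name or extension the filter expects.

```shell
#!/bin/bash
# Hypothetical helper: writes the concept-selection query from this
# note into the given directory. The file name "concepts.sparql" is an
# assumption, not a documented AFS convention.
write_concept_query() {
    local dir="$1"
    mkdir -p "$dir" || return 1
    # Quoted heredoc delimiter: the query is written verbatim,
    # without any shell expansion.
    cat > "$dir/concepts.sparql" <<'EOF'
SELECT ?x WHERE { ?x a <http://www.w3.org/2004/02/skos/core#Concept>.}
EOF
}
```

For the layout used in this note, the call would be write_concept_query "$AFS7/sparql".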

  • afs_xml_xslt

This filter converts previously generated RDFx data to XML.

The following parameter must be set:

  • xsl_file is the XSL file used (such as $AFS7/xsl/RDFxToXML.xsl in our example).

  • afs_dbm_store

This filter converts the previously generated XML into the concept agent format (a repository.afs file).

The following parameters must be set:

  • xpath is set to /item/@uri.
  • output_dir is set to $AFS7/repository.

  • afs_concept_deploy

This filter deploys generated databases to reply servers.

The following parameters must be set:

  • dict_path is set to $AFS7/repository.
  • dbm_store_path is set to $AFS7/repository.
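
When deployment fails, a quick look at the repository directory can show whether the earlier filters produced their output. The sketch below assumes that the .dict file is written into $AFS7/repository and that afs_dbm_store leaves an entry named repository.afs there; both are inferred from the parameters above rather than confirmed, and the function name check_repository is hypothetical.

```shell
#!/bin/bash
# Hypothetical diagnostic: succeeds only if the directory holds at
# least one non-empty .dict file and a non-empty repository.afs entry.
check_repository() {
    local repo="$1"
    local f
    local dict_found=no
    for f in "$repo"/*.dict; do
        # If nothing matches, the pattern stays literal and -s fails.
        if [ -s "$f" ]; then
            dict_found=yes
            break
        fi
    done
    if [ "$dict_found" != yes ]; then
        echo "No non-empty .dict file in $repo" >&2
        return 1
    fi
    if [ ! -s "$repo/repository.afs" ]; then
        echo "repository.afs missing or empty in $repo" >&2
        return 1
    fi
    return 0
}
```

It could be run as check_repository "$AFS7/repository" after an indexing run and before the next prescript cleanup.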