afs_doc_index - AFS - Reference Guides

AFS Filters Description

Product: AFS
Platform: 7.11
Category: Reference Guides
Language: English

Indexes structured and unstructured documents.

The filter is declared with the afs_doc_index type. It is a processor filter provided by the antidot-paf package.

The parameters of the Document index filter are described in the following table:

| Parameter name | Mandatory | Type | Default | Description |
|---|---|---|---|---|
| layers | No | map | N/A | List of layers to index. Each key is a valid layer type; each value is either a content type or left empty for automatic detection. |
| master | No | layer | CONTENTS | Layer holding the main afs:reply on the search engine. |
| types | No | map | N/A | Forces the plugin type for a layer. Useful when MIME type detection fails. |
| exclude_pages | No | list | N/A | List of page numbers to exclude from a PDF. |
| exclude_pages_from_end | No | list | N/A | List of page numbers to exclude from a PDF, counted from the end of the document. |
| mining_include | No | list | N/A | Mine only the content of these layers. The mining_include and mining_exclude options are mutually exclusive. |
| mining_exclude | No | list | N/A | Mine all layers except these. The mining_include and mining_exclude options are mutually exclusive. |
| input_client_data_layers | No | list | N/A | List of layers to be treated as full ClientData. |
| thesaurus | No | list | N/A | List of thesaurus files to use. If empty, all thesauri are used. |
| taruqa_mode | No | boolean | false | Deactivates weight computation. |
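The mutual exclusivity of mining_include and mining_exclude can be illustrated with a short Python sketch. This is not the filter's implementation: the function name select_mined_layers and the layer name USER_1 are hypothetical, and the sketch assumes that every layer is mined when neither option is set.

```python
def select_mined_layers(layers, mining_include=None, mining_exclude=None):
    """Return the names of the layers whose content would be mined.

    Hypothetical helper: it only mirrors the rule stated in the table,
    namely that mining_include and mining_exclude cannot be combined.
    """
    if mining_include and mining_exclude:
        raise ValueError("mining_include and mining_exclude are mutually exclusive")
    if mining_include:
        # Mine only the listed layers.
        return [name for name in layers if name in mining_include]
    if mining_exclude:
        # Mine every layer except the listed ones.
        return [name for name in layers if name not in mining_exclude]
    # Assumption: with neither option set, all layers are mined.
    return list(layers)
```

For instance, calling select_mined_layers(["CONTENTS", "USER_1"], mining_exclude=["USER_1"]) keeps only CONTENTS, while passing both options raises an error.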

The authorized plugin types (that is, the authorized values in the types parameter) are:
  • XML for XML indexing,
  • PDF for PDF indexing,
  • HTML for HTML indexing,
  • POWERPOINT for Microsoft PowerPoint presentation indexing (.ppt),
  • EXCEL for Microsoft Excel spreadsheet indexing (.xls),
  • WORD for Microsoft Word (.doc),
  • RTF for Rich Text Format indexing,
  • TEXT for plain text indexing,
  • OO_TEXT for OpenOffice document indexing (.odt),
  • OO_SPREADSHEET for OpenOffice spreadsheet indexing,
  • OO_PRESENTATION for OpenOffice presentation indexing,
  • IMAGE for Image indexing based on EXIF, IPTC or XMP,
  • DOCX for Microsoft Word documents in the Office 2007 and later format (.docx),
  • XLSX for Microsoft Excel documents in the Office 2007 and later format (.xlsx),
  • PPTX for Microsoft PowerPoint documents in the Office 2007 and later format (.pptx).
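As a minimal sketch, a types map could be checked against the list above before it is handed to the filter. The helper invalid_plugin_types and the layer name USER_1 are hypothetical, and the sketch assumes that the values are case-sensitive and written exactly as listed.

```python
# Plugin type names copied from the list above.
AUTHORIZED_PLUGIN_TYPES = frozenset({
    "XML", "PDF", "HTML", "POWERPOINT", "EXCEL", "WORD", "RTF", "TEXT",
    "OO_TEXT", "OO_SPREADSHEET", "OO_PRESENTATION", "IMAGE",
    "DOCX", "XLSX", "PPTX",
})


def invalid_plugin_types(types):
    """Return the entries of a layer -> plugin type map that are not authorized.

    Hypothetical helper; assumes an exact, case-sensitive match.
    """
    return {
        layer: plugin_type
        for layer, plugin_type in types.items()
        if plugin_type not in AUTHORIZED_PLUGIN_TYPES
    }
```

An empty result means that every forced plugin type appears in the authorized list.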

Note: The master parameter can be omitted when the layers parameter defines exactly one key; master then takes that key as its value. Otherwise, master must be set explicitly or the filter cannot initialize.
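The rule in this note can be sketched in Python as follows. The function resolve_master is hypothetical and follows the note only: it does not model the CONTENTS default listed in the table, and it assumes that an explicitly set master is used as given.

```python
def resolve_master(layers, master=None):
    """Pick the master layer according to the note above.

    Hypothetical helper: `layers` is the layers map (layer -> content type),
    `master` is the optional master parameter.
    """
    if master is not None:
        # Assumption: an explicit master is taken as-is.
        return master
    if len(layers) == 1:
        # A single key in layers becomes the master layer.
        return next(iter(layers))
    raise ValueError("master must be set unless layers defines exactly one key")
```

With a single-key map such as {"CONTENTS": ""}, the sketch returns CONTENTS; with several keys and no master, it raises an error, which stands in for the filter failing to initialize.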

Note: Some file formats that are not human-readable, such as SVG or STEP files, cannot be indexed.

Attention: Even though all parameters are optional, XML indexing requires the content type value to be defined so that a given feed.xml can be associated with a given layer. For more information about the feed.xml file, see Configuring the Indexing Filter For XML-Structured Data.

Tip: Example of how to exclude pages from a PDF file. Suppose we want to exclude the first two pages and the last one, whatever the length of the PDF. The exclude_pages parameter must be set to {1,2} and the exclude_pages_from_end parameter to {1}. With a 5-page PDF, the filter then considers only pages {3,4}.
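The arithmetic behind this tip can be sketched in Python. The function kept_pages is hypothetical and only illustrates the numbering; it assumes that pages are numbered from 1 and that exclude_pages_from_end counts 1 as the last page, as the tip implies.

```python
def kept_pages(page_count, exclude_pages=(), exclude_pages_from_end=()):
    """Return the page numbers that remain after both exclusion lists apply.

    Hypothetical helper: pages are 1-based; in exclude_pages_from_end,
    1 designates the last page, 2 the one before it, and so on.
    """
    excluded = set(exclude_pages)
    # Convert "from the end" numbers into regular page numbers.
    excluded.update(page_count - n + 1 for n in exclude_pages_from_end)
    return [page for page in range(1, page_count + 1) if page not in excluded]


print(kept_pages(5, exclude_pages=[1, 2], exclude_pages_from_end=[1]))  # [3, 4]
```

Because the second list is resolved against the actual page count, the same settings drop the last page of a PDF of any length.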