Perimeter - AFS

AFS Configuration Options

Product: AFS
AFS_Version: 7.10
Category: Reference Guide

Prefix: PaF/Perimeter

Crawl perimeter definition

The following table lists and describes the Perimeter configuration options.

Option

Type (default)

Description

Role

Editable in Back Office

defaultRule

bool (false)

Whether URIs are allowed by default. Will probably be refined with rules such as afs:crawl=allow,...

developer

Yes

Rules

string_map

List of perimeter rules. The key is a path; the value is true to ALLOW the path and false to DISCARD it.

developer

Yes

Categories

string_map

List of category rules. The key is a path; the value is a list of facet=value pairs separated by semicolons. A facet can take several values, separated by commas. Example: site=Antidot;type=Blog,Info/Corporate sets two facets: 'site' with the value 'Antidot', and 'type' with the values 'Blog' and 'Info/Corporate'.

developer

Yes

pre

string_map

Ordered list of prerequisites. The key is a prerequisite category; the value depends on the category. For now, the 'crawl' category is supported: its value is a URI to fetch before any other URI on the site, which is useful, for example, to force authentication through a login page.

developer

Yes

formData

string_map

Form data for URIs. The key is a path; the value is the form data, formatted as param1=value1&param2=value2.

developer

Yes

Logins

string_map

List of logins. The key is a path; the value is user:password.

developer

Yes

maxDepth

uint (0)

Maximum crawl depth. A seed URI has depth 1; a URI discovered from a URI at depth n has depth n+1. Use 0 for unlimited depth.

developer

Yes

disableRobotsTxt

bool (false)

If true, robots.txt is neither downloaded nor analyzed. Strongly discouraged when crawling the web.

developer

Yes

disableSitemaps

bool (false)

If true, sitemaps are neither downloaded nor analyzed.

developer

Yes

ignoreNOINDEX

bool (false)

If true, the NOINDEX meta directive in HTML documents is ignored. Use with caution.

developer

Yes

ignoreNOFOLLOW

bool (false)

If true, the NOFOLLOW meta directive in HTML documents is ignored. Use with caution.

developer

Yes

maxDurationS

uint (30)

Maximum number of seconds allowed for retrieving data

developer

Yes

minSleepDelayS

uint (1)

Minimum delay, in seconds, between two crawls on the site (measured from the end of one download to the start of the next one). Can be set to 0 to disable sleeping, but this goes against netiquette and should only be done if the administrator of the crawled site agrees.

developer

Yes
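
The value formats described above for Categories, formData, and Logins can be illustrated with a short Python sketch. This is not AFS code: the function names are hypothetical, and details the table does not specify (whitespace trimming, splitting a login on the first colon only, no URL-decoding of form data) are assumptions made for the example.

```python
def parse_categories(value):
    """Split 'facet=v1,v2;facet2=v3' into {facet: [values]}.

    Assumption: surrounding whitespace and empty entries are ignored.
    """
    facets = {}
    for pair in value.split(";"):
        if not pair.strip():
            continue
        name, _, raw_values = pair.partition("=")
        facets[name.strip()] = [v.strip() for v in raw_values.split(",") if v.strip()]
    return facets


def parse_form_data(value):
    """Split 'param1=value1&param2=value2' into ordered (name, value) pairs.

    Assumption: values are kept verbatim (no URL-decoding is applied).
    """
    pairs = []
    for item in value.split("&"):
        if not item:
            continue
        name, _, val = item.partition("=")
        pairs.append((name, val))
    return pairs


def parse_login(value):
    """Split 'user:password' into (user, password).

    Assumption: only the first colon separates the two parts, so the
    password itself may contain colons.
    """
    user, _, password = value.partition(":")
    return user, password


if __name__ == "__main__":
    # Example value taken from the Categories description.
    print(parse_categories("site=Antidot;type=Blog,Info/Corporate"))
    print(parse_form_data("param1=value1&param2=value2"))
    print(parse_login("user:password"))
```

With the Categories example from the table, parse_categories yields the facet 'site' with one value and the facet 'type' with two values, matching the description.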