Crawling Options for a Specific Website - AIF

AIF Crawl

Product: AIF
Category: Technical Notes
Language: English
Audience: public

Example with the $AFS7/conf/perimeter/http/www.antidot.net/conf.xml file:

<?xml version="1.0"?>
<afs:conf xmlns:afs="http://ref.antidot.net/v7/afs#">
  <!--www.antidot.net-->
  <afs:PaF>
    <afs:Perimeter>
      <afs:defaultRule value="true"/>
      <afs:maxDepth value="4"/>
      <afs:Rules>
        <afs:mapItem key="/en/*" value="false"/>
        <afs:mapItem key="*Mentions-legales" value="false"/>
        <afs:mapItem key="*/Contact" value="false"/>
        <afs:mapItem key="*/user/*" value="false"/>
      </afs:Rules>
      <afs:Categories>
        <afs:mapItem key="*" value="FACETA=foo;FACETB=bar"/>
      </afs:Categories>
    </afs:Perimeter>
  </afs:PaF>
</afs:conf>

Description:

  • By default, all URLs of the site are downloaded (defaultRule is true),
  • Down to a maximum depth of 4 (maxDepth),
  • With exclusions on some paths:
    • no English pages (/en/*)
    • no legal notices (*Mentions-legales)
    • no "Contact" pages (*/Contact) and no pages under a "user" path (*/user/*)
  • All pages (*) are categorized with the value "foo" in facet FACETA and the value "bar" in facet FACETB.
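
To make the effect of these rules concrete, here is a minimal Python sketch that applies them to a few URLs. It is an illustration only, not the crawler's actual implementation. It assumes that the keys are shell-style wildcards matched against the URL path, that the first matching rule wins, and that depth counts link hops from the start page. The names RULES, DEFAULT_RULE, MAX_DEPTH and is_in_perimeter are hypothetical.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlsplit

# Values copied from the conf.xml example; the evaluation logic below is assumed.
RULES = {
    "/en/*": False,
    "*Mentions-legales": False,
    "*/Contact": False,
    "*/user/*": False,
}
DEFAULT_RULE = True  # <afs:defaultRule value="true"/>
MAX_DEPTH = 4        # <afs:maxDepth value="4"/>


def is_in_perimeter(url: str, depth: int) -> bool:
    """Return True if the URL would be downloaded under the assumed semantics."""
    if depth > MAX_DEPTH:
        return False
    path = urlsplit(url).path or "/"
    # Assumption: the first rule whose pattern matches the path decides.
    for pattern, allowed in RULES.items():
        if fnmatchcase(path, pattern):
            return allowed
    return DEFAULT_RULE


if __name__ == "__main__":
    for url in (
        "http://www.antidot.net/produits",
        "http://www.antidot.net/en/products",
        "http://www.antidot.net/fr/Contact",
    ):
        print(url, is_in_perimeter(url, depth=2))
```

Under these assumptions, only the first URL is kept: the second is rejected by /en/* and the third by */Contact. Any page deeper than 4 is rejected regardless of its path.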