Topology - AFS

Antidot Product Architecture

Product: AFS
AFS_Version: 7.9
Category: Reference Guide

To index sources and respond to queries, every AFS module can run as a single instance or as multiple instances, on one or more computers. This lets the search solution deliver the best performance and availability for its functional needs and constraints.

The modular approach makes it possible to associate a specific, self-contained software component with each function of the solution. This affords wide implementation flexibility. Any configuration can be envisaged, from a single server hosting all the modules and performing all the functions, to tens or hundreds of servers indexing hundreds of millions of documents and responding to tens of millions of queries per month.

For example, to cope with large volumes, an index can be split into layers, each processed by a distinct reply agent. A middle agent hides the split from the rest of the solution and virtualizes the reply agents so that they appear as a single agent for the whole source index.
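The layered index described above can be pictured as a scatter-gather pattern. The following is only a minimal sketch under stated assumptions: `ReplyAgent`, `MiddleAgent`, `Hit`, and the term-count scoring are hypothetical names and simplifications chosen for illustration, and they are not part of the AFS API.

```python
from dataclasses import dataclass
from heapq import nlargest


@dataclass(frozen=True)
class Hit:
    doc_id: str
    score: float


class ReplyAgent:
    """Hypothetical stand-in for a reply agent serving one index layer."""

    def __init__(self, documents):
        # documents: mapping of doc_id -> text, for this layer only
        self.documents = documents

    def search(self, term, limit):
        # Toy scoring (an assumption): number of occurrences of the term.
        term = term.lower()
        hits = [
            Hit(doc_id, float(text.lower().split().count(term)))
            for doc_id, text in self.documents.items()
        ]
        hits = [h for h in hits if h.score > 0]
        return nlargest(limit, hits, key=lambda h: h.score)


class MiddleAgent:
    """Presents several layer agents as if they were a single agent."""

    def __init__(self, agents):
        self.agents = agents

    def search(self, term, limit):
        # Scatter the query to every layer, then merge the partial results.
        merged = []
        for agent in self.agents:
            merged.extend(agent.search(term, limit))
        return nlargest(limit, merged, key=lambda h: h.score)


layer_a = ReplyAgent({"a1": "search engine search", "a2": "index layers"})
layer_b = ReplyAgent({"b1": "search topology", "b2": "reply agent"})
middle = MiddleAgent([layer_a, layer_b])
results = middle.search("search", limit=2)
```

The caller only talks to `middle`, so the number of layers can change without affecting the rest of the sketch.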

When designing the physical architecture, the following issues must be considered:

  • The volume of data to process covers the number of sources and documents, the harvesting mode (global or distinctive), and the indexing frequency (weekly, daily, continuous, and so on). The number of crawlers and indexers depends on this volume.
  • The traffic parameter is the number of queries to process. Hardware resources can be added to increase the number of reply agents.
  • Performance also relies on hardware. To cope with high volume or traffic levels, additional hardware resources help increase performance.
  • The availability of the search service and of the associated resources is managed by AFS. Even with limited document volume and traffic, the AFS modules can run on several servers. This applies particularly to the modules of the reply architecture, in order to guarantee availability and quality of service even under very heavy load variations or the loss of servers. Servers hosting modules can be added or removed while the solution is running. When added, modules automatically subscribe to the solution to receive their share of the work. The AFS internal communication protocol is designed to monitor every module so that modules that are overloaded, in error, or stopped are removed from operations.
  • The profiles of the servers used have a major impact on software performance and capacity, in terms of the number of documents that can be indexed and of reply times.
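The availability behavior described in the list above (modules subscribing when added, and overloaded, failed, or stopped modules being taken out of operations) can be sketched as a small health-aware pool. This is an illustrative model only: `ModulePool`, `State`, and the round-robin dispatch are assumptions made for the example and do not describe the actual AFS internal protocol.

```python
from enum import Enum


class State(Enum):
    OK = "ok"
    OVERLOADED = "overloaded"
    ERROR = "error"
    STOPPED = "stopped"


class ModulePool:
    """Hypothetical registry that only dispatches work to healthy modules."""

    def __init__(self):
        self._states = {}  # module name -> last reported state
        self._cursor = 0

    def subscribe(self, name):
        # A newly added module joins the pool and becomes eligible for work.
        self._states[name] = State.OK

    def remove(self, name):
        self._states.pop(name, None)

    def report(self, name, state):
        # Record the latest state observed for a known module.
        if name in self._states:
            self._states[name] = state

    def active(self):
        # Modules that are overloaded, in error, or stopped are excluded.
        return [n for n, s in self._states.items() if s is State.OK]

    def next_module(self):
        # Round-robin dispatch (an assumption) over healthy modules only.
        candidates = self.active()
        if not candidates:
            raise RuntimeError("no module available")
        name = candidates[self._cursor % len(candidates)]
        self._cursor += 1
        return name


pool = ModulePool()
for module in ("reply-1", "reply-2", "reply-3"):
    pool.subscribe(module)
pool.report("reply-2", State.ERROR)
```

After the error report, `pool.active()` no longer lists `reply-2`; reporting `State.OK` again makes it eligible once more.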

To make the best use of hardware assets, servers can be shared to host different software products, and their use can be scheduled. For example, data indexing can be performed at night on a server that operates as the reply agents server in the daytime.
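As a rough illustration of such scheduling, the sketch below picks a role for a shared server from the time of day. The function name `role_at` and the 22:00 to 06:00 night window are assumptions chosen for the example, not AFS settings.

```python
from datetime import time


def role_at(now, night_start=time(22, 0), night_end=time(6, 0)):
    """Return 'indexing' inside the night window, 'reply' otherwise."""
    if night_start <= night_end:
        # Window contained in a single day, for example 01:00 to 05:00.
        in_night = night_start <= now < night_end
    else:
        # Window that wraps around midnight, for example 22:00 to 06:00.
        in_night = now >= night_start or now < night_end
    return "indexing" if in_night else "reply"


evening_role = role_at(time(23, 30))
midday_role = role_at(time(12, 0))
```

With the assumed window, the server indexes at 23:30 and serves replies at noon.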

Typical architectures fall into one of the following five types:

1. One server

Components:

  • All the modules are hosted on the same server.

Benefits:

  • This suits corporate environments (for example, intranets).
  • It is easy to implement and administer.
  • It saves costs.

2. Two servers

Components:

  • One server is used for indexing and the Back Office.
  • One server is used to respond to queries.

Benefits:

  • This makes it possible to separate the indexing and Back Office part from the front-end part in charge of responding to queries.
  • The security of the platform is increased.
  • Traffic and volume capabilities are increased.
  • DMZ-type implementations are easier.

3. Three servers

Components:

  • One server is used for indexing and the Back Office.
  • Two servers are used to respond to queries.

Benefits:

  • Availability is improved through reply agent redundancy.
  • Security is increased by the separation of back end and front end.

4. Three layers

Components:

  • One or more servers are used for the front-end.
  • One or more servers are used for indexing.
  • One or more servers are used for the Back Office (including logs).

Benefits:

  • Security is strengthened because every functional layer is separated. The front-end has no access to data, logs, or the Back Office.
  • Upgradability is improved because the layers are already decoupled.

5. Frame

Components:

  • Front-ends and agents are distributed across different servers.

Benefits:

  • Because modules are distributed on suitable resources, an efficient reply frame can be set up. This brings performance, high availability, and upgradability, and makes it possible to index hundreds of millions of documents and to respond to millions of queries per day.
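
The first four architecture types can be modeled as lists of servers, each hosting a set of roles, which makes benefits such as redundancy and front-end separation easy to check. This is a hedged sketch: the role labels, the `TOPOLOGIES` mapping, and both helper functions are hypothetical, and the "Three layers" entry shows only the minimum case of one server per layer.

```python
from collections import Counter

# Each topology is a list of servers; each server is the set of roles it hosts.
TOPOLOGIES = {
    "One server": [{"indexing", "back office", "reply"}],
    "Two servers": [{"indexing", "back office"}, {"reply"}],
    "Three servers": [{"indexing", "back office"}, {"reply"}, {"reply"}],
    "Three layers": [{"reply"}, {"indexing"}, {"back office"}],  # minimum case
}


def redundant_roles(servers):
    """Return the roles hosted on more than one server."""
    counts = Counter(role for server in servers for role in server)
    return {role for role, n in counts.items() if n > 1}


def front_end_is_separated(servers):
    """True if every server that answers queries hosts nothing else."""
    return all(server == {"reply"} for server in servers if "reply" in server)


redundancy = redundant_roles(TOPOLOGIES["Three servers"])
separated = front_end_is_separated(TOPOLOGIES["Two servers"])
```

In this model, only the three-server layout has a redundant role, and only the single-server layout mixes the front-end with other functions.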