A reasonable estimate of the data volume is needed to size the server architecture. The architecture can vary considerably with the size of each indexed content item: indexing web pages of a few kilobytes is quite different from indexing a DMS (Document Management System) server holding heavy PDFs.
As a guideline, the following points can be considered:
- For 100,000 web pages, an average of 10 GB of back-end space is needed to store the crawled pages and generate the indexes, and 1 GB is needed at the front-end level to store the generated indexes and the snippets.
- For 100,000 structured items (such as eCommerce catalog entries or database items), an average of 5 GB of back-end space is used for indexing, and 500 MB for the index front-ends.
- For logs, an average of 1 GB is needed per 5 million queries.
- A latest-generation two-processor server can index more than 1 million documents per day (depending on the type and complexity of the processing applied).
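The averages above can be turned into a rough storage estimate by scaling them with the number of items. The following is a minimal sketch only: the function name `estimate_storage_gb` and the assumption that storage grows linearly with item count are illustrative and are not part of AFS.

```python
# Averages quoted above, in GB per 100,000 items: (back-end, front-end).
GB_PER_100K = {
    "web_page": (10.0, 1.0),
    "structured_item": (5.0, 0.5),
}
LOG_GB_PER_5M_QUERIES = 1.0  # 1 GB of logs per 5 million queries


def estimate_storage_gb(web_pages=0, structured_items=0, queries=0):
    """Return (back_end_gb, front_end_gb, log_gb).

    Assumption: storage scales linearly with the number of items,
    which the averages above suggest but do not guarantee.
    """
    counts = {"web_page": web_pages, "structured_item": structured_items}
    back = sum(n / 100_000 * GB_PER_100K[kind][0] for kind, n in counts.items())
    front = sum(n / 100_000 * GB_PER_100K[kind][1] for kind, n in counts.items())
    logs = queries / 5_000_000 * LOG_GB_PER_5M_QUERIES
    return back, front, logs


if __name__ == "__main__":
    # 250,000 web pages, 400,000 catalog entries, 10 million logged queries.
    print(estimate_storage_gb(250_000, 400_000, 10_000_000))
```

Since the figures are averages, the result is best read as an order of magnitude: content with heavy documents, such as the PDFs mentioned earlier, may need noticeably more space.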
To help users size their hardware for the best use of AFS v7.11, the Antidot recommendations include:
- 1 physical CPU core
- 4 GB of memory
- 40 MB/s of disk throughput
- 100 Mbps of network bandwidth
A query is a user call to the search engine web service (keyword search, facet click, alert). Autocompletions are not counted as queries; they are already included in the sizing of the architecture:
- One APU can index between 50 and 200 MB per hour, depending on the complexity of the processing.
- One APU supports 4 GB of index and serves an average of 1 QPS (queries per second), with the ability to handle 2 QPS at peak.
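As an illustration, the APU figures above can be combined into a rough count of APUs and an indexing time range. This is a sketch under stated assumptions, not an Antidot formula: the function names are hypothetical, the index-size and query-load limits are treated as independent constraints (the largest one wins), and indexing throughput is assumed to add up across APUs. A stricter reading, in which each APU serves 1 QPS only for its own 4 GB of index, would give higher counts.

```python
import math

# Per-APU figures quoted above.
INDEX_GB_PER_APU = 4.0
AVG_QPS_PER_APU = 1.0
PEAK_QPS_PER_APU = 2.0
INDEXING_MB_PER_HOUR = (50.0, 200.0)  # (complex processing, simple processing)


def serving_apus(index_gb, avg_qps, peak_qps):
    """APUs needed so that index size, average load and peak load all fit.

    Assumption: the three limits are independent, so the largest
    requirement determines the count.
    """
    return max(
        1,
        math.ceil(index_gb / INDEX_GB_PER_APU),
        math.ceil(avg_qps / AVG_QPS_PER_APU),
        math.ceil(peak_qps / PEAK_QPS_PER_APU),
    )


def indexing_hours(data_mb, apus=1):
    """Return (best_case, worst_case) hours to index data_mb.

    Assumption: throughput adds up linearly across APUs.
    """
    slow, fast = INDEXING_MB_PER_HOUR
    return data_mb / (fast * apus), data_mb / (slow * apus)


if __name__ == "__main__":
    # 10 GB of index, 3 QPS on average, 8 QPS at peak.
    print(serving_apus(index_gb=10, avg_qps=3, peak_qps=8))
    # 1,000 MB to index on 2 APUs.
    print(indexing_hours(1000, apus=2))
```

In the first call the peak load is the binding constraint, which shows why both average and peak QPS are worth estimating before sizing.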