Search an Unstructured Document Title and Content - Fluid Topics - 3.8


The title and content of a specific unstructured document can be retrieved using the following web services:
  • /documents lists all the unstructured documents stored on the Fluid Topics server.
  • /$DOC_ID returns the metadata of a specific unstructured document. The document ID is provided in the /documents response.
  • /content returns the content of the given unstructured document in TEXT/HTML format.

Note: User rights are taken into account when the appropriate Authorization header is provided; otherwise, only public results are returned.

Before Fluid Topics v3.4.12, Fluid Topics systematically used a prefixed version of the header: FT-Authorization.

From Fluid Topics v3.4.12 onward, both the prefixed and the non-prefixed headers work.

Note: Clustering is not available for this web service.
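
The header rule stated in the note above can be sketched as a small helper. This is only an illustration: the function names (parse_version, authorization_header_name, build_headers) are hypothetical and not part of Fluid Topics, and the version string is assumed to be a dotted number such as 3.4.12.

```python
def parse_version(version):
    """Turn '3.4.12' into (3, 4, 12) so that versions compare numerically."""
    return tuple(int(part) for part in version.split('.'))


def authorization_header_name(ft_version):
    """Pick a header name accepted by the given Fluid Topics version.

    Before v3.4.12 the prefixed header is used. From v3.4.12 both forms
    work, so the standard name is returned.
    """
    if parse_version(ft_version) < (3, 4, 12):
        return 'FT-Authorization'
    return 'Authorization'


def build_headers(ft_version, authorization_value):
    """Build the request headers for an authenticated call."""
    return {authorization_header_name(ft_version): authorization_value}


# 'Basic ...' is a placeholder for a real authentication value.
print(build_headers('3.4.11', 'Basic ...'))
print(build_headers('3.8', 'Basic ...'))
```

Comparing tuples of integers rather than raw strings avoids ordering mistakes such as '3.10' sorting before '3.4'.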

Alternatively, follow the endpoints returned in the output of each web service.

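The following Python example outlines a client for the List Unstructured Documents web service.
#!/usr/bin/env python3
import requests

# Base URL of the Fluid Topics tenant, without a trailing slash
FT_SERVER_URL = 'https://<host>/<tenantId>'

# Endpoint of the List Unstructured Documents web service
DOCUMENTS_ENDPOINT = '/api/khub/documents'

# Authorization header; without it, only public results are returned
HEADERS = {'Authorization': 'Basic ...'}

# List all the unstructured documents
def crawl_documents():
    URL = FT_SERVER_URL + DOCUMENTS_ENDPOINT
    ...

# Get the metadata of one document listed by crawl_documents()
def crawl_document(document_preview):
    URL = FT_SERVER_URL + document_preview['documentApiEndpoint']
    ...

# Get the content of one document from its metadata
def crawl_document_content(document_content_preview):
    URL = FT_SERVER_URL + document_content_preview['contentApiEndpoint']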
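
The example above leaves out the HTTP calls. One possible way to complete the crawl is sketched below, under two assumptions that the original does not confirm: the /documents response is a JSON array, and each item carries a documentApiEndpoint key. The helper http_get_json and the get_json parameter are hypothetical names. The sketch uses the standard library's urllib.request so that it runs without extra packages; the same requests can be made with requests.get(url, headers=HEADERS).

```python
import json
import urllib.request

FT_SERVER_URL = 'https://<host>/<tenantId>'  # placeholder, no trailing slash
DOCUMENTS_ENDPOINT = '/api/khub/documents'
HEADERS = {'Authorization': 'Basic ...'}     # placeholder credentials


def http_get_json(url):
    """Fetch a URL with the configured headers and decode the JSON body."""
    request = urllib.request.Request(url, headers=HEADERS)
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def crawl_documents(get_json=http_get_json):
    """Yield the metadata of every document listed by /documents.

    Assumption: the list response is a JSON array whose items each carry
    a 'documentApiEndpoint' key.
    """
    for preview in get_json(FT_SERVER_URL + DOCUMENTS_ENDPOINT):
        yield get_json(FT_SERVER_URL + preview['documentApiEndpoint'])
```

Passing the fetch function as a parameter keeps the crawl logic independent of the HTTP library and lets it be exercised without a server.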

After listing the unstructured documents, the endpoints returned for each document can be followed to retrieve more detailed information.

As an example, the Get Unstructured Document Metadata web service /$DOC_ID returns the following information for the image "Standard Time Zones of the World".

{
    "id": "n4cQEkKM8SM1f3zRPe_Bfg",
    "filename": "standard_time_zones_of_the_world.png",
    "title": "standard_time_zones_of_the_world.png",
    "mimeType": "image/png",
    "lastEdition": "2020-02-12",
    "lastPublication": "2020-01-30T13:32:14.569295",
    "baseId": "standard-time-zones-of-the-world",
    "originId": "documents/standard_time_zones_of_the_world.png",
    "clusterId": "standard-time-zones-of-the-world",
    "description": "standard time zones of the world",
    "khubVersion": "3.7.6",
    "openMode": "FLUIDTOPICS",
    "prettyUrl": "/go/standard-time-zones-of-the-world",
    "rightsApiEndpoint": "/api/khub/documents/n4cQEkKM8SM1f3zRPe_Bfg/rights",
    "contentApiEndpoint": "/api/khub/documents/n4cQEkKM8SM1f3zRPe_Bfg/content",
    "viewerUrl": "/viewer/document/n4cQEkKM8SM1f3zRPe_Bfg",
    "metadata": [
        {
            "key": "ud:id",
            "label": "ud:id",
            "values": [
                "standard_time_zones_of_the_world.png"
            ]
        }
    ]
}
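
As a sketch of how a client might consume this response, the following snippet parses a trimmed copy of the JSON above and extracts the fields a crawler typically needs. The function name summarize_document and the keys of the returned dictionary are illustrative choices, not part of the web service.

```python
import json

# Trimmed copy of the response shown above.
RESPONSE_TEXT = '''
{
    "id": "n4cQEkKM8SM1f3zRPe_Bfg",
    "filename": "standard_time_zones_of_the_world.png",
    "mimeType": "image/png",
    "contentApiEndpoint": "/api/khub/documents/n4cQEkKM8SM1f3zRPe_Bfg/content",
    "metadata": [
        {
            "key": "ud:id",
            "label": "ud:id",
            "values": ["standard_time_zones_of_the_world.png"]
        }
    ]
}
'''


def summarize_document(document):
    """Return a compact view of a document metadata response."""
    return {
        'id': document['id'],
        'mime_type': document['mimeType'],
        'content_endpoint': document['contentApiEndpoint'],
        # Flatten the list of {key, values} entries into a plain dict.
        'metadata': {entry['key']: entry['values']
                     for entry in document['metadata']},
    }


summary = summarize_document(json.loads(RESPONSE_TEXT))
print(summary['content_endpoint'])
```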
Tip: Use a REST client such as Postman to retrieve the Basic authentication key.
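
The same value can also be computed in code. In the standard HTTP Basic scheme, the key is the Base64 encoding of "login:password" prefixed with "Basic ". The sketch below assumes that Fluid Topics expects this standard form; the function name basic_auth_value and the sample credentials are hypothetical.

```python
import base64


def basic_auth_value(login, password):
    """Build the value of a Basic Authorization header."""
    raw = f'{login}:{password}'.encode('utf-8')
    return 'Basic ' + base64.b64encode(raw).decode('ascii')


# Sample credentials for illustration only.
HEADERS = {'Authorization': basic_auth_value('user@example.com', 'secret')}
```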

Going one step further, the /content web service returns the content of the document, here the PNG image:


Standard Time Zones of the World image
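
A minimal sketch of saving that content to disk follows. It assumes that the content endpoint returns the raw bytes of the document, and it takes the filename and contentApiEndpoint fields from the metadata response shown earlier. The names http_get_bytes and save_document_content are hypothetical.

```python
import urllib.request
from pathlib import Path


def http_get_bytes(url, headers):
    """Fetch a URL and return the response body as bytes."""
    request = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(request) as response:
        return response.read()


def save_document_content(server_url, document, target_dir,
                          headers=None, get_bytes=http_get_bytes):
    """Download a document's content and write it under target_dir.

    'document' is a metadata response containing at least the
    'filename' and 'contentApiEndpoint' keys.
    """
    url = server_url + document['contentApiEndpoint']
    target = Path(target_dir) / document['filename']
    target.write_bytes(get_bytes(url, headers or {}))
    return target
```

For example, save_document_content(FT_SERVER_URL, document, '.', headers=HEADERS) would write standard_time_zones_of_the_world.png to the current directory, where FT_SERVER_URL and HEADERS stand for the tenant URL and the Authorization header used for the other calls.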