Latent Semantic Indexing (LSI) is an indexing method that identifies relationships between the terms and concepts in an unstructured corpus. It is based on the principle that words used in the same contexts tend to have similar meanings, so the conceptual content of a document can be extracted by establishing associations between terms that occur in similar contexts.
LSI helps address synonymy (multiple words with similar meanings) and polysemy (words with more than one meaning), two problems that limit exact keyword matching. It is also used for automated document categorization.
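
The synonymy point can be illustrated with a minimal sketch. It assumes the common formulation of LSI as a truncated singular value decomposition (SVD) of a term-document count matrix, which the text above does not spell out. The toy corpus, the choice of k = 2 latent dimensions, and the helper names (term_vector, lsi_similarities, raw_similarities) are hypothetical and chosen only for illustration. Practical systems typically apply term weighting such as TF-IDF and use far more dimensions.

```python
import numpy as np

# Hypothetical toy corpus: "car" and "automobile" never share a document,
# but they appear alongside the same context words.
docs = [
    "car engine wheel",
    "automobile engine wheel",
    "car road driver",
    "automobile road driver",
    "flower petal garden",
    "garden flower soil",
]

vocab = sorted({w for d in docs for w in d.split()})
index = {w: i for i, w in enumerate(vocab)}

# Term-document matrix of raw counts: rows are terms, columns are documents.
A = np.zeros((len(vocab), len(docs)))
for j, d in enumerate(docs):
    for w in d.split():
        A[index[w], j] += 1.0

# Truncated SVD: keep the k largest singular values as latent "concepts".
# k = 2 is an assumption that suits this tiny corpus.
k = 2
U, s, Vt = np.linalg.svd(A, full_matrices=False)
U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]


def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


def term_vector(text):
    """Count vector over the vocabulary; unknown words are ignored."""
    v = np.zeros(len(vocab))
    for w in text.split():
        if w in index:
            v[index[w]] += 1.0
    return v


def lsi_similarities(query):
    """Compare the query with every document in the k-dimensional latent space."""
    q_latent = term_vector(query) @ U_k        # project the query onto U_k
    doc_latent = (np.diag(s_k) @ Vt_k).T       # U_k^T A equals S_k V_k^T
    return [cosine(q_latent, d) for d in doc_latent]


def raw_similarities(query):
    """Baseline: cosine similarity on raw term counts (exact word overlap)."""
    q = term_vector(query)
    return [cosine(q, A[:, j]) for j in range(len(docs))]


if __name__ == "__main__":
    for name, sims in [("raw", raw_similarities("car")),
                       ("lsi", lsi_similarities("car"))]:
        print(name, [round(x, 3) for x in sims])
```

In this sketch, the query "car" shares no word with the document "automobile engine wheel", so its raw-count similarity to that document is zero. In the latent space the two should score close to 1, because both load on the same vehicle-related dimension, while the gardening documents should score close to 0. The same latent document vectors could serve as input features for a categorization step, though that is not shown here.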