
Sunday, August 21, 2011

Latent Semantic Indexing

Latent Semantic Indexing (LSI) is an indexing and retrieval method that uses a mathematical technique called singular value decomposition (SVD) to identify patterns in the relationships between the terms and concepts contained in an unstructured collection of text. LSI is based on the principle that words used in the same contexts tend to have similar meanings. A key feature of LSI is its ability to extract the conceptual content of a body of text by establishing associations between terms that occur in similar contexts.
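To make the SVD step concrete, here is a minimal Python/NumPy sketch. It runs on a made-up corpus of six words and four documents; the terms, the documents, and the choice of k = 2 concepts are illustrative assumptions and do not come from any real system. The sketch builds a term-document count matrix, keeps only the two largest singular values, and compares words in the resulting "concept" space. "car" and "automobile" never appear in the same document, but both co-occur with "engine", so LSI places them together.

```python
import numpy as np

# Hypothetical toy corpus, made up for illustration only.
terms = ["car", "automobile", "engine", "flower", "petal", "garden"]
docs = ["car engine", "automobile engine", "flower petal garden", "petal garden"]


def term_document_matrix(terms, docs):
    """Raw term counts: rows are terms, columns are documents."""
    index = {t: i for i, t in enumerate(terms)}
    A = np.zeros((len(terms), len(docs)))
    for j, doc in enumerate(docs):
        for word in doc.split():
            if word in index:
                A[index[word], j] += 1
    return A


def lsi(A, k):
    """Truncated SVD: keep only the k largest singular values ("concepts")."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)  # s is sorted descending
    # Assumption: scale coordinates by the singular values (one common convention).
    term_vecs = U[:, :k] * s[:k]    # one row per term
    doc_vecs = Vt[:k, :].T * s[:k]  # one row per document
    return term_vecs, doc_vecs


def cosine(a, b):
    """Cosine similarity of two vectors (assumed non-zero)."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


A = term_document_matrix(terms, docs)
term_vecs, doc_vecs = lsi(A, k=2)

car, auto = terms.index("car"), terms.index("automobile")
print("raw:", cosine(A[car], A[auto]))                  # never co-occur, so 0.0
print("lsi:", cosine(term_vecs[car], term_vecs[auto]))  # close to 1.0
```

Real systems usually weight the counts (for example with tf-idf) and keep far more than two dimensions. The choice of k is the main tuning knob: too small merges unrelated topics, and too large keeps the noise that LSI is meant to smooth away.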

LSI has been put to use in several ways. The most obvious and common is measuring the similarity between bodies of text. This enables applications ranging from finding related documents in a collection to running LSI paragraph by paragraph to build site summaries. It can also power a "smart" search of your document space, and even document categorization (read: spam filtering!).
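A "smart" search can be sketched in a similar way. The query is treated as a tiny pseudo-document and folded into the concept space with q_hat = S_k^-1 U_k^T q. Documents are then ranked by cosine similarity to q_hat. This is a minimal Python/NumPy sketch under several assumptions: a hypothetical toy corpus, raw term counts, and k = 2. The helper names (vectorize, build_lsi, fold_in, search) are illustrative and do not come from any library.

```python
import numpy as np

# Hypothetical toy corpus; real systems use far more documents and
# usually a term weighting such as tf-idf instead of raw counts.
terms = ["car", "automobile", "engine", "flower", "petal", "garden"]
docs = ["car engine", "automobile engine", "flower petal garden", "petal garden"]


def vectorize(text, terms):
    """Count how often each vocabulary term occurs in `text`."""
    return np.array([text.split().count(t) for t in terms], dtype=float)


def build_lsi(terms, docs, k):
    """Return U_k, s_k and V_k (one row of V_k per document)."""
    A = np.column_stack([vectorize(d, terms) for d in docs])
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :].T


def fold_in(query, terms, U_k, s_k):
    """Project a query into concept space: q_hat = S_k^-1 U_k^T q."""
    return (U_k.T @ vectorize(query, terms)) / s_k


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    na, nb = np.linalg.norm(a), np.linalg.norm(b)
    return 0.0 if na == 0 or nb == 0 else float(a @ b / (na * nb))


def search(query, terms, docs, k=2):
    """Rank documents by cosine similarity to the folded-in query."""
    U_k, s_k, V_k = build_lsi(terms, docs, k)
    q_hat = fold_in(query, terms, U_k, s_k)
    scores = [cosine(q_hat, v) for v in V_k]
    return sorted(zip(scores, docs), reverse=True)


for score, doc in search("automobile", terms, docs):
    print(f"{score:.3f}  {doc}")
```

Plain keyword matching for "automobile" would return only "automobile engine". In this sketch, "car engine" scores just as highly because both documents load on the same engine-related concept, while the flower documents score near zero. Folding a document's own text in this way reproduces its row of V_k, so queries and documents end up in the same coordinate system.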

LSI itself is not new (it was developed at Bellcore in the late 1980s), but Google has begun to employ it. Google first applied it in its AdSense program, as a way of determining which adverts would be most relevant on a particular site. Google bought a company called Applied Semantics in 2003, in an effort to use LSI concepts and ideas in its search rankings, and many other search engines are beginning to follow suit.
