Posts

Showing posts from August, 2018

An intuitive way to visualize how SVD works

SVD is the core algorithm of Latent Semantic Analysis. I have demonstrated how to perform LSA/LSI using SVD in a blog post here. However, SVD itself is not easy to understand: fully grasping how it works requires quite a lot of linear algebra. We all agree math is important, but not everyone needs to understand the math behind SVD in order to use it. Is there an intuitive way to see how it works, without any equations or formulas? Yes, there is. Today, I am going to let you SEE how it works (: I downloaded a free picture from the internet at random: an adorable parrot. To turn the image into a matrix that SVD can be applied to, I first need to convert it to grayscale. The original grayscale image has dimensions 2000 x 3000 (see below). This is how it looks after we keep only the 300 most important components (2000 x 300). It doesn't look like it changed that much, right? T...
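For readers who want to try the idea, here is a minimal sketch of keeping only the top k components with NumPy. It is not the code from the post: the helper name `truncate_svd` is hypothetical, and a small synthetic array stands in for the grayscale parrot image so the snippet runs without any file.

```python
import numpy as np

def truncate_svd(gray, k):
    """Rebuild a 2-D array from its k largest singular values only."""
    # full_matrices=False gives the compact SVD: U is (rows, r), Vt is (r, cols)
    U, s, Vt = np.linalg.svd(gray, full_matrices=False)
    # Singular values come back sorted from largest to smallest,
    # so slicing the first k keeps the "most important" components.
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Synthetic stand-in for a grayscale image (assumption: any 2-D float array works)
rng = np.random.default_rng(0)
rows, cols = 200, 300
y = np.linspace(0, 1, rows)
x = np.linspace(0, 1, cols)
gray = np.outer(np.sin(4 * y), np.cos(3 * x)) + 0.05 * rng.standard_normal((rows, cols))

k = 30
approx = truncate_svd(gray, k)

# Relative reconstruction error (Frobenius norm)
rel_error = np.linalg.norm(gray - approx) / np.linalg.norm(gray)
print(approx.shape, rel_error)
```

Note that the rebuilt image has the same shape as the original; it is the truncated factor `U[:, :k]` that has shape rows x k. With a real photo, you would load the file with an image library of your choice and convert it to a 2-D grayscale array before calling the helper.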

Perform efficient Latent Semantic Index using Python

First of all, Latent Semantic Indexing (LSI) and Latent Semantic Analysis (LSA) are interchangeable terms. In the E-discovery world we usually say LSI, while researchers in other fields usually say LSA. Latent semantic analysis is widely used in information retrieval to search for similar documents. In E-discovery document review, finding similar documents can drastically reduce the number of documents a case team has to review, so it is a powerful and essential tool in today's E-discovery industry. LSI is the technology behind Relativity conceptual analytics. What is LSI and the intuition The main idea of LSI can be summarized in one simple sentence: words with similar meanings will occur in similar documents. Each document can have several topics, and several words together can be used to express a topic. In other words, each document consists of a mixture of topics, and each topic consists of a collection of words. A word can appear in several topics but the mixtures are dif...
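To make the intuition concrete, here is a toy sketch of LSI using plain NumPy. It is an illustration under my own assumptions, not the implementation from the post: the four tiny documents, the whitespace tokenizer, and the choice of k = 2 topics are all made up for the example.

```python
import numpy as np

# Hypothetical mini corpus: two documents about pets, two about finance
docs = [
    "cat kitten pet fur",
    "cat pet fur purr meow",
    "stocks markets trading shares",
    "stocks markets trading bonds",
]

# Build the vocabulary and a term-document count matrix (terms x documents)
vocab = sorted({w for d in docs for w in d.split()})
index = {w: i for i, w in enumerate(vocab)}
A = np.zeros((len(vocab), len(docs)))
for j, d in enumerate(docs):
    for w in d.split():
        A[index[w], j] += 1

# SVD, then keep only the k strongest "topics"
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
doc_vecs = (np.diag(s[:k]) @ Vt[:k, :]).T  # one k-dimensional row per document

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Pairwise document similarity in the reduced topic space
sim = np.array([[cosine(u, v) for v in doc_vecs] for u in doc_vecs])
print(np.round(sim, 2))
```

In this toy corpus the two pet documents end up close to each other in topic space, and so do the two finance documents, while the two groups stay far apart. A real pipeline would normally add proper tokenization and term weighting before the SVD step.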