Posts

Why you should consider using asynchronous programming for your eDiscovery work

To most programmers in the eDiscovery world, asynchronous programming might sound like something new. However, the concept has existed for decades, and there are implementations in pretty much every mainstream language. Even if you do not know it, you have probably heard of multi-threading or multi-processing for concurrency. Async programming is a way of achieving concurrency using only one thread! Because there is no need to start a new thread or a new process, the OS does not need to participate; everything is done within the programming language itself.

A simple analogy for how asynchronous programming works: let's say you are working on 3 tasks. You can start each task immediately, and the tasks will run on their own, without your intervention, until they finish. Assume task A takes 5 minutes, task B takes 3 minutes, and task C takes 2 minutes. The traditional way of finishing them is doing them one by one (order doesn't matter), so you will need 10 minutes to wait for them all to finish...

Stack vs Heap C++

https://www.learncpp.com/cpp-tutorial/79-the-stack-and-the-heap/

Heap allocation in C++ is done using the new keyword.

    #include <iostream>
    #include <string>
    #include <vector>
    using namespace std;

    struct Node {
        string tag;
        vector<Node*> children;
    };

    vector<Node*> nodes;

    void process() {
        for (int i = 0; i < 3; i++) {
            Node* n = new Node();
            n->tag = "a";
            nodes.push_back(n);
        }
    }

    int main() {
        std::cout << "Hello World!\n";
        process();
        for (int i = 0; i < 3; i++) {
            cout << nodes[i]->tag << endl;
        }
    }

The above works fine, but the version below will not:

    #include <iostream>
    #include <string>
    #include <vector>
    using namespace std;

    struct Node {
        string tag;
        vector<Node*> children;
    };

    vector<Node*> nodes;

    void process() {
        for (int i = 0; i < 3; i++) {
            Node n; // n is allocated on the stack and mi...

Transition Matrix in Markov Chain Explained

A Markov chain is a very important mathematical model. Many practical problems can be modeled as a Markov chain and then solved. I would like to talk about the transition matrix, specifically how to deduce the transition matrix after step n.  Let us use a simple but non-trivial example. Say we have only two states: State 1 and State 2. State 1 has probability 0.6 of staying at itself and probability 0.4 of going to State 2. State 2 has probability 0.8 of staying at itself and probability 0.2 of going to State 1. We can easily write down the transition matrix below: $$q = \begin{pmatrix} 0.6 & 0.4 \\ 0.2 & 0.8 \\ \end{pmatrix} $$ Please note that different literature represents the transition matrix in different ways. In our example, each row holds the transition probabilities out of the corresponding state. Let us assume we start at $T_0$ with a PMF of $(1,0)$. It simply means that at $T_0$ the chain is 100% in State 1. What happens after 1 step, at $T_1$? What will the PMF cha...

Memory-less Property of Distribution

By definition, memorylessness means the distribution of the waiting time (in the continuous case) or the number of trials (in the discrete case) does not depend on how much time has already passed or how many trials you have already tried. Only two distributions are memoryless: the exponential distribution in the continuous case and the geometric distribution in the discrete case. In fact, the geometric distribution is the discrete analogue of the exponential. Mathematically, memorylessness can be expressed as below, where m and n are non-negative integers in the discrete case, or non-negative real numbers in the continuous case: $$P(X \ge m+n \mid X \ge m) = P(X \ge n)$$ Using conditional probability, we can write the left-hand side as $$ P(X \ge m+n \mid X \ge m) =  \frac {P( X \ge m+n, X \ge m)}{P(X \ge m)}$$ Since n is non-negative, the event $X \ge m+n$ implies $X \ge m$, so the numerator simplifies to $P( X \ge m+n, X \ge m) = P(X \ge m+n)$, and the equation above becomes $$P(X \ge m+n \mid X \ge m) =  \frac {P( X \ge m+n)}{P(X \ge m)} $$ It is not hard to prove why exponential ...
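For the exponential case specifically, the survival function makes the verification one line. With rate $\lambda$, the exponential distribution has $P(X \ge t) = e^{-\lambda t}$, so

$$P(X \ge m+n \mid X \ge m) = \frac {P(X \ge m+n)}{P(X \ge m)} = \frac {e^{-\lambda (m+n)}}{e^{-\lambda m}} = e^{-\lambda n} = P(X \ge n)$$

which is exactly the memorylessness property stated above.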

Variance, Covariance, Correlation and Covariance Matrix

Variance is a special case of covariance where the two random variables are the same. Correlation is standardized covariance, a unit-agnostic metric, so you can compare any two pairs of random variables without worrying about their units.

Covariance and Variance

Let us look into covariance first. By definition: $Cov(X,Y)=E[(X - E(X))(Y - E(Y))]$ Using linearity of expectation, you can easily get $Cov(X,Y)=E(XY) - E(X)E(Y)$ In statistics, I always like to use a non-trivial example to demonstrate a concept. Here, I will demonstrate one discrete case and one continuous case. 1) Let's say we have two discrete random variables X and Y, with E(X) = 2 and E(Y) = 3. We can now easily calculate Cov(X,Y) as below: $Cov(X,Y)= \sum f(x,y)(x-E(X))(y-E(Y))$ $= 0.15*(1-2)*(2-3) + 0*(1-2)*(3-3)+0.15*(1-2)*(4-3)$ $+0.1*(2-2)*(2-3)+0.15*(2-2)*(3-3)+0*(2-2)*(4-3)$ $+0*(3-2)*(2-3)+0.15*(3-2)*(3-3)+0.3*(3-2)*(4-3) = 0.3$ 2)...

SVM

SVM has been quite a hot topic on the recent e-discovery scene. The reason is that Relativity has introduced Active Learning, which is based on SVM. Now everybody knows it works especially well for classification. But what exactly is SVM? This blog is meant to take you through it briefly, so you will know something more than just how to set up an active learning project in Relativity (:

Introduction

First of all, SVM is a supervised learning algorithm, meaning you need labelled data to train an SVM model before it can predict on unseen data; this is fundamentally different from LSI. In a nutshell, SVM tries to find the hyperplane that maximizes the separation margin. In other words, there might be many hyperplanes that can separate your data, but SVM finds the best one for you! Nearly all SVM tutorials explain what a hyperplane is and what the margin is, so I won't discuss them much here. The basic SVM is linear ...
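For reference, the "best hyperplane" statement has a compact mathematical form. A separating hyperplane is $w \cdot x + b = 0$, and, assuming linearly separable data with labels $y_i \in \{-1, +1\}$, the margin-maximization problem is

$$\min_{w, b} \; \frac{1}{2}\|w\|^2 \quad \text{subject to} \quad y_i (w \cdot x_i + b) \ge 1 \;\; \forall i$$

Since the margin between the two supporting hyperplanes is $2 / \|w\|$, minimizing $\|w\|$ is the same as maximizing the margin.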

An intuitive way to visualize how SVD works

SVD is the core algorithm behind Latent Semantic Analysis. I have demonstrated how to perform LSA/LSI using SVD in a blog here . However, SVD itself is not an easy thing to understand; fully understanding how it works requires quite a lot of linear algebra. We all agree math is important, but it is not necessary for everyone to understand the math behind SVD in order to use it. Is there an intuitive way to see how it works? I mean, without any equations or formulas? Yes, there is. Today, I am going to let you SEE how it works ( : I randomly downloaded a free picture from the internet: an adorable parrot. In order to convert the image into a matrix that SVD can be applied to, I need to convert it to gray scale first, and then perform SVD. The original gray-scaled image's dimensions are 2000 x 3000 (see below). This is how it looks after we keep only the most important 300 components (2000 x 300). It doesn't look like it changes that much, right? T...
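For readers who do want one equation: keeping "the most important 300 components" means taking a low-rank approximation. SVD factors the image matrix $A$ (here $2000 \times 3000$) as $A = U \Sigma V^T$, and the rank-$k$ approximation keeps only the $k$ largest singular values:

$$A_k = \sum_{i=1}^{k} \sigma_i u_i v_i^T, \qquad \sigma_1 \ge \sigma_2 \ge \cdots \ge 0$$

By the Eckart-Young theorem, $A_k$ is the best rank-$k$ approximation of $A$ in the Frobenius-norm sense, which is why the picture survives so well at $k = 300$.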