Discrete Hidden Markov Models for Sequence Classification

Introduction

In this article we will look at hidden Markov models and its application in classification of discrete sequential data.

Markov processes are examples of stochastic processes that generate random sequences of outcomes or states according to certain probabilities.
Markov chains can be considered mathematical descriptions of Markov models with a discrete set of states.
In Markov chain the observed variables are states ,now we will consider the case that the observed symbol is not directly the states but some random variable related to the underlying state sequence.
The true state of the system is latent and unobserved.
Hidden Latent Variables
Let $\mathcal{Y}$ denote a sequence of random variables taking the values in a euclidean space $\mathcal{0}$
A simplistic model is where each observation state is associated with a single hidden state,Each hidden state emits a unique observation symbol. \\ In this case if we know exactly if a observed variable corresponds to a hidden state then the problem reduces to Markov chain. \\
However it may happen that observed state corresponds to more than one hidden state with certain probabilities. For example \begin{eqnarray*} & P(X=x_0|Y=y_1) = 0.9 \\ & P(X=x_0|Y=y_2) = 0.1 \\ \end{eqnarray*}
Thus if the hidden state $x_0$ ,it is likely to emit a observed state $Y=y_1$ with probability 0.9 and observed state $y_1$ with probability 0.1
Thus we can consider than 90\% of time if we observe the state $y_1$ and 10\% of times we will observe the $y_2$
The random variables ${Y}$ are not independent nor do they represent samples from a Markov process.
However given a realization ${X_i=x_i}$ the random variable $Y_i$ is independent of other random variables ${X}$ and therefore other ${Y}$
Given that we know the hidden/latent state,the observation probabilities are independent. \begin{eqnarray*} P(Y_1 Y_2 | X=x_i) = P(Y_1 | X=x_i) P(Y_2 | X=x_i) \end{eqnarray*}
The sequence of observation/emission and corresponding probabilities that emission corresponds to hidden state is specified by a emission matrix.
If N is the number of state and M is number of observations the emission matrix is $NxM$ matrix.
The model is called discrete hidden Markov model since the emission probabilities are discrete in nature,another class of hidden Markov models exist which are called continuous hidden Markov model wherein the emission probabilities are continuous and modeled using parametric distribution.
Forward algorithm
The probability of observed sequence given we are in a hidden state $z_n$ can be computed using forward algorithm
In forward algorithm we compute the probability of observed sequence from first element of sequence to last element of sequence as opposed to backward algorithm where we compute the same starting from last sequence moving backward for 1st element of sequence.
The forward and backward algorithm can be used to compute the probability of observed sequence
The idea is to estimate the underlying hidden states which are Markov chains.
Since we have N hidden states we can compute NXL matrix containing the probabilities of observing the hidden state ,where L is length of sequence.
The result is stored in the matrix $_\alpha$
Each column of matrix $_\alpha$ ie $_\alpha(j,i)$ provides the probabilities of observing the sequence at each increment of the sequence.
The Forward algorithm computes a matrix of state probabilities which can be used to assess the probability of being in each of the states at any given time in the sequences. as $\sum_j \alpha(j,len-1)$ \begin{eqnarray*} Let \\ & \alpha(z_n) =p(x_1,\ldots,x_n,z_n) \\ & \alpha(z_1) =p(x_1,z_n)=p(x_1|z_1)p(z_1) \\ & \alpha(z_{n-1}) =p(x_1,\ldots,x_{n-1},z_{n-1}) \\ & \alpha(z_n) =\sum_{z_{n-1}}p(x_1,\ldots,x_{n-1},x_n,z_{n-1},z_n) \\ & \alpha(z_n) =\sum_{z_{n-1}}p(x_1,\ldots,x_{n-1},z_{n-1})P(x_n,z_n|x_1,\ldots,x_{n-1},z_{n-1}) \\ & \alpha(z_n) =\sum_{z_{n-1}}p(x_1,\ldots,x_{n-1},z_{n-1})P(x_n|z_n,x_1,\ldots,x_{n-1},z_{n-1})\\ & P(z_n|z_n,x_1,\ldots,x_{n-1},z_{n-1}) \\ & \text{Since $x_n$ is independent of all other states given $z_n$}\\ & \text{Since $z_n$ is independent of all other $z_i$ given $z_{n-1}$} \\ &=\sum_{z_{n-1}}p(x_1,\ldots,x_{n-1},z_{n-1})P(x_n|z_n)P(z_n|z_{n-1}) \\ & \alpha(z_n) =\sum_{z_{n-1}}\alpha(z_{n-1}) P(x_n|z_n) P(z_n | z_{n-1})\\ \end{eqnarray*}
Observation Sequence Likelyhood
This if we are given a random sequence of observations $x_1,\ldots,x_n$
Starting from $\alpha(z_1)$ computer $\alpha(z_n)$
After which $P(X={x_1,\ldots,x_n})=\sum_{z_n} \alpha(z_n)$ can be computed.
This gives us a estimate of probability of observing the sequence $x_1,\ldots,x_n$ using the forward algorithm.
The probability estimate is often computed as log probability \begin{eqnarray*} & P(x_1,\ldots,x_n) = \sum_{z_n} \alpha(z_n) \\ & prob= \prod_{i=1}^n \sum_{z_n} \alpha(z_n) \\ & logprob = \sum_n log(\sum{z_n} \alpha(z_n)) \\ \end{eqnarray*}
Sequence Classification
Again for sequence classification assume we have two hidden markov models $\theta_1 = (\pi_!,trans_1,emission_2)$ and $\theta_1 = (\pi_2,trans_2,emission_2)$
Given a observation sequence ${X}={x_1,\ldots,x_n}$ we compute the probability of observing the sequence or probability that models have generate the observation sequence.
The observation sequence is estimated to be produced by the model which exhibits the highest probability to have generated the sequence. \begin{eqnarray*} y = argmax_{n=1}^N prob({X} | \theta_n) \end{eqnarray*}
Code
code for discrete hidden Markov models can be found at git repository https://github.com/pi19404/OpenVision in files \ImgML/hmm.hpp and ImgML/hmm.cpp