harshkn

harshkn.com

Hidden Markov Model

Apr 28 ^th , 2024

The Hidden Markov Model (HMM) is a statistical model that assumes the system being modeled is a Markov process with unobservable (hidden) states. In this context, HMM is used to analyze sequential or time-series data where the goal is to learn about the hidden parts of the model from observable data. The model consists of hidden states that are not directly visible, and visible states that are observable indirectly through the hidden states.

Demo

When the T9 keys are pressed each time, the sequence is passed to viterbi algorithm. Viterbi algorithm calculates the probablities using the matricies and computes the predicted word. Below demo which uses all the Shakespeare's work to train HMM does an OK job at it as there are many words not predicted correctly.

Press the keys below to enter text

Details

In this setup we use a text file containing all works of Shakespeare to train a HMM model and use the model to predict typed text in a T9 keypad. The input T9 keys pressed by the user are the observable states and the word the user intends to type is hidden state. In this post we try to infer the hidden states by using observed stated.

Shakespeare's all works converted to text file location: link

The HMM used is as follows

p (v, h) = p (v_{1}, \dots, v_{T}, h_{1}, \dots, h_{T}) = p (h_{1}) p (v_{1} | h_{1}) \prod t_{2}^{T} [p (v_{t} | h_{t}) p (h_{t} | h_{t_{1}})]

Sample of Original Text

Sample of Cleaned Text

Text after converting to lowercase, removing special characters, removing multiple spaces etc.

Transition Matrix (P)

The Transition Matrix P is an H×H matrix where each element P ij represents the probability of transitioning from hidden state j to hidden state i. This matrix is crucial for understanding how the hidden states evolve over time, regardless of the observations. The rows of matrix P sum to 1, indicating that the probabilities of moving from any given state to all possible next states together equal 100%.

Initial State Distribution (R)

The Initial State Distribution R is an H×1 vector where each element R_j represents the probability of starting in hidden state j. This vector is essential for initializing the state of the system at the beginning of the sequence before any transitions or observations occur.

Observation Matrix (Q)

The Observation Matrix Q is a V×H matrix. Each element Q ij indicates the probability of observing the visible state i from hidden state j. This matrix links the hidden states to the observable outputs, allowing the model to generate or predict observable data given the hidden state.

Viterbi Algorithm

The Viterbi algorithm is a dynamic programming algorithm for finding the most likely sequence of hidden states—called the Viterbi path—that results in a sequence of observed events. This algorithm is particularly useful in the context of Hidden Markov Models where the state space H is 26, representing the 26 possible key presses in a T9 keyboard, and the observable space V is 8, representing the 8 keys on a T9 keypad.

Working of the Viterbi Algorithm

The Viterbi algorithm proceeds by constructing a path probability matrix that stores the probability of the most likely path so far for each state. Here's how it works step-by-step:

Initialization: Set up the initial probabilities in the matrix using the Initial State Distribution (R).
Recursion: For each subsequent time step and for each state, calculate the path probabilities by considering all possible previous state transitions and multiplying the transition probabilities by the probabilities stored in the previous column of the matrix. Update the matrix with the maximum probability.
Termination: After all observations have been processed, the path ending in each state with the highest probability is identified and the overall maximum is selected as the final sequence of states.