Baum-Welch algorithm

1. What is the Baum-Welch algorithm?

the Baum-Welch (BW) algorithm is a special case of the Expectation-Maximization (EM) algorithm
BW is also called forward-backward algorithm
the BW algorithm attempts to find the unknown parameters in Hidden Markov Models (HMM’s)
finds maximum likelihood (ML) estimates of the HMM parameters, given observations of the emitted symbols
finds local maxima of the likelihood (initial values of the parameters are critical)
- parameters: start-, transition- and emission probabilities of the HMM
- hidden data: states of the HMM
Baum-Welch cannot determine the structure of the HMM

2. Example: Dishonest casino

see above: “The Hidden Markov Model: transition and emission probabilities”
the states compass: F (fair die) and L (loaded die)
symbols which can be emitted: 1, 2, 3, 4, 5, 6 (die rolls)
using package HMM

library(HMM)

## Warning: package 'HMM' was built under R version 3.5.2

states = c("F", "L")  
symbols = 1:6

3. Getting some observation

create random sample

part1 = sample(symbols, 300, replace = TRUE)                            # fair
part2 = sample(symbols, 300, replace = TRUE, prob = c(rep(0.1,5), 0.5)) # loaded
part3 = sample(symbols, 300, replace = TRUE)                            # fair
observation <- c(part1, part2, part1)
observation

##   [1] 1 3 2 1 2 1 1 3 6 2 5 2 3 3 4 5 2 4 5 3 3 1 2 5 1 1 3 3 1 4 5 6 4 2 6
##  [36] 5 4 3 3 5 1 2 1 6 1 5 3 5 6 1 3 3 5 1 6 1 5 3 4 5 6 5 5 3 3 3 4 1 1 6
##  [71] 5 6 3 5 6 6 4 1 2 1 6 5 2 5 1 3 5 3 2 3 5 1 4 5 3 4 3 3 3 4 5 5 3 1 1
## [106] 5 2 1 4 5 6 2 6 6 4 4 5 2 1 4 2 3 6 2 2 3 1 2 1 4 1 5 1 5 5 2 3 3 6 6
## [141] 6 4 1 4 4 1 6 4 1 6 2 3 6 2 4 2 6 3 3 1 1 3 1 2 1 6 1 6 3 5 2 5 3 1 1
## [176] 5 5 2 4 2 1 3 1 3 6 1 5 6 3 2 3 6 3 3 2 5 5 2 3 4 4 3 4 4 6 2 2 2 4 5
## [211] 5 4 1 3 1 4 3 3 6 1 2 3 1 6 5 4 1 6 2 6 6 2 6 5 4 3 5 4 6 3 1 6 2 5 1
## [246] 5 3 1 5 6 1 1 3 2 4 3 4 6 1 2 3 3 6 4 5 5 6 3 5 6 3 2 1 3 3 3 2 1 5 2
## [281] 5 5 6 3 6 5 6 2 5 6 1 5 4 2 6 4 3 4 1 2 6 2 2 2 1 6 6 6 5 4 6 6 5 6 6
## [316] 6 6 3 6 6 1 6 6 3 2 6 3 4 6 5 5 6 4 4 6 5 6 2 6 6 1 2 5 6 4 6 6 6 6 6
## [351] 3 6 2 4 6 5 6 4 3 6 4 6 2 2 3 4 6 6 2 6 3 5 5 6 6 6 5 5 6 3 6 6 6 4 4
## [386] 6 6 5 3 6 6 6 6 4 6 6 5 6 3 4 6 6 5 3 6 6 4 6 6 6 5 6 5 4 4 6 6 6 6 6
## [421] 5 6 2 3 5 4 6 4 6 6 1 6 6 6 3 6 6 6 6 6 6 6 6 5 6 3 4 4 4 5 6 6 6 6 6
## [456] 2 3 1 6 6 6 1 6 2 6 6 5 6 6 3 6 6 6 6 5 2 6 1 3 6 6 6 4 6 4 3 6 6 3 5
## [491] 5 6 1 6 3 6 5 3 6 6 6 2 6 6 4 2 6 6 6 5 6 6 6 5 1 2 6 6 6 6 6 2 6 6 6
## [526] 6 6 6 6 6 6 6 1 3 6 4 6 2 6 2 1 2 6 4 6 2 6 1 6 6 3 6 3 4 5 4 6 3 5 6
## [561] 5 6 5 6 3 6 2 6 6 5 2 3 6 6 6 6 3 3 5 6 6 3 6 5 4 2 4 4 6 6 6 6 6 6 5
## [596] 6 2 4 2 6 1 3 2 1 2 1 1 3 6 2 5 2 3 3 4 5 2 4 5 3 3 1 2 5 1 1 3 3 1 4
## [631] 5 6 4 2 6 5 4 3 3 5 1 2 1 6 1 5 3 5 6 1 3 3 5 1 6 1 5 3 4 5 6 5 5 3 3
## [666] 3 4 1 1 6 5 6 3 5 6 6 4 1 2 1 6 5 2 5 1 3 5 3 2 3 5 1 4 5 3 4 3 3 3 4
## [701] 5 5 3 1 1 5 2 1 4 5 6 2 6 6 4 4 5 2 1 4 2 3 6 2 2 3 1 2 1 4 1 5 1 5 5
## [736] 2 3 3 6 6 6 4 1 4 4 1 6 4 1 6 2 3 6 2 4 2 6 3 3 1 1 3 1 2 1 6 1 6 3 5
## [771] 2 5 3 1 1 5 5 2 4 2 1 3 1 3 6 1 5 6 3 2 3 6 3 3 2 5 5 2 3 4 4 3 4 4 6
## [806] 2 2 2 4 5 5 4 1 3 1 4 3 3 6 1 2 3 1 6 5 4 1 6 2 6 6 2 6 5 4 3 5 4 6 3
## [841] 1 6 2 5 1 5 3 1 5 6 1 1 3 2 4 3 4 6 1 2 3 3 6 4 5 5 6 3 5 6 3 2 1 3 3
## [876] 3 2 1 5 2 5 5 6 3 6 5 6 2 5 6 1 5 4 2 6 4 3 4 1 2

4. Initialization of the transition-, emission- , and start probabilities

as in the picture above

trans_prob =  matrix(c(0.95, 0.05, 0.1, 0.9), byrow = TRUE, nrow = 2)            # transition probabilities
rownames(trans_prob) = states
colnames(trans_prob) = states
rowSums(trans_prob)   # must be all one (stochastic matrix)

## F L 
## 1 1

trans_prob

##      F    L
## F 0.95 0.05
## L 0.10 0.90

emission_prob = matrix(c(rep(1/6, 6), rep(0.1, 5), 0.5), byrow = TRUE, nrow = 2)  # emission probabilities
rownames(emission_prob) = states
colnames(emission_prob) = symbols
rowSums(emission_prob)                   # must be all one (stochastic matrix)

## F L 
## 1 1

emission_prob

##           1         2         3         4         5         6
## F 0.1666667 0.1666667 0.1666667 0.1666667 0.1666667 0.1666667
## L 0.1000000 0.1000000 0.1000000 0.1000000 0.1000000 0.5000000

start_prob = c(0.5, 0.5)                 # if we have no clue how it starts 
names(start_prob) = states
sum(start_prob)

## [1] 1

start_prob

##   F   L 
## 0.5 0.5

5. Running Baum-Welch

hmm = initHMM(states, symbols, startProbs = start_prob, transProbs = trans_prob, emissionProbs = emission_prob)
estimated_hmm <- baumWelch(hmm, observation, maxIterations = 100, delta = 1E-9, pseudoCount = 0) 
round(estimated_hmm$hmm$transProbs, 4)

##     to
## from      F      L
##    F 0.9983 0.0017
##    L 0.0036 0.9964

round(estimated_hmm$hmm$emissionProbs, 4)

##       symbols
## states      1      2      3      4      5      6
##      F 0.1856 0.1507 0.2044 0.1261 0.1749 0.1584
##      L 0.0392 0.0806 0.1025 0.1075 0.1190 0.5512

# estimated_hmm$difference  #  sum of the distances between consecutive transition and emission matrices (L2-norm)
plot(estimated_hmm$difference, ylab = "Distance L2", xlab = "iteration-nr.", col = "red")

uwe.menzel@matstat.org

Baum-Welch algorithm

Uwe Menzel

01/2019

1. What is the Baum-Welch algorithm?

2. Example: Dishonest casino

3. Getting some observation

4. Initialization of the transition-, emission- , and start probabilities

5. Running Baum-Welch