Markov Chains

In probability theory and statistics, a Markov chain or Markov process is a stochastic process describing a sequence of possible events in which the probability of each event depends only on the state attained in the previous event. Informally, this may be thought of as, “What happens next depends only on the state of affairs now” (from Wikipedia).
ChatGPT says: A Markov chain is a mathematical system that models a sequence of events where the probability of each event depends only on the state of the previous event. It’s a “memoryless” system, meaning the past history of events doesn’t influence future probabilities, only the current state does. This makes it useful for modeling systems that evolve over time in a probabilistic way.

Example: The Jumping Grashopper

We will develop some theoretical insight using the following example:

A grasshopper is jumping through the network shown in the plot below.
The grasshopper traverses exactly one edge in one time step.
The transition probabilities between the nodes are indicated by numbers next to the edges.
Edges can only be traversed in the direction given by the arrows.
The grasshopper starts in node A at \(t=0\).

The jumping grasshopper

Because the transition probabilities only depend on the current state and not on the previous states, the sequence of nodes traversed by the grashopper is a typical Markov chain. Therefore, the transition probabilities \(a_{ij}\) can be written in the following way:

\[ a_{ij} = P \left( X_{t+1} = j \mid X_{t} = i\right) \] where \(X_{t}\) denotes the state (or node) at time point \(t\), \(X_{t+1}\) denotes the state at time point \(t+1\), and \(P \left( X_{t+1} = j \mid X_{t} = i\right)\) is the conditional probability that the grashopper is in state \(j\) at time \(t+1\) given that it was in state \(i\) at time point \(t\).

Transition matrix

A so called transition matrix corresponding to the graph can be created:

rowA = c(0.2,0.4,0,0.4,0,0)
rowB = c(0.3, 0,0.5,0,0.2,0)
rowC = c(0,0.4,0,0,0,0.6)
rowD = c(0,0,0,0.4,0.6,0)
rowE = c(0,0,0.3,0.4,0,0.3)
rowF = c(0,0,0.7,0,0,0.3)

T = matrix(c(rowA,rowB,rowC, rowD,rowE,rowF), byrow = TRUE, nrow = 6)
rownames(T) = c("A","B","C","D","E","F")
colnames(T) = rownames(T)
T

##     A   B   C   D   E   F
## A 0.2 0.4 0.0 0.4 0.0 0.0
## B 0.3 0.0 0.5 0.0 0.2 0.0
## C 0.0 0.4 0.0 0.0 0.0 0.6
## D 0.0 0.0 0.0 0.4 0.6 0.0
## E 0.0 0.0 0.3 0.4 0.0 0.3
## F 0.0 0.0 0.7 0.0 0.0 0.3

The matrix elements stand for the transition probabilities between the nodes, e.g. \(a_{EC} = P(E \rightarrow C)=0.3\). This means that the transition probability from state \(E\) to state \(C\) is \(0.3\) or 30 percent.
Starting at any node, the grashopper must jump to any other position, so the sum of probabilities to go somewhere must be one:

rowSums(T)

## A B C D E F 
## 1 1 1 1 1 1

Initial state

The initial state vector displays the probabilities for every node to be occupied by the grasshopper at time \(t=0\). Here, we assume that the grasshopper is in node A with 100% probability:

b = matrix(c(1,0,0,0,0,0), nrow = 1)
colnames(b) = colnames(T)
rownames(b) = "prob"
b

##      A B C D E F
## prob 1 0 0 0 0 0

The first time step

New probabilities after a time step are calculated by multiplying the transition matrix with the actual state vector. The transition matrix must be multiplied from the right side (remember: order matters when matrices are multiplied!).
The start vector \(\boldsymbol{b}\) is a row vector here.

c1 = b %*% T    
c1

##        A   B C   D E F
## prob 0.2 0.4 0 0.4 0 0

This result was expected and can directly be inferred from the graph. The grasshopper starts in node A and goes to A, B, and D with probabilities \(0.2\), \(0.4\), and \(0.4\), respectively.

The second time step

Now, the actual state vector is \(\boldsymbol{c1}\). In order to carry out the next jump, it is again multiplied with the transition matrix:

c2 = c1 %*% T  
c2

##         A    B   C    D    E F
## prob 0.16 0.08 0.2 0.24 0.32 0

This is a little harder to see by looking at the graph. Another proof that calculating is just easier :-).

Many time steps

In general, the probability distribution for the N-th time step can be calculated by multiplying the transition matrix N times, i.e. \(\boldsymbol{c_N} = \boldsymbol{b} \cdot \boldsymbol{T}^N\).

In a loop, we can have a look at every single time step by recursively calculating the probability distribution at time \(i+1\) from time \(i\):

N = 20                                          # number of time steps 
time_course = matrix(0, nrow = N+1, ncol = 6)   # matrix to store the results
colnames(time_course) = c("A","B","C","D","E","F")
rownames(time_course) = paste0("t", 0:N)
time_course[1,] = b                             # at t=0
for (i in 1:N)  {
  time_course[i+1,] = time_course[i,] %*% T     
}
time_course

##              A         B         C          D          E         F
## t0  1.00000000 0.0000000 0.0000000 0.00000000 0.00000000 0.0000000
## t1  0.20000000 0.4000000 0.0000000 0.40000000 0.00000000 0.0000000
## t2  0.16000000 0.0800000 0.2000000 0.24000000 0.32000000 0.0000000
## t3  0.05600000 0.1440000 0.1360000 0.28800000 0.16000000 0.2160000
## t4  0.05440000 0.0768000 0.2712000 0.20160000 0.20160000 0.1944000
## t5  0.03392000 0.1302400 0.2349600 0.18304000 0.13632000 0.2815200
## t6  0.04585600 0.1075520 0.3030800 0.14131200 0.13587200 0.2663280
## t7  0.04143680 0.1395744 0.2809672 0.12921600 0.10629760 0.3025080
## t8  0.05015968 0.1289616 0.3134321 0.11078016 0.10544448 0.2912220
## t9  0.04872042 0.1454367 0.2999695 0.10655373 0.09226042 0.3070592
## t10 0.05337509 0.1394760 0.3153379 0.09901382 0.09301958 0.2997776
## t11 0.05251781 0.1474852 0.3074882 0.09816340 0.08730349 0.3070419
## t12 0.05474912 0.1440024 0.3148630 0.09519388 0.08839508 0.3027965
## t13 0.05415055 0.1478448 0.3104773 0.09533523 0.08591681 0.3062753
## t14 0.05518356 0.1458511 0.3140902 0.09416104 0.08677011 0.3039440
## t15 0.05479205 0.1477095 0.3117174 0.09444588 0.08566685 0.3056683
## t16 0.05527126 0.1466038 0.3135226 0.09396191 0.08620943 0.3044310
## t17 0.05503539 0.1475176 0.3122664 0.09417704 0.08569790 0.3053057
## t18 0.05526234 0.1469207 0.3131821 0.09396413 0.08600973 0.3046609
## t19 0.05512868 0.1473778 0.3125259 0.09409448 0.08576262 0.3051105
## t20 0.05523907 0.1470618 0.3129950 0.09399432 0.08593225 0.3047775

We check here if the sum of the probabilities at each time point is one:

rowSums(time_course)

##  t0  t1  t2  t3  t4  t5  t6  t7  t8  t9 t10 t11 t12 t13 t14 t15 t16 t17 t18 t19 
##   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1 
## t20 
##   1

We can also calculate the state after 20 leaps at once by using the %^% operator, which carries out exponentiation in R, see the expm-package:

suppressMessages(library(expm))                                   
N = 20 
end_state = b %*% (T %^% N)
end_state

##               A         B        C          D          E         F
## prob 0.05523907 0.1470618 0.312995 0.09399432 0.08593225 0.3047775

Let us compare the final results of both approaches:

round(end_state,10) == round(time_course[N+1,], 10)

##         A    B    C    D    E    F
## prob TRUE TRUE TRUE TRUE TRUE TRUE

(We round a little bit because we cannot require exactly the same result because of numerical inaccuracy.)

Plotting the results

We skip the initial state for better visibility (we know it’s \(P(A)=1\) and zero otherwise). In order to move the legend to the outside of the plot, we are going to manipulate the plot margins a little bit:

default_margins = par("mar")                             # remember the default plot margins
par(mar = c(5.1, 4.1, 4.1, 7), xpd = TRUE)               # increase right margin for legend
ylim = c(min(time_course[-1,]), max(time_course[-1,]))   # skip initial state for better visibility 
colors = c("red", "blue", "green", "darkgoldenrod", "blueviolet", "black")
matplot(1:(nrow(time_course)-1), time_course[-1,], type = "b", lwd = 1.5, lty = 1, 
        pch = c(15,16,17,18,19,20), xlab = "time", ylab = "probability", ylim = ylim, col = colors)  
title("The Jumping Grasshopper", font.main = 1)
mtext("Initial time point not shown", side = 3, col = "blue", cex = 0.9)
legtxt = c("A","B","C","D","E","F")
hor_inset = -0.20                                        # horizontal position of the legend 
legend("topright", inset = c(hor_inset, 0), legend = legtxt, title = "Node", col = colors, lwd = 4)

par(mar = default_margins)                               # reset margins to default

The main message is: It seems the probabilities converge to stationary values. It looks as if a stationary state is reached already after a few time steps. Stationary means that the probabilities (!) to find the grashopper at a certain place remain constant over time. (That does NOT mean that the grashopper resides in the same position. It jumps all the time. It’s only the probabilities staying constant!)
For example, after about 10 time units, one can find the grasshopper in node C with probability \(P \approx 0.312\).

Stationary states

As we have seen above, we have to multiply the current state vector with the transition matrix \(\boldsymbol{T}\) in order to proceed one step in time. Consequently, if a stationary state \(\boldsymbol{\pi}\) exists, it must fullfille the following condition:

\[ \boldsymbol{\pi} \cdot \boldsymbol{T} = \boldsymbol{\pi} \]

This means that another time step - mathematically realised by multiplication with \(\boldsymbol{T}\) - does not change the probabilities represented by the state vector \(\boldsymbol{\pi}\).

The above equation looks pretty much like an eigenvalue equation. If we take the transpose of both sides:

\[ \left( \boldsymbol{\pi} \cdot \boldsymbol{T} \right)^t = \boldsymbol{\pi}^t \]

we get, following the rules of matrix algebra

\[ \boldsymbol{T}^t \cdot \boldsymbol{\pi}^t = \boldsymbol{\pi}^t \]

The general form of an eigenvalue equation is:

\[ \boldsymbol{A} \cdot \boldsymbol{x} = \lambda \cdot \boldsymbol{x} \]

where the \(\boldsymbol{x}\) are the eigenvectors and \(\lambda\) die eigenvalues.

Comparing both equations, we come to the conclusion that \(\boldsymbol{T}^t\), the transpose of the transition matrix, must have an eigenvalue 1 if a stationary state exists. Then, the eigenvector \(\boldsymbol{\pi}^t\) represents the stationary state.

Let us check this with the eigen-function in R:

eigen(t(T))$values

## [1]  1.0000000+0.00000000i -0.7184390+0.00000000i  0.5577800+0.09692218i
## [4]  0.5577800-0.09692218i -0.3070046+0.00000000i -0.1901164+0.00000000i

eigen(t(T))$vectors

##              [,1]          [,2]                  [,3]                  [,4]
## [1,] 0.1146648+0i -0.1507272+0i  0.2616137-0.1813000i  0.2616137+0.1813000i
## [2,] 0.3057729+0i  0.4614458+0i  0.3705739-0.1316979i  0.3705739+0.1316979i
## [3,] 0.6497674+0i -0.6780745+0i  0.2870441+0.0874460i  0.2870441-0.0874460i
## [4,] 0.1953549+0i  0.1423721+0i -0.5889176+0.0000000i -0.5889176+0.0000000i
## [5,] 0.1783675+0i -0.2473591+0i -0.4939123+0.0386021i -0.4939123-0.0386021i
## [6,] 0.6333867+0i  0.4723429+0i  0.1635983+0.1869498i  0.1635983-0.1869498i
##                [,5]           [,6]
## [1,] -0.03873402+0i  0.22455304+0i
## [2,]  0.06546109+0i -0.29200606+0i
## [3,] -0.01150812+0i -0.08576521+0i
## [4,] -0.43551638+0i  0.31638539+0i
## [5,]  0.80851424+0i -0.69131354+0i
## [6,] -0.38821681+0i  0.52814637+0i

Note that the eigenvectors are normalized to unit length by the eigen-function because they are only defined up to a constant, anyway. We can compare our results obtained above with the eigenvectors by normalising our results, too:

end_state/sqrt(sum(end_state^2))    # our result, normalised

##             A         B         C        D         E         F
## prob 0.114745 0.3054832 0.6501667 0.195249 0.1785022 0.6330969

Re(eigen(t(T))$vectors[,1])         # eigenvector corresponding to eigenvalue 1

## [1] 0.1146648 0.3057729 0.6497674 0.1953549 0.1783675 0.6333867

We show only the real part of the eigenvector because the imaginary part is zero.

Necessity of eigenvalue 1

Can we find a transition matrix without an eigenvalue 1 ?

No, we cannot find a transition matrix (for a Markov chain) without an eigenvalue of 1. A transition matrix, which describes the probabilities of moving between states in a system, must have 1 as an eigenvalue.

A key property of stochastic matrices is that the sum of elements in each row (representing probabilities) must be equal to 1. This property directly leads to the existence of an eigenvalue of 1. If you multiply a stochastic matrix by a vector where all elements are 1, the result will also be a vector with all elements equal to 1:

v = matrix(rep(1,6), ncol = 1)
v

##      [,1]
## [1,]    1
## [2,]    1
## [3,]    1
## [4,]    1
## [5,]    1
## [6,]    1

T %*% v

##   [,1]
## A    1
## B    1
## C    1
## D    1
## E    1
## F    1

This means there’s an eigenvector corresponding to the eigenvalue 1.

\[ \boldsymbol{T} \cdot \boldsymbol{v} = 1 \cdot \boldsymbol{v} \]

The eigenvalue of 1 corresponds to the system’s long-term behavior or steady state. It represents the probability distribution that doesn’t change when the transition matrix is applied repeatedly. Therefore, while other eigenvalues might exist (and their values and multiplicities can vary depending on the specific matrix and chain), the presence of an eigenvalue of 1 is fundamental to the definition and behavior of a transition matrix for a Markov chain.
See Wikipedia.

uwe.menzel@matstat.org

Markov Chains

Uwe Menzel

Markov Chains

Example: The Jumping Grashopper

Transition matrix

Initial state

The first time step

The second time step

Many time steps

Plotting the results

Stationary states

Necessity of eigenvalue 1