In probability theory and statistics, a Markov chain
or Markov process is a stochastic process describing a
sequence of possible events in which the probability of each event
depends only on the state attained in the previous event. Informally,
this may be thought of as, “What happens next depends only on the state
of affairs now” (from Wikipedia).
ChatGPT says: A Markov chain is a
mathematical system that models a sequence of events where the
probability of each event depends only on the state of the previous
event. It’s a “memoryless” system, meaning the past history of events
doesn’t influence future probabilities, only the current state does.
This makes it useful for modeling systems that evolve over time in a
probabilistic way.
We will develop some theoretical insight using the following example:
The jumping grasshopper
Because the transition probabilities only depend on the current state and not on the previous states, the sequence of nodes traversed by the grashopper is a typical Markov chain. Therefore, the transition probabilities \(a_{ij}\) can be written in the following way:
\[ a_{ij} = P \left( X_{t+1} = j \mid X_{t} = i\right) \] where \(X_{t}\) denotes the state (or node) at time point \(t\), \(X_{t+1}\) denotes the state at time point \(t+1\), and \(P \left( X_{t+1} = j \mid X_{t} = i\right)\) is the conditional probability that the grashopper is in state \(j\) at time \(t+1\) given that it was in state \(i\) at time point \(t\).
A so called transition matrix corresponding to the graph can be created:
rowA = c(0.2,0.4,0,0.4,0,0)
rowB = c(0.3, 0,0.5,0,0.2,0)
rowC = c(0,0.4,0,0,0,0.6)
rowD = c(0,0,0,0.4,0.6,0)
rowE = c(0,0,0.3,0.4,0,0.3)
rowF = c(0,0,0.7,0,0,0.3)
T = matrix(c(rowA,rowB,rowC, rowD,rowE,rowF), byrow = TRUE, nrow = 6)
rownames(T) = c("A","B","C","D","E","F")
colnames(T) = rownames(T)
T
## A B C D E F
## A 0.2 0.4 0.0 0.4 0.0 0.0
## B 0.3 0.0 0.5 0.0 0.2 0.0
## C 0.0 0.4 0.0 0.0 0.0 0.6
## D 0.0 0.0 0.0 0.4 0.6 0.0
## E 0.0 0.0 0.3 0.4 0.0 0.3
## F 0.0 0.0 0.7 0.0 0.0 0.3
The matrix elements stand for the transition probabilities between
the nodes, e.g. \(a_{EC} = P(E \rightarrow
C)=0.3\). This means that the transition probability from state
\(E\) to state \(C\) is \(0.3\) or 30 percent.
Starting at any node, the grashopper must jump to any other position, so
the sum of probabilities to go somewhere must be one:
rowSums(T)
## A B C D E F
## 1 1 1 1 1 1
The initial state vector displays the probabilities for every node to be occupied by the grasshopper at time \(t=0\). Here, we assume that the grasshopper is in node A with 100% probability:
b = matrix(c(1,0,0,0,0,0), nrow = 1)
colnames(b) = colnames(T)
rownames(b) = "prob"
b
## A B C D E F
## prob 1 0 0 0 0 0
New probabilities after a time step are calculated by multiplying the
transition matrix with the actual state vector. The transition matrix
must be multiplied from the right side
(remember: order matters when matrices are multiplied!).
The start vector \(\boldsymbol{b}\) is
a row vector here.
c1 = b %*% T
c1
## A B C D E F
## prob 0.2 0.4 0 0.4 0 0
This result was expected and can directly be inferred from the graph. The grasshopper starts in node A and goes to A, B, and D with probabilities \(0.2\), \(0.4\), and \(0.4\), respectively.
Now, the actual state vector is \(\boldsymbol{c1}\). In order to carry out the next jump, it is again multiplied with the transition matrix:
c2 = c1 %*% T
c2
## A B C D E F
## prob 0.16 0.08 0.2 0.24 0.32 0
This is a little harder to see by looking at the graph. Another proof that calculating is just easier :-).
In general, the probability distribution for the N-th time step can be calculated by multiplying the transition matrix N times, i.e. \(\boldsymbol{c_N} = \boldsymbol{b} \cdot \boldsymbol{T}^N\).
In a loop, we can have a look at every single time step by recursively calculating the probability distribution at time \(i+1\) from time \(i\):
N = 20 # number of time steps
time_course = matrix(0, nrow = N+1, ncol = 6) # matrix to store the results
colnames(time_course) = c("A","B","C","D","E","F")
rownames(time_course) = paste0("t", 0:N)
time_course[1,] = b # at t=0
for (i in 1:N) {
time_course[i+1,] = time_course[i,] %*% T
}
time_course
## A B C D E F
## t0 1.00000000 0.0000000 0.0000000 0.00000000 0.00000000 0.0000000
## t1 0.20000000 0.4000000 0.0000000 0.40000000 0.00000000 0.0000000
## t2 0.16000000 0.0800000 0.2000000 0.24000000 0.32000000 0.0000000
## t3 0.05600000 0.1440000 0.1360000 0.28800000 0.16000000 0.2160000
## t4 0.05440000 0.0768000 0.2712000 0.20160000 0.20160000 0.1944000
## t5 0.03392000 0.1302400 0.2349600 0.18304000 0.13632000 0.2815200
## t6 0.04585600 0.1075520 0.3030800 0.14131200 0.13587200 0.2663280
## t7 0.04143680 0.1395744 0.2809672 0.12921600 0.10629760 0.3025080
## t8 0.05015968 0.1289616 0.3134321 0.11078016 0.10544448 0.2912220
## t9 0.04872042 0.1454367 0.2999695 0.10655373 0.09226042 0.3070592
## t10 0.05337509 0.1394760 0.3153379 0.09901382 0.09301958 0.2997776
## t11 0.05251781 0.1474852 0.3074882 0.09816340 0.08730349 0.3070419
## t12 0.05474912 0.1440024 0.3148630 0.09519388 0.08839508 0.3027965
## t13 0.05415055 0.1478448 0.3104773 0.09533523 0.08591681 0.3062753
## t14 0.05518356 0.1458511 0.3140902 0.09416104 0.08677011 0.3039440
## t15 0.05479205 0.1477095 0.3117174 0.09444588 0.08566685 0.3056683
## t16 0.05527126 0.1466038 0.3135226 0.09396191 0.08620943 0.3044310
## t17 0.05503539 0.1475176 0.3122664 0.09417704 0.08569790 0.3053057
## t18 0.05526234 0.1469207 0.3131821 0.09396413 0.08600973 0.3046609
## t19 0.05512868 0.1473778 0.3125259 0.09409448 0.08576262 0.3051105
## t20 0.05523907 0.1470618 0.3129950 0.09399432 0.08593225 0.3047775
We check here if the sum of the probabilities at each time point is one:
rowSums(time_course)
## t0 t1 t2 t3 t4 t5 t6 t7 t8 t9 t10 t11 t12 t13 t14 t15 t16 t17 t18 t19
## 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## t20
## 1
We can also calculate the state after 20 leaps at once by using the %^% operator, which carries out exponentiation in R, see the expm-package:
suppressMessages(library(expm))
N = 20
end_state = b %*% (T %^% N)
end_state
## A B C D E F
## prob 0.05523907 0.1470618 0.312995 0.09399432 0.08593225 0.3047775
Let us compare the final results of both approaches:
round(end_state,10) == round(time_course[N+1,], 10)
## A B C D E F
## prob TRUE TRUE TRUE TRUE TRUE TRUE
(We round a little bit because we cannot require exactly the same result because of numerical inaccuracy.)
We skip the initial state for better visibility (we know it’s \(P(A)=1\) and zero otherwise). In order to move the legend to the outside of the plot, we are going to manipulate the plot margins a little bit:
default_margins = par("mar") # remember the default plot margins
par(mar = c(5.1, 4.1, 4.1, 7), xpd = TRUE) # increase right margin for legend
ylim = c(min(time_course[-1,]), max(time_course[-1,])) # skip initial state for better visibility
colors = c("red", "blue", "green", "darkgoldenrod", "blueviolet", "black")
matplot(1:(nrow(time_course)-1), time_course[-1,], type = "b", lwd = 1.5, lty = 1,
pch = c(15,16,17,18,19,20), xlab = "time", ylab = "probability", ylim = ylim, col = colors)
title("The Jumping Grasshopper", font.main = 1)
mtext("Initial time point not shown", side = 3, col = "blue", cex = 0.9)
legtxt = c("A","B","C","D","E","F")
hor_inset = -0.20 # horizontal position of the legend
legend("topright", inset = c(hor_inset, 0), legend = legtxt, title = "Node", col = colors, lwd = 4)
par(mar = default_margins) # reset margins to default
The main message is: It seems the probabilities converge to
stationary values. It looks as if a stationary
state is reached already after a few time steps. Stationary
means that the probabilities (!) to find the grashopper
at a certain place remain constant over time. (That does
NOT mean that the grashopper resides in the same
position. It jumps all the time. It’s only the probabilities staying
constant!)
For example, after about 10 time units, one can find the grasshopper in
node C with probability \(P
\approx 0.312\).
As we have seen above, we have to multiply the current state vector with the transition matrix \(\boldsymbol{T}\) in order to proceed one step in time. Consequently, if a stationary state \(\boldsymbol{\pi}\) exists, it must fullfille the following condition:
\[ \boldsymbol{\pi} \cdot \boldsymbol{T} = \boldsymbol{\pi} \]
This means that another time step - mathematically realised by multiplication with \(\boldsymbol{T}\) - does not change the probabilities represented by the state vector \(\boldsymbol{\pi}\).
The above equation looks pretty much like an eigenvalue equation. If we take the transpose of both sides:
\[ \left( \boldsymbol{\pi} \cdot \boldsymbol{T} \right)^t = \boldsymbol{\pi}^t \]
we get, following the rules of matrix algebra
\[ \boldsymbol{T}^t \cdot \boldsymbol{\pi}^t = \boldsymbol{\pi}^t \]
The general form of an eigenvalue equation is:
\[ \boldsymbol{A} \cdot \boldsymbol{x} = \lambda \cdot \boldsymbol{x} \]
where the \(\boldsymbol{x}\) are the eigenvectors and \(\lambda\) die eigenvalues.
Comparing both equations, we come to the conclusion that \(\boldsymbol{T}^t\), the transpose of the transition matrix, must have an eigenvalue 1 if a stationary state exists. Then, the eigenvector \(\boldsymbol{\pi}^t\) represents the stationary state.
Let us check this with the eigen-function in R:
eigen(t(T))$values
## [1] 1.0000000+0.00000000i -0.7184390+0.00000000i 0.5577800+0.09692218i
## [4] 0.5577800-0.09692218i -0.3070046+0.00000000i -0.1901164+0.00000000i
eigen(t(T))$vectors
## [,1] [,2] [,3] [,4]
## [1,] 0.1146648+0i -0.1507272+0i 0.2616137-0.1813000i 0.2616137+0.1813000i
## [2,] 0.3057729+0i 0.4614458+0i 0.3705739-0.1316979i 0.3705739+0.1316979i
## [3,] 0.6497674+0i -0.6780745+0i 0.2870441+0.0874460i 0.2870441-0.0874460i
## [4,] 0.1953549+0i 0.1423721+0i -0.5889176+0.0000000i -0.5889176+0.0000000i
## [5,] 0.1783675+0i -0.2473591+0i -0.4939123+0.0386021i -0.4939123-0.0386021i
## [6,] 0.6333867+0i 0.4723429+0i 0.1635983+0.1869498i 0.1635983-0.1869498i
## [,5] [,6]
## [1,] -0.03873402+0i 0.22455304+0i
## [2,] 0.06546109+0i -0.29200606+0i
## [3,] -0.01150812+0i -0.08576521+0i
## [4,] -0.43551638+0i 0.31638539+0i
## [5,] 0.80851424+0i -0.69131354+0i
## [6,] -0.38821681+0i 0.52814637+0i
Note that the eigenvectors are normalized to unit length by the eigen-function because they are only defined up to a constant, anyway. We can compare our results obtained above with the eigenvectors by normalising our results, too:
end_state/sqrt(sum(end_state^2)) # our result, normalised
## A B C D E F
## prob 0.114745 0.3054832 0.6501667 0.195249 0.1785022 0.6330969
Re(eigen(t(T))$vectors[,1]) # eigenvector corresponding to eigenvalue 1
## [1] 0.1146648 0.3057729 0.6497674 0.1953549 0.1783675 0.6333867
We show only the real part of the eigenvector because the imaginary part is zero.
Can we find a transition matrix without an eigenvalue 1 ?
No, we cannot find a transition matrix (for a Markov chain) without an eigenvalue of 1. A transition matrix, which describes the probabilities of moving between states in a system, must have 1 as an eigenvalue.
A key property of stochastic matrices is that the sum of elements in each row (representing probabilities) must be equal to 1. This property directly leads to the existence of an eigenvalue of 1. If you multiply a stochastic matrix by a vector where all elements are 1, the result will also be a vector with all elements equal to 1:
v = matrix(rep(1,6), ncol = 1)
v
## [,1]
## [1,] 1
## [2,] 1
## [3,] 1
## [4,] 1
## [5,] 1
## [6,] 1
T %*% v
## [,1]
## A 1
## B 1
## C 1
## D 1
## E 1
## F 1
This means there’s an eigenvector corresponding to the eigenvalue 1.
\[ \boldsymbol{T} \cdot \boldsymbol{v} = 1 \cdot \boldsymbol{v} \]
The eigenvalue of 1 corresponds to the system’s long-term behavior or
steady state. It represents the probability distribution that doesn’t
change when the transition matrix is applied repeatedly. Therefore,
while other eigenvalues might exist (and their values and multiplicities
can vary depending on the specific matrix and chain), the presence of an
eigenvalue of 1 is fundamental to the definition and behavior of a
transition matrix for a Markov chain.
See Wikipedia.