Preparation
Introduction to MCMC: Metropolis Algorithm
- Politician Moving between Islands (Kruschke 2015: pp.146-156)
- Simulation of the Posterior of a Binomial Model
Exercises
- Q1. Choosing a T-shirt color
  - (1)
- Q2. Fish for dinner (Toyoda 2015: p.64)
  - (1)
  - (2)
  - (3)

Textbook: McElreath (2016), chapter 8.

Preparation

Load some packages.

library('mvtnorm')
library('dplyr')
library('ggplot2')

Introduction to MCMC: Metropolis Algorithm

Politician Moving between Islands (Kruschke 2015: pp.146-156)

Suppose there are seven inhabited islands. Let $\theta = \{1, 2, \dots, 7\}$ denote each island. And suppose that the popualtion of $\theta$ is $q(\theta) = \theta$. Since \[\sum q(\theta) \neq 1\], $q$ is not a probability distribution.

After the politician randomly decides the island to visit in the first period, he will

flip a fair coin and propose that he move to the eash if the oin lands head or that he move to the west if the coind lands tail
Get $q_i$ and $q_j$
Decide if he moves to the proposal or stays in the currren island
- If $q_j > q_i$, move!
- If $q_j \leq q_i$, move with the probability of $q_j / q_i$
Repeat the procedure.

Let’s simulate this in R.

pol_jump <- function(n_sims = 100, init = sample(1:7, size = 1)) {
    pop <- c(0, 1:7, 0) # Assume there exist 0th and 8th islands that are uninhabited
    current <- init     # the initial island
    res_df <- data.frame(time = 1:n_sims,
                         visited = rep(NA, n_sims))
    res_df$visited[1] <- current
    for (t in 2:n_sims){
        proposed <- current + ifelse(rbinom(1, 1, prob = .5), 1, -1)
        prob_move <- min(1, pop[proposed + 1] / pop[current + 1])
        current <- ifelse(runif(1, 0, 1) < prob_move, proposed, current)
        res_df$visited[t] <- current
    }
    return(res_df)
}
set.seed(2016-11-04)
res <- pol_jump(n_sims = 10000)

Let’s make a trace plot.

trace1 <- ggplot(res, aes(x = time, y = visited)) +
    geom_line() + coord_flip()
print(trace1)

trace2 <- ggplot(res, aes(x = log(time), y = visited)) +
    geom_line() + xlim(1, log(500)) + ylim(0, 8) + coord_flip()
print(trace2)

## Warning: Removed 9502 rows containing missing values (geom_path).

Let’s discard the first 5,000 samples as burn-in, and visualize the simulated distribution.

hist <- ggplot(filter(res, time > 5000), aes(x = as.factor(visited))) +
    geom_bar(width = .2) +
    labs(x = 'island', y = 'number of visits')
print(hist)

Simulation of the Posterior of a Binomial Model

We’d like to get the posterior distribution $p(\theta | y)$, where the likelihood is \[y|\theta \sim \mbox{Bin}(n, \theta),\] and the prior is \[\theta \sim \mbox{Beta}(\alpha, \beta).\] Then, the non-normalized posterior $q(\theta|y)$ can be written as \[q(\theta|y) \propto p(y | \theta) p(\theta) = \theta^{y + \alpha - 1} (1 -\theta)^{n - y + \beta -1}.\]

When $(n, y, \alpha, \beta) = (20, 14, 1, 1)$, the non-normalized posterior becomes \[q(\theta|y) = \theta^{14} (1 -\theta)^{6}.\]

Using this, let’s sample (simulate) the posterior distribution.

First, define the non-normalized posteior.

nonnorm <- function(theta) {
    if (theta < 0 | theta > 1) return(0)
    else return(theta^14 * (1 - theta)^6)
}

Now, define the MH algorith. Here we use the random walk $\mbox{Normal}(0, 0.1^2)$ as the proposal distribution.

bin_sim <- function(n_sims = 100, init = runif(1, 0, 1)) {
    current <- init
    res_df <- data.frame(time = 1:n_sims,
                         post_theta = rep(NA, n_sims))
    res_df$post_theta[1] <- current
    for (t in 2:n_sims) {
        prop <- current + rnorm(1, mean = 0, sd = .1)
        # use log because the nonnorm can be very small
        log_r <- log(nonnorm(prop)) -log(nonnorm(current))
        prob_move <- min(1, exp(log_r))
        current <- ifelse(runif(1, 0, 1) < prob_move, prop, current)
        res_df$post_theta[t] <- current
    }
    return(res_df)
}
set.seed(2016-11-05)
sim <- bin_sim(n_sims = 10000)

To exmine the behavior of the chain, let’s make a trace plot.

trace <- ggplot(sim, aes(x = time, y = post_theta)) + geom_line()
plot(trace)

Let’s focus on the first half of the chain.

plot(trace %+% filter(sim, time < 5001))

There is no clear indication of non-convergence. Let’s discard the first half of our smaple as burn-in and visualize the posterior distribution.

sim_post <- sim %>% filter(time > 5000)
hist <- ggplot(sim_post, aes(x = post_theta, y = ..density..)) +
    geom_histogram(color = 'black') +
    labs(x = expression(paste(theta, '|', y)))
print(hist)

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Using this sample, the mean and variance are

mean(sim_post$post_theta)

## [1] 0.6750077

var(sim_post$post_theta)

## [1] 0.01023553

Theoretically, the posterior must be $\mbox{Beta}(y + \alpha, n - y + \beta)$, so the mean and variance should be \[\mathrm{E}(\theta | y) = \frac{y + \alpha}{n + \alpha + \beta} = \frac{14 + 1}{20 + 1 + 1} \approx 0.68,\] \[\mathrm{Var}(\theta|y) = \frac{(y + \alpha)(n-y + \beta)}{(n + \alpha + \beta)^2(n+ \alpha + \beta+1)}=\frac{(14 + 1)(20 - 14 + 1)}{(20 + 1 + 1)^2 (20 + 1 + 1 + 1)} \approx 0.009.\]

Accoriding to this result, the posterior obtained by simulation is a good enough approximation of the true (theoretical) posterior.

Lastly, let’s visually compare the simulated posterior to the theoretical posterior.

hist(sim_post$post_theta, yaxt = 'n', col = 'gray', nclass = 20, freq = FALSE,
     main = '', xlab = expression(paste(theta, '|', y)), ylab = '')
curve(dbeta(x, 15, 7), from = 0, to = 1,  add = TRUE, col = 'red', lwd = 2)

In this figure, the histogram is the posterior distribution sampled by MH algorithm. The red curve is the density of the beta distribution $\mbox{Beta}(15, 7)$, which is the true posterior distribution of $\theta$ in this problem. Again, the simulated posterior looks good.

Exercises

Prolem sets

Q1. Choosing a T-shirt color

(1)

K <- matrix(c(.2, .2, .6,
              .1, .6, .3,
              .3, .5, .2),
            byrow = TRUE, nrow = 3)
pi_1 <- c(.3, .2, .5)

(pi_2 <- pi_1 %*% K)

##      [,1] [,2] [,3]
## [1,] 0.23 0.43 0.34

(pi_3 <- pi_2 %*% K)

##       [,1]  [,2]  [,3]
## [1,] 0.191 0.474 0.335

pi <- c(0, 0, 0)
pi_new <- pi_1
while (sum(abs(pi - pi_new)) > 1e-10) {
    pi <- pi_new
    pi_new <- pi %*% K
}
pi_new

##           [,1]      [,2]      [,3]
## [1,] 0.1827957 0.4946237 0.3225806

Check if it is really invariant.

(pi_check <- pi_new %*% K)

##           [,1]      [,2]      [,3]
## [1,] 0.1827957 0.4946237 0.3225806

(pi_check <- pi_check %*% K)

##           [,1]      [,2]      [,3]
## [1,] 0.1827957 0.4946237 0.3225806

The probability distribution no longer changes!

Q2. Fish for dinner (Toyoda 2015: p.64)

The likelihood can be written as \[y_i | \theta \sim \mbox{Poisson}(\theta).\]

Data $y$ is the vector:

y <- c(0, 1, 0, 0, 2, 0, 1, 0, 0, 1)
(n <- length(y))

## [1] 10

(1)

Using Gamma distribution as natural conjugate prior, the prior of $\theta$ is \[\theta \sim \mbox{Gamma}(\alpha, \beta),\] where $\alpha$ is the shape parameter, and $\beta$ is the inverse scale (or rate) parameter. Then, the expected value is \[\mathrm{E}(\theta) = \frac{\alpha}{\beta},\] and the variance is \[\mathrm{Var}(\theta) = \frac{\alpha}{\beta^2}.\] Thus, \[\frac{\alpha}{\beta} = 2,\] and \[\frac{\alpha}{\beta^2} = 0.8^2.\] Solving the equations, we get \[(\alpha, \beta) = (6.25, 3.125).\]

Since Gamma is the natural conjugate prior for Poisson, the posterior distribtuion also follows a Gamma distribution: \[\theta | y \sim \mbox{Gamma}\left( \alpha + \sum_i y_i, \beta + n \right).\]

The $\sum_i y_i$ is

sum(y)

## [1] 5

Therefore, the posterior of $\theta$ is \[\theta | y \sim \mbox{Gamma}(11.25, 13.125).\]

Let’s sample 1,000 values of $\theta$ from this distribution.

post_theta_1 <- rgamma(1000, shape = 11.25, rate = 13.125)
p1 <- ggplot(data.frame(theta = post_theta_1), aes(x = theta, y = ..density..)) +
    geom_histogram(color = 'black') +
    labs(x = expression(paste(theta, '|', y)))
print(p1)

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

The mode of this Gamma distribution is \[\mathrm{mode}(\theta | y) = \frac{11.25 - 1}{13.125} \approx 0.78.\] When $\theta = 0.78$, the probability that Mr. T gets no fish is

ppois(0, 0.78)

## [1] 0.458406

Using the median of our sampled values,

ppois(0, median(post_theta_1))

## [1] 0.433541

Therefore, the wife has to prepare some other dishes for dinner with the probability around 0.45.

(2)

Let $q(\theta|y)$ denote the non-normalized posterior of our target posterior $p(\theta|y$): \[p(\theta | y) \propto p(y|\theta) p(\theta) = q(\theta | y).\] The likelihood is \[p(y|\theta) \propto \prod =theta^{y_i} e^{-\theta} = \theta^{\sum y_i} e^{-n\theta},\] and the Gamma prior is \[p(\theta) \propto \theta^{\alpha-1} e^{-\beta \theta} = \theta^{6.25 - 1} e^{-3.125\theta}.\] Therefore, \[q(\theta |y) = \theta^{11.25 - 1} e^{-13.125\theta} = \theta^{10.25} e^{-13.125\theta}.\]

In R, this non-normalized posterior is

nonnorm_dens <- function(theta) {
    if (theta < 0) return(0)
    else return(theta^10.25 * exp(-13.125*theta))
}

Using $\mbox{Normal}(1, 0.5)$ (0.5 is the variance) as the proposal distribution, simulate the posterior by the independent MH algorithm.

fish_sim <- function(n_sims = 100, init = runif(1, 0.5, 1.5)) {
    current <- init
    res_df <- data.frame(time = 1:n_sims,
                         post_theta = rep(NA, n_sims))
    res_df$post_theta[1] <- current
    for (t in 2:n_sims) {
        prop <- rnorm(1, 1, sqrt(.5))
        log_r <- log(nonnorm_dens(prop)) - log(dnorm(prop, 1, sqrt(.5))) -
            (log(nonnorm_dens(current)) - log(dnorm(current, 1, sqrt(.5))))
        prob_move <- min(1, exp(log_r))
        current <- ifelse(runif(1, 0, 1) < prob_move, prop, current)
        res_df$post_theta[t] <- current
    }
    return(res_df)
}
set.seed(2016-11-05)
sim_1 <- fish_sim(n_sims = 100000)

Let’s draw a trace plot.

plot(post_theta ~ time, data = sim_1, type = 'l')

Using the first 5,000 sample, the trace plot is

plot(post_theta ~ time, data = filter(sim_1, time < 5001), type = 'l')

It seems that the chain converge before $t=$5000. So let’s discard the first 4000 samples as burn-in. Then, the posterior will be

post_theta_2 <- sim_1$post_theta[-(1:4000)]
p2 <- p1 %+% data.frame(theta = post_theta_2)
print(p2)

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

With the median of this distribution, the probability of no fish is

ppois(0, median(post_theta_2))

## [1] 0.4347017

So, we got the coclusion similar to (1).

(3)

Let’s overimpose the theoretical density obtained in (1) on the simulated distribution in (2).

hist(post_theta_2, yaxt = 'n', col = 'gray', nclass = 20, freq = FALSE,
     main = '', xlab = expression(paste(theta, '|', y)), ylab = '')
curve(dgamma(x, 11.25, 13.125), from = 0, to = 2.5,
      add = TRUE, col = 'red', lwd = 2)

As this figure shows, the simulated distribution obtained by MH algorithm matches the theoretically derived posterior distribution.

Back to Class Materials

Research Methods in Political Science II

Day 2-1: Introduction to Markov chain Monte Carlo

Yuki Yanai

November 5, 2016