Load some packages.
library('mvtnorm')
library('dplyr')
library('ggplot2')
Suppose there are seven inhabited islands. Let \(\theta = \{1, 2, \dots, 7\}\) denote each island. And suppose that the popualtion of \(\theta\) is \(q(\theta) = \theta\). Since \[\sum q(\theta) \neq 1\], \(q\) is not a probability distribution.
After the politician randomly decides the island to visit in the first period, he will
Let’s simulate this in R.
pol_jump <- function(n_sims = 100, init = sample(1:7, size = 1)) {
pop <- c(0, 1:7, 0) # Assume there exist 0th and 8th islands that are uninhabited
current <- init # the initial island
res_df <- data.frame(time = 1:n_sims,
visited = rep(NA, n_sims))
res_df$visited[1] <- current
for (t in 2:n_sims){
proposed <- current + ifelse(rbinom(1, 1, prob = .5), 1, -1)
prob_move <- min(1, pop[proposed + 1] / pop[current + 1])
current <- ifelse(runif(1, 0, 1) < prob_move, proposed, current)
res_df$visited[t] <- current
}
return(res_df)
}
set.seed(2016-11-04)
res <- pol_jump(n_sims = 10000)
Let’s make a trace plot.
trace1 <- ggplot(res, aes(x = time, y = visited)) +
geom_line() + coord_flip()
print(trace1)
trace2 <- ggplot(res, aes(x = log(time), y = visited)) +
geom_line() + xlim(1, log(500)) + ylim(0, 8) + coord_flip()
print(trace2)
## Warning: Removed 9502 rows containing missing values (geom_path).
Let’s discard the first 5,000 samples as burn-in, and visualize the simulated distribution.
hist <- ggplot(filter(res, time > 5000), aes(x = as.factor(visited))) +
geom_bar(width = .2) +
labs(x = 'island', y = 'number of visits')
print(hist)
We’d like to get the posterior distribution \(p(\theta | y)\), where the likelihood is \[y|\theta \sim \mbox{Bin}(n, \theta),\] and the prior is \[\theta \sim \mbox{Beta}(\alpha, \beta).\] Then, the non-normalized posterior \(q(\theta|y)\) can be written as \[q(\theta|y) \propto p(y | \theta) p(\theta) = \theta^{y + \alpha - 1} (1 -\theta)^{n - y + \beta -1}.\]
When \((n, y, \alpha, \beta) = (20, 14, 1, 1)\), the non-normalized posterior becomes \[q(\theta|y) = \theta^{14} (1 -\theta)^{6}.\]
Using this, let’s sample (simulate) the posterior distribution.
First, define the non-normalized posteior.
nonnorm <- function(theta) {
if (theta < 0 | theta > 1) return(0)
else return(theta^14 * (1 - theta)^6)
}
Now, define the MH algorith. Here we use the random walk \(\mbox{Normal}(0, 0.1^2)\) as the proposal distribution.
bin_sim <- function(n_sims = 100, init = runif(1, 0, 1)) {
current <- init
res_df <- data.frame(time = 1:n_sims,
post_theta = rep(NA, n_sims))
res_df$post_theta[1] <- current
for (t in 2:n_sims) {
prop <- current + rnorm(1, mean = 0, sd = .1)
# use log because the nonnorm can be very small
log_r <- log(nonnorm(prop)) -log(nonnorm(current))
prob_move <- min(1, exp(log_r))
current <- ifelse(runif(1, 0, 1) < prob_move, prop, current)
res_df$post_theta[t] <- current
}
return(res_df)
}
set.seed(2016-11-05)
sim <- bin_sim(n_sims = 10000)
To exmine the behavior of the chain, let’s make a trace plot.
trace <- ggplot(sim, aes(x = time, y = post_theta)) + geom_line()
plot(trace)
Let’s focus on the first half of the chain.
plot(trace %+% filter(sim, time < 5001))
There is no clear indication of non-convergence. Let’s discard the first half of our smaple as burn-in and visualize the posterior distribution.
sim_post <- sim %>% filter(time > 5000)
hist <- ggplot(sim_post, aes(x = post_theta, y = ..density..)) +
geom_histogram(color = 'black') +
labs(x = expression(paste(theta, '|', y)))
print(hist)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Using this sample, the mean and variance are
mean(sim_post$post_theta)
## [1] 0.6750077
var(sim_post$post_theta)
## [1] 0.01023553
Theoretically, the posterior must be \(\mbox{Beta}(y + \alpha, n - y + \beta)\), so the mean and variance should be \[\mathrm{E}(\theta | y) = \frac{y + \alpha}{n + \alpha + \beta} = \frac{14 + 1}{20 + 1 + 1} \approx 0.68,\] \[\mathrm{Var}(\theta|y) = \frac{(y + \alpha)(n-y + \beta)}{(n + \alpha + \beta)^2(n+ \alpha + \beta+1)}=\frac{(14 + 1)(20 - 14 + 1)}{(20 + 1 + 1)^2 (20 + 1 + 1 + 1)} \approx 0.009.\]
Accoriding to this result, the posterior obtained by simulation is a good enough approximation of the true (theoretical) posterior.
Lastly, let’s visually compare the simulated posterior to the theoretical posterior.
hist(sim_post$post_theta, yaxt = 'n', col = 'gray', nclass = 20, freq = FALSE,
main = '', xlab = expression(paste(theta, '|', y)), ylab = '')
curve(dbeta(x, 15, 7), from = 0, to = 1, add = TRUE, col = 'red', lwd = 2)
In this figure, the histogram is the posterior distribution sampled by MH algorithm. The red curve is the density of the beta distribution \(\mbox{Beta}(15, 7)\), which is the true posterior distribution of \(\theta\) in this problem. Again, the simulated posterior looks good.
K <- matrix(c(.2, .2, .6,
.1, .6, .3,
.3, .5, .2),
byrow = TRUE, nrow = 3)
pi_1 <- c(.3, .2, .5)
(pi_2 <- pi_1 %*% K)
## [,1] [,2] [,3]
## [1,] 0.23 0.43 0.34
(pi_3 <- pi_2 %*% K)
## [,1] [,2] [,3]
## [1,] 0.191 0.474 0.335
pi <- c(0, 0, 0)
pi_new <- pi_1
while (sum(abs(pi - pi_new)) > 1e-10) {
pi <- pi_new
pi_new <- pi %*% K
}
pi_new
## [,1] [,2] [,3]
## [1,] 0.1827957 0.4946237 0.3225806
Check if it is really invariant.
(pi_check <- pi_new %*% K)
## [,1] [,2] [,3]
## [1,] 0.1827957 0.4946237 0.3225806
(pi_check <- pi_check %*% K)
## [,1] [,2] [,3]
## [1,] 0.1827957 0.4946237 0.3225806
The probability distribution no longer changes!
The likelihood can be written as \[y_i | \theta \sim \mbox{Poisson}(\theta).\]
Data \(y\) is the vector:
y <- c(0, 1, 0, 0, 2, 0, 1, 0, 0, 1)
(n <- length(y))
## [1] 10
Using Gamma distribution as natural conjugate prior, the prior of \(\theta\) is \[\theta \sim \mbox{Gamma}(\alpha, \beta),\] where \(\alpha\) is the shape parameter, and \(\beta\) is the inverse scale (or rate) parameter. Then, the expected value is \[\mathrm{E}(\theta) = \frac{\alpha}{\beta},\] and the variance is \[\mathrm{Var}(\theta) = \frac{\alpha}{\beta^2}.\] Thus, \[\frac{\alpha}{\beta} = 2,\] and \[\frac{\alpha}{\beta^2} = 0.8^2.\] Solving the equations, we get \[(\alpha, \beta) = (6.25, 3.125).\]
Since Gamma is the natural conjugate prior for Poisson, the posterior distribtuion also follows a Gamma distribution: \[\theta | y \sim \mbox{Gamma}\left( \alpha + \sum_i y_i, \beta + n \right).\]
The \(\sum_i y_i\) is
sum(y)
## [1] 5
Therefore, the posterior of \(\theta\) is \[\theta | y \sim \mbox{Gamma}(11.25, 13.125).\]
Let’s sample 1,000 values of \(\theta\) from this distribution.
post_theta_1 <- rgamma(1000, shape = 11.25, rate = 13.125)
p1 <- ggplot(data.frame(theta = post_theta_1), aes(x = theta, y = ..density..)) +
geom_histogram(color = 'black') +
labs(x = expression(paste(theta, '|', y)))
print(p1)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
The mode of this Gamma distribution is \[\mathrm{mode}(\theta | y) = \frac{11.25 - 1}{13.125} \approx 0.78.\] When \(\theta = 0.78\), the probability that Mr. T gets no fish is
ppois(0, 0.78)
## [1] 0.458406
Using the median of our sampled values,
ppois(0, median(post_theta_1))
## [1] 0.433541
Therefore, the wife has to prepare some other dishes for dinner with the probability around 0.45.
Let \(q(\theta|y)\) denote the non-normalized posterior of our target posterior \(p(\theta|y\)): \[p(\theta | y) \propto p(y|\theta) p(\theta) = q(\theta | y).\] The likelihood is \[p(y|\theta) \propto \prod =theta^{y_i} e^{-\theta} = \theta^{\sum y_i} e^{-n\theta},\] and the Gamma prior is \[p(\theta) \propto \theta^{\alpha-1} e^{-\beta \theta} = \theta^{6.25 - 1} e^{-3.125\theta}.\] Therefore, \[q(\theta |y) = \theta^{11.25 - 1} e^{-13.125\theta} = \theta^{10.25} e^{-13.125\theta}.\]
In R, this non-normalized posterior is
nonnorm_dens <- function(theta) {
if (theta < 0) return(0)
else return(theta^10.25 * exp(-13.125*theta))
}
Using \(\mbox{Normal}(1, 0.5)\) (0.5 is the variance) as the proposal distribution, simulate the posterior by the independent MH algorithm.
fish_sim <- function(n_sims = 100, init = runif(1, 0.5, 1.5)) {
current <- init
res_df <- data.frame(time = 1:n_sims,
post_theta = rep(NA, n_sims))
res_df$post_theta[1] <- current
for (t in 2:n_sims) {
prop <- rnorm(1, 1, sqrt(.5))
log_r <- log(nonnorm_dens(prop)) - log(dnorm(prop, 1, sqrt(.5))) -
(log(nonnorm_dens(current)) - log(dnorm(current, 1, sqrt(.5))))
prob_move <- min(1, exp(log_r))
current <- ifelse(runif(1, 0, 1) < prob_move, prop, current)
res_df$post_theta[t] <- current
}
return(res_df)
}
set.seed(2016-11-05)
sim_1 <- fish_sim(n_sims = 100000)
Let’s draw a trace plot.
plot(post_theta ~ time, data = sim_1, type = 'l')
Using the first 5,000 sample, the trace plot is
plot(post_theta ~ time, data = filter(sim_1, time < 5001), type = 'l')
It seems that the chain converge before $t=$5000. So let’s discard the first 4000 samples as burn-in. Then, the posterior will be
post_theta_2 <- sim_1$post_theta[-(1:4000)]
p2 <- p1 %+% data.frame(theta = post_theta_2)
print(p2)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
With the median of this distribution, the probability of no fish is
ppois(0, median(post_theta_2))
## [1] 0.4347017
So, we got the coclusion similar to (1).
Let’s overimpose the theoretical density obtained in (1) on the simulated distribution in (2).
hist(post_theta_2, yaxt = 'n', col = 'gray', nclass = 20, freq = FALSE,
main = '', xlab = expression(paste(theta, '|', y)), ylab = '')
curve(dgamma(x, 11.25, 13.125), from = 0, to = 2.5,
add = TRUE, col = 'red', lwd = 2)
As this figure shows, the simulated distribution obtained by MH algorithm matches the theoretically derived posterior distribution.