Load packages to use.
library('ggplot2')
library('dplyr')
library('rstan')
rstan_options(auto_write = TRUE)
options(mc.cores = parallel::detectCores())
library('rethinking')
Define the inverse of the logit function (i.e., the logistic function).
invlogit <- function(x) {
p <- 1 / (1 + exp(-x))
return(p)
}
For illustration, we use the dataset appeared in Asano and Yanai (2013): HR-labelled.Rda.
Using this dataset, we will try to explain the victory in single member districts (SMDs) of the Japanese House of Representatives elections by logistic regression.
First, load the dataset into R. Here, let’s use the dataset we modified (labelled) and saved before.
load(url('http://yukiyanai.github.io/classes/rm2-2016/contents/data/HR-labelled.Rda'))
glimpse(HR)
## Observations: 5,614
## Variables: 16
## $ year <int> 1996, 1996, 1996, 1996, 1996, 1996, 1996, 1996, 1996...
## $ ku <fctr> aichi, aichi, aichi, aichi, aichi, aichi, aichi, ai...
## $ kun <int> 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3...
## $ party <fctr> NFP, LDP, DPJ, JCP, bunka, NP, independent, NFP, LD...
## $ name <fctr> KAWAMURA, TAKASHI, IMAEDA, NORIO, SATO, TAISUKE, IW...
## $ age <int> 47, 72, 53, 43, 51, 51, 45, 51, 71, 30, 31, 44, 61, ...
## $ status <fctr> incumbent, ex, incumbent, challenger, challenger, c...
## $ nocand <int> 7, 7, 7, 7, 7, 7, 7, 8, 8, 8, 8, 8, 8, 8, 8, 7, 7, 7...
## $ wl <fctr> won, lost, lost, lost, lost, lost, lost, won, lost,...
## $ rank <int> 1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7, 8, 1, 2, 3...
## $ previous <int> 2, 3, 2, 0, 0, 0, 0, 2, 1, 1, 0, 0, 0, 0, 0, 1, 3, 1...
## $ vote <int> 66876, 42969, 33503, 22209, 616, 566, 312, 56101, 44...
## $ voteshare <dbl> 40.0, 25.7, 20.1, 13.3, 0.4, 0.3, 0.2, 32.9, 26.4, 2...
## $ eligible <int> 346774, 346774, 346774, 346774, 346774, 346774, 3467...
## $ turnout <dbl> 49.22, 49.22, 49.22, 49.22, 49.22, 49.22, 49.22, 51....
## $ exp <int> 9828097, 9311555, 9231284, 2177203, NA, NA, NA, 1294...
This dataset contains the variable showing the election results, wl. The values for the variable are “lost”, “won”, and “zombie” (candidates who lost the SMD but resurrected in a PR block). Using this variable, let’s make the binary variable indicating the victory in an SMD, wlsmd.
myd <- HR %>% mutate(wlsmd = as.numeric(wl == 'won'))
with(myd, table(wlsmd))
## wlsmd
## 0 1
## 4114 1500
In addition, let’s create three dummy variables ldp (Liberal Democratic Party), komei (New Komeito or Clean Government Party), dpj (Democratic Party of Japan), and a variable measuring the expenditure in million yen.
myd <- myd %>%
mutate(ldp = as.numeric(party == 'LDP'),
komei = as.numeric(party == 'CGP'),
dpj = as.numeric(party == 'DPJ'),
expm = exp / 10^6)
For simpliciy, keep the data for the 2009 election and delete the observations with a missing value (You shouldn’t delete any observations because of missing values in a real data analysis).
myd_2009 <- myd %>%
filter(year == 2009) %>%
na.omit()
Now we’d like to model the victory in SMDs (\(v\)) with two predictors. One is the number of privious wins in the HR elections , previous (\(p\)); the other is the elecotral expenditure in million yen, expm (\(x\)). (In a research paper, we should explicity explain why we use this model. In other words, we should provide theoretical reasons why we use these predictors to explain the outcome. However, we focus on the quantitative methods here.)
Our model can be written as follows. \[v_n \sim \mbox{Bernoulli}(\theta_n),\] \[\mathrm{logit}(\theta_n) = \alpha + \beta_1 p_n + \beta_2 x_n,\] \[\alpha \sim \mbox{Normal}(0, 100^2),\] \[\beta_k \sim \mbox{Normal}(0, 10^2),\] where \(n = 1, 2, \dots, N\), the \(N\) is the number of observations, and \(k \in \{1, 2\}\)
If you use rethinking::map2stan()
, you need to specify the map2stan model. Here, we directly use RStan. So we translate the model above into a Stan model and save it as d2-model2.stan.
data {
int<lower=0> N;
int<lower=0, upper=1> V[N];
int<lower=0> P[N];
real<lower=0> X[N];
}
parameters {
real alpha;
real beta[2];
}
transformed parameters {
real<lower=0, upper=1> theta[N];
for (n in 1:N) {
theta[n] = inv_logit(alpha + beta[1]*P[n] + beta[2]*X[n]);
}
}
model {
for (n in 1:N) {
V[n] ~ bernoulli(theta[n]);
}
alpha ~ normal(0, 100);
for (k in 1:2) {
beta[k] ~ normal(0, 10);
}
}
generated quantities {
int<lower=0, upper=1> v_pred[N];
real log_lik[N];
for (n in 1:N) {
v_pred[n] = bernoulli_rng(theta[n]);
log_lik[n] = bernoulli_lpmf(V[n] | theta[n]);
}
}
We generate the predicte values in the last block. In addition, we calculate the log liklihood here so that we can compute WAIC later.
Fit the model.
myd4stan_1 <- list(N = nrow(myd_2009), V = myd_2009$wlsmd,
P = myd_2009$previous, X = myd_2009$expm)
stan_m2 <- stan_model('stan-models/d2-model2.stan')
logit_1 <- sampling(stan_m2, data = myd4stan_1, seed = 222,
chains = 4, iter = 3000, warmup = 1500)
print(logit_1)
(Output is omitted here because it is very long).
To focus on the specific paramter values, add pars argument.
print(logit_1, pars = c('alpha', 'beta'))
## Inference for Stan model: d2-model2.
## 4 chains, each with iter=3000; warmup=1500; thin=1;
## post-warmup draws per chain=1500, total post-warmup draws=6000.
##
## mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
## alpha -2.04 0 0.13 -2.29 -2.13 -2.04 -1.95 -1.79 2796 1
## beta[1] 0.36 0 0.04 0.29 0.34 0.36 0.39 0.44 2899 1
## beta[2] 0.04 0 0.02 0.00 0.03 0.04 0.05 0.07 2622 1
##
## Samples were drawn using NUTS(diag_e) at Fri Nov 18 22:30:01 2016.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at
## convergence, Rhat=1).
plot(logit_1, pars = 'beta')
## ci_level: 0.8 (80% intervals)
## outer_level: 0.95 (95% intervals)
This result shows that \[\mathrm{Pr(victory)} = \frac{1}{1 + \exp[- (-2.04 + 0.36 \mathrm{previous~wins} + 0.04\mathrm{expenditure})]}.\]
Next, let’s add the candidate’s age (\(a\)) to the linear predictor. Our model can be written as follows. \[v_n \sim \mbox{Bernoulli}(\theta_n),\] \[\mathrm{logit}(\theta_n) = \alpha + \beta_1 p_n + \beta_2 x_n + \beta_3 a_n,\] \[\alpha \sim \mbox{Normal}(0, 100),\] \[\beta_k \sim \mbox{Normal}(0, 10^2),\] where \(n = 1, 2, \dots, N\), the \(N\) is the number of observations, and \(k \in \{1, 2, 3\}\).
The stan file for this model is d2-model3.stan. Let’s fit the model.
myd4stan_2 <- c(myd4stan_1, list(A = myd_2009$age))
stan_m3 <- stan_model('stan-models/d2-model3.stan')
logit_2 <- sampling(stan_m3, data = myd4stan_2, seed = 333,
chains = 4, iter = 3000, warmup = 1500)
Here is the result.
print(logit_2)
(Output is omitted here, but you should confirm that Rhats are all equal to 1.)
print(logit_2, pars = c('alpha', 'beta'))
## Inference for Stan model: d2-model3.
## 4 chains, each with iter=3000; warmup=1500; thin=1;
## post-warmup draws per chain=1500, total post-warmup draws=6000.
##
## mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
## alpha 0.77 0.01 0.39 0.01 0.51 0.77 1.03 1.53 1628 1
## beta[1] 0.52 0.00 0.05 0.43 0.49 0.52 0.55 0.61 2088 1
## beta[2] 0.04 0.00 0.02 0.01 0.03 0.04 0.06 0.08 2411 1
## beta[3] -0.06 0.00 0.01 -0.08 -0.07 -0.06 -0.06 -0.05 1560 1
##
## Samples were drawn using NUTS(diag_e) at Fri Nov 18 22:30:57 2016.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at
## convergence, Rhat=1).
plot(logit_2, pars = 'beta')
## ci_level: 0.8 (80% intervals)
## outer_level: 0.95 (95% intervals)
This shows that \[\mathrm{Pr(victory)} = \frac{1}{1 + \exp[-(0.77 + 0.52 \mathrm{previous~wins} + 0.04 \mathrm{expenditure} -0.06 \mathrm{age})]}.\]
Furthermore, let’s add dummies for LDP, CGP (Komei), and DPJ. The model can be written as follows. \[v_n \sim \mbox{Bernoulli}(\theta_n),\] \[\mathrm{logit}(\theta_n) = \alpha + \beta_1 p_n + \beta_2 x_n + \beta_3 a_n + \beta_4 LDP_n + \beta_5 CGP_n + \beta_6 DPJ_n,\] \[\alpha \sim \mbox{Normal}(0, 100),\] \[\beta_k \sim \mbox{Normal}(0, 10^2),\] where \(n = 1, 2, \dots, N\), the \(N\) is the number of observations, and \(k \in \{1, 2, \dots, 6\}\). You can fit this model with RStan.
However, note that \(\beta_4\), \(\beta_5\), and \(\beta_6\) simply change the intercept depending on the value of dummy variables. Thus, we can rewrite the model as follows. \[v_n \sim \mbox{Bernoulli}(\theta_n),\] \[\mathrm{logit}(\theta_n) = \alpha_{party_n} + \beta_1 p_n + \beta_2 x_n + \beta_3 a_n,\] \[\alpha_{party} \sim \mbox{Normal}(0, 100),\] \[\beta_k \sim \mbox{Normal}(0, 10^2),\] where \(n = 1, 2, \dots, N\), the \(N\) is the number of observations, \(k \in \{1, 2, 3\}\), and party \(\in \{LDP, CGP, DPJ, other\}\). The Stan model of this is d2-model4.stan.
To fit this model, we need to prepare the new party variable that categorize each candite into one of {LDP, CGP, DPJ, other}.
myd_2009 <- myd_2009 %>%
mutate(party4 = 4 - dpj - 2*komei - 3*ldp)
with(myd_2009, table(party, party4))
## party4
## party 1 2 3 4
## independent 0 0 0 60
## JCP 0 0 0 152
## LDP 290 0 0 0
## CGP 0 6 0 0
## oki 0 0 0 0
## tai 0 0 0 0
## saki 0 0 0 0
## NFP 0 0 0 0
## DPJ 0 0 268 0
## SDP 0 0 0 31
## LF 0 0 0 0
## NJSP 0 0 0 0
## DRF 0 0 0 0
## kobe 0 0 0 0
## nii 0 0 0 0
## sei 0 0 0 0
## JNFP 0 0 0 0
## bunka 0 0 0 0
## green 0 0 0 0
## LP 0 0 0 0
## RC 0 0 0 0
## muk 0 0 0 0
## CP 0 0 0 0
## NCP 0 0 0 0
## ND 0 0 0 0
## son 0 0 0 0
## sek 0 0 0 0
## NP 0 0 0 0
## NNP 0 0 0 9
## NPJ 0 0 0 2
## NPD 0 0 0 0
## minna 0 0 0 14
## R 0 0 0 1
## H 0 0 0 291
## others 0 0 0 0
We assigned values 1 through 4 to LDP, CGP, DPJ, and other parties, respectively.
Now let’s fit the model.
myd4stan_3 <- c(myd4stan_2, list(PARTY = myd_2009$party4))
stan_m4 <- stan_model('stan-models/d2-model4.stan')
logit_3 <- sampling(stan_m4, data = myd4stan_3, seed = 444,
chains = 4, iter = 3000, warmup = 1500)
## Warning: There were 284 divergent transitions after warmup. Increasing adapt_delta above 0.8 may help. See
## http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
## Warning: Examine the pairs() plot to diagnose sampling problems
print(logit_3)
(Output is omitted.)
print(logit_3, pars = c('alpha', 'beta'))
## Inference for Stan model: d2-model4.
## 4 chains, each with iter=3000; warmup=1500; thin=1;
## post-warmup draws per chain=1500, total post-warmup draws=6000.
##
## mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff
## alpha[1] -2.08 0.02 0.69 -3.43 -2.55 -2.07 -1.61 -0.70 1041
## alpha[2] -81.07 1.87 57.06 -217.71 -114.46 -69.61 -36.54 -6.86 930
## alpha[3] 1.71 0.02 0.59 0.55 1.32 1.71 2.08 2.90 933
## alpha[4] -2.54 0.02 0.63 -3.78 -2.95 -2.53 -2.11 -1.34 1001
## beta[1] 0.49 0.00 0.06 0.38 0.45 0.49 0.53 0.61 1476
## beta[2] 0.04 0.00 0.03 -0.02 0.02 0.04 0.06 0.10 2099
## beta[3] -0.03 0.00 0.01 -0.06 -0.04 -0.03 -0.03 -0.01 944
## Rhat
## alpha[1] 1
## alpha[2] 1
## alpha[3] 1
## alpha[4] 1
## beta[1] 1
## beta[2] 1
## beta[3] 1
##
## Samples were drawn using NUTS(diag_e) at Fri Nov 18 22:32:19 2016.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at
## convergence, Rhat=1).
plot(logit_3, pars = c('alpha', 'beta'))
## ci_level: 0.8 (80% intervals)
## outer_level: 0.95 (95% intervals)
We can see that the estimate of alpha[2] has by far the largest HPDI. This is not surprising given the number of observations we have for alpha[2], which is the number of candiates from CGP (Komei-to).
Write an equation representing the estimated results yourself.
Let’s compare the three models we fit above.
compare(logit_1, logit_2, logit_3)
## WAIC pWAIC dWAIC weight SE dSE
## logit_3 597.6 6.8 0.0 1 37.09 NA
## logit_2 1020.5 4.4 422.9 0 36.78 37.23
## logit_1 1077.9 3.3 480.3 0 35.68 39.02
Comparing three models above, we find that WAIC is smallest in the third model, and the Akaike weight is assigned only to the third model. This implies that the logit_3 is the best among three models (Don’t forget that this model can be very far from the truth. We merely compare three models of infinitely many possiblilities).
Now, let’s calculate the “hit ratio” of each model. Since we got the predicted probabilites, let’s use the mean of these values to calculate the hit ratio, by assuming that the probability larger than 0.5 implies victory.
sample_1 <- rstan::extract(logit_1)
sample_2 <- rstan::extract(logit_2)
sample_3 <- rstan::extract(logit_3)
myd_2009 <- myd_2009 %>%
mutate(pred_1 = apply(sample_1$theta, 2, mean),
pred_2 = apply(sample_2$theta, 2, mean),
pred_3 = apply(sample_3$theta, 2, mean))
Using these predicted probabilities, we’ll calculate the “hit ratio”.
## Error rates
ert_1 <- with(myd_2009, mean((pred_1 > 0.5 & wlsmd == 0) | (pred_1 < 0.5 & wlsmd == 1)))
ert_2 <- with(myd_2009, mean((pred_2 > 0.5 & wlsmd == 0) | (pred_2 < 0.5 & wlsmd == 1)))
ert_3 <- with(myd_2009, mean((pred_3 > 0.5 & wlsmd ==0 ) | (pred_3 < 0.5 & wlsmd == 1)))
## Hit rate = 1 - error rate
1 - c(ert_1, ert_2, ert_3)
## [1] 0.7473310 0.7704626 0.8825623
The hit ratio is the highest in the third model.
The observed ratio of victories is
(wr <- with(myd_2009, mean(wlsmd)))
## [1] 0.2651246
Thus, the baseline of prediction is
max(c(wr, 1 - wr))
## [1] 0.7348754
That is, without a model, we can achieve 73% hit by predicting everybody loses. The prediction based on Models 1 and 2 are only slightly better than the baseline, and we consider them as “bad” fits. In contrast, Model 3 improves the accuracy of prediction.
Next, we’ll compare the ROC curves of these three models. We define the functions to calculate the true positive rate (TPR, sensitivity) and the false positive rate (FPR, \(1 -\) specificity) and draw ROC curves with ggplot2.
## Function to calculate the true positive rate (sensitivity)
true_pos <- function(predicted, observed, cutoff = .5) {
n <- length(cutoff)
res <- rep(NA, n)
for (i in 1:n) {
cond_met <- predicted >= cutoff[i] & observed == 1
res[i] <- mean(cond_met) / mean(observed)
}
return(res)
}
## Function to calculate the false positive rate(1- specificity)
false_pos <- function(predicted, observed, cutoff = .5) {
n <- length(cutoff)
res <- rep(NA, n)
for (i in 1:n) {
cond_met <- predicted >= cutoff[i] & observed == 0
res[i] <- mean(cond_met) / (1 - mean(observed))
}
return(res)
}
## Get TPRs and FPRs by changing the cutoff from 0 to 1
cutoff_vec <- seq(0, 1, length = 1000)
tp_1 <- true_pos(myd_2009$pred_1, myd_2009$wlsmd, cutoff = cutoff_vec)
tp_2 <- true_pos(myd_2009$pred_2, myd_2009$wlsmd, cutoff = cutoff_vec)
tp_3 <- true_pos(myd_2009$pred_3, myd_2009$wlsmd, cutoff = cutoff_vec)
fp_1 <- false_pos(myd_2009$pred_1, myd_2009$wlsmd, cutoff = cutoff_vec)
fp_2 <- false_pos(myd_2009$pred_2, myd_2009$wlsmd, cutoff = cutoff_vec)
fp_3 <- false_pos(myd_2009$pred_3, myd_2009$wlsmd, cutoff = cutoff_vec)
## Create data frame to use ggplot2
df_roc <- data.frame(tpr = c(tp_1, tp_2, tp_3),
fpr = c(fp_1, fp_2, fp_3),
model = rep(1:3, rep(length(cutoff_vec), 3)))
## Draw ROC cuves
p <- ggplot(df_roc, aes(x = fpr, y = tpr, color = as.factor(model))) +
geom_line() +
geom_abline(intercept = 0, slope = 1, linetype = 'dashed') +
scale_color_discrete(name = '', labels = paste('Model', 1:3)) +
labs(x = '1 - specificity', y = 'sensitivity', title = 'ROC Curves')
print(p)
This figure also shows that the third models is the best among three.
The residual is the difference between the observed value and the predicted value, \[\mathrm{residual}_i = y_i - \mathrm{logit}^{-1}({\bf x}_i^T {\bf \beta}).\] Since the response \(y_i\) takes only two different values 0 and 1, a spcific \({\bf x}_i^T {\bf \beta}\) gives us only two different residuals. For instance, for \(\mathrm{logit}^{-1}({\bf x}_i^T {\bf \beta})=0.4\), the possible residuals are \(-0.4\) and \(0.6\) only.
Plotting the residuals for Model 3, we get the following.
myd_2009$res_3 <- with(myd_2009, wlsmd - pred_3)
res_p <- ggplot(myd_2009, aes(x = pred_3, y = res_3)) +
geom_point() +
geom_hline(yintercept = 0, color = 'royalblue') +
labs(x = 'predicted prob. of victory', y = 'observed - predicted',
title = 'Residual Plot of Model 3')
print(res_p)
As this figure shows, the residual plot of logistic regression is (almost) useless. And so, we create binned residual plots instead of ordinary residual plots. To make this type of plot, we group the residuals by predicted values and calculate the group mean of the residuals. Then, we plot the mean residual versus the mean predicted value for each group.
Let’s create one for Model 3.
## We make 20 groups based on predicted values: num of groups is arbitrary
## Create a data frame with the predicted values and residuals
df <- myd_2009[, c('pred_3', 'res_3')]
## Sort the observations by predicted values
df <- arrange(df, pred_3)
N <- nrow(df)
n_group <- round(N / 20) ## Observations in each group
## If N / 20 is not an integer, the nuber of observations differ in the last group
df$group <- rep(20, N) ## Category var for groups
df$group[1:(n_group * 20)] <- rep(1:20, rep(n_group, 20))
## Caculate the group means of predicted values and residuals
df_group <- df %>%
group_by(group) %>%
summarize(pred = mean(pred_3),
res = mean(res_3),
obs = n())
## Calculate +-2se
df_group <- df_group %>%
mutate(lb = -2 * sqrt(pred * (1 - pred) / obs),
ub = 2 * sqrt(pred * (1 - pred) / obs))
## Plot
p_br <- ggplot(df_group, aes(x = pred, y = res)) +
geom_point() +
geom_line(aes(y = lb), linetype = 'dotted') +
geom_line(aes(y = ub), linetype = 'dotted') +
geom_hline(yintercept = 0, color = 'royalblue') +
labs(x = 'predicted probability of victory', y = 'mean residual',
title = 'Binned Residual Plot of Model 3')
print(p_br)
This figure shows the binned residual plot with 20 groups for Model 3. The number of groups 20 was arbitrarily chosen. We should determine the number of groups so that the number of observatios in each group is not too small. Here, we have 1,124 observations in total and each group has 56 observations except the 20th group, which has 60.
Looking at the figure, we cannot find an obvious pattern of the residuals. However, we are not so sure because the number of points are as small as 20. (Assignment: Create a similar figure with 40 groups and check if you can find any pattern.) The curves in the figure display \(\pm2\)se\(=\pm 2\sqrt{p(1-p)/n}\), where \(p\) is the predicted probability for each group, and \(n\) is the number of observations for each group. We expect that about 95% of observations are between two curves. In this figure, 18 out of 20 points are in that area. That is, 90% of the data are between the curves, which is not off the expectation much.
Since the relationship between the outcome and the linear predictor is not linear in logistic regression, it is not trivial to interpret the estimated coeffcients. One way to tackle the problem is to consider how much the predicted probability changes corresponding to a specific change of each predictor. For illustration we use Model 2 estimated above.
Here is the result of Model 2.
print(logit_2, pars = c('alpha', 'beta'))
## Inference for Stan model: d2-model3.
## 4 chains, each with iter=3000; warmup=1500; thin=1;
## post-warmup draws per chain=1500, total post-warmup draws=6000.
##
## mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
## alpha 0.77 0.01 0.39 0.01 0.51 0.77 1.03 1.53 1628 1
## beta[1] 0.52 0.00 0.05 0.43 0.49 0.52 0.55 0.61 2088 1
## beta[2] 0.04 0.00 0.02 0.01 0.03 0.04 0.06 0.08 2411 1
## beta[3] -0.06 0.00 0.01 -0.08 -0.07 -0.06 -0.06 -0.05 1560 1
##
## Samples were drawn using NUTS(diag_e) at Fri Nov 18 22:30:57 2016.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at
## convergence, Rhat=1).
First, let’s examine the effect of previous wins on the probability of victory. Consider how much the probability changes as the prvious wins increases from 0 to 1, other things equal (fixed). Let \(\delta\) denote the change in the probability of victory. \[\delta(\mathrm{expenditure, age}) = \mathrm{logit}^{-1}(0.77 + 0.52 \cdot 1 + 0.04 \mathrm{expenditure} - 0.06 \mathrm{age})\\ - \mathrm{logit}^{-1}(0.77 + 0.52 \cdot 0 +0.04 \mathrm{expenditure} - 0.06 \mathrm{age}).\]
Calculating this for each observation, we get the mean prediced difference (MPD) \[ \mathrm{MPD} = \frac{1}{n}\sum_{i=1}^n \delta(\mathrm{expenditure}_i, \mathrm{age}_i).\]
Let’s get it in R. We use our own invlogit()
function for calculation.
## Extract the point estimates of parameters
a <- mean(sample_2$alpha)
b <- apply(sample_2$beta, 2, mean)
b <- c(a, b)
## Calculate the predicted difference for each observation
delta <- with(myd_2009,
invlogit(b[1] + b[2] * 1 + b[3] * expm + b[4] * age) -
invlogit(b[1] + b[2] * 0 + b[3] * expm + b[4] * age))
## Calculate the mean predicted difference
mean(delta)
## [1] 0.06086107
This shows that the probability of winning the SMD increases by 6.09 points on average as the previous wins increases from 0 to 1.
Similarly, let’s calculate the change of the probability if the previous wins increases from 1 to 5.
delta <- with(myd_2009,
invlogit(b[1] + b[2] * 5 + b[3] * expm + b[4] * age) -
invlogit(b[1] + b[2] * 1 + b[3] * expm + b[4] * age))
mean(delta)
## [1] 0.4117734
Thus, if the previous wins increases from 1 to 5, the probability of winning increases by 41.18 points on average.
Next, let’s examin the effect of electoral spending. Now, we’ll check how much the winning probability changes as the spending changes from “the mean - sd” to “the mean + sd”.
lo <- with(myd_2009, mean(expm) - 0.5 * sd(expm))
hi <- with(myd_2009, mean(expm) + 0.5 * sd(expm))
delta <- with(myd_2009,
invlogit(b[1] + b[2] * previous + b[3] * hi + b[4] * age) -
invlogit(b[1] + b[2] * previous + b[3] * lo + b[4] * age))
mean(delta)
## [1] 0.03346188
This shows that the change of the elecotral spending from “the mean -sd” to the “mean + sd” is associated with the rise of the winning probability by 3.35 points.
Similarly, what happenes if the spending changes from the minimum to the maximum.
delta <- with(myd_2009,
invlogit(b[1] + b[2] * previous + b[3] * max(expm) + b[4] * age) -
invlogit(b[1] + b[2] * previous + b[3] * min(expm) + b[4] * age))
mean(delta)
## [1] 0.186329
This shows that the winning probability increases by 18.63 points on average if the expenditure changes from the minimum to the maximum.
Lastly, let’s check the effect of age on the probability of winning the SMDs. How does the probability change if a candidate’s age changes from 30 to 40?
delta <- with(myd_2009,
invlogit(b[1] + b[2] * previous + b[3] * expm + b[4] * 40) -
invlogit(b[1] + b[2] * previous + b[3] * expm + b[4] * 30))
mean(delta)
## [1] -0.1118803
It is shown that the probability of elecotal victory increases by -11.19 points (or decreases by 11.19 points) on average by changing the age from 30 to 40.
Simlilary, what is the effect of age changing from 40 to 50?
delta <- with(myd_2009,
invlogit(b[1] + b[2] * previous + b[3] * expm + b[4] * 50) -
invlogit(b[1] + b[2] * previous + b[3] * expm + b[4] * 40))
mean(delta)
## [1] -0.09278014
It is presented that the aging from 40 to 50 is associated with the -9.28 points increase (or 9.28 points decrease) of the winning probability.
How about the change from 50 to 60?
delta <- with(myd_2009,
invlogit(b[1] + b[2] * previous + b[3] * expm + b[4] * 60) -
invlogit(b[1] + b[2] * previous + b[3] * expm + b[4] * 50))
mean(delta)
## [1] -0.07292753
It is displayed that the aging from 50 to 60 is associated with the -7.29 points increase (or 7.29 points decrease) of the winning probability.
As these examples show, we can substantively interpret the results of logit models (logistic regression) by showing the average change of the predicted probability corresponding to some specific changes of the predictors. It is even better to show these results graphically (e.g. bar graphs).
Do you wanna check if the effect is significant? Really?? Well, here is one way to do that.
Using logit_3, let’s examine if the effect of electoral spending is really positive. Since we have the sample values of the coefficient, we can directly calculate the probability that the effect is positive.
sum(sample_3$beta[, 2] > 0) / length(sample_3$beta[, 2])
## [1] 0.9093333
Let’s visualize this.
hist <- data.frame(beta_x = sample_3$beta[, 2]) %>%
ggplot(aes(x = beta_x, y = ..density..)) +
geom_histogram(color = 'black') +
geom_vline(xintercept = 0, color = 'red') +
labs(x = expression(beta[x]), y = 'posterior density')
print(hist)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
So what do you think?