Preparation

First, let’s load the packages we are going to use.

library('MASS')
library('dplyr')
library('tidyr')
library('ggplot2')
library('rethinking')

We will use the 2009 HR election data (Asano and Yanai 2013) again for illustration. Load the data set in which NAs have been imputed with mean values.

load('data/HR09-imputed.Rda')

Take a look at the data set.

glimpse(HR09_imp)
## Observations: 1,139
## Variables: 20
## $ year       (int) 2009, 2009, 2009, 2009, 2009, 2009, 2009, 2009, 200...
## $ ku         (fctr) aichi, aichi, aichi, aichi, aichi, aichi, aichi, a...
## $ kun        (int) 1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, ...
## $ party      (fctr) DPJ, LDP, JCP, SDP, H, DPJ, LDP, JCP, H, DPJ, LDP,...
## $ name       (fctr) SATO, YUKO, SHINODA, YOSUKE, KIMURA, EMI, HIRAYAMA...
## $ age        (int) 46, 36, 59, 61, 42, 43, 47, 53, 67, 51, 52, 36, 32,...
## $ status     (fctr) challenger, incumbent, challenger, challenger, cha...
## $ nocand     (int) 5, 5, 5, 5, 5, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, ...
## $ wl         (fctr) won, lost, lost, lost, lost, won, lost, lost, lost...
## $ rank       (int) 1, 2, 3, 4, 5, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, ...
## $ previous   (int) 1, 1, 0, 0, 1, 5, 0, 0, 0, 5, 1, 0, 0, 4, 1, 2, 0, ...
## $ vote       (int) 122348, 78691, 14485, 6082, 3352, 162237, 58225, 18...
## $ voteshare  (dbl) 54.4, 35.0, 6.4, 2.7, 1.5, 66.6, 23.9, 7.8, 1.7, 62...
## $ eligible   (int) 369526, 369526, 369526, 369526, 369526, 378272, 378...
## $ turnout    (dbl) 61.96, 61.96, 61.96, 61.96, 61.96, 65.59, 65.59, 65...
## $ exp        (dbl) 3559575, 9617887, 1802968, 1642095, 1772906, 932959...
## $ experience (dbl) 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0, ...
## $ expm       (dbl) 3.559575, 9.617887, 1.802968, 1.642095, 1.772906, 9...
## $ exp_m      (dbl) 3.559575, 9.617887, 1.802968, 1.642095, 1.772906, 9...
## $ exp_m_c    (dbl) -2.558606, 3.499706, -4.315213, -4.476086, -4.34527...

Interaction

Experienced vs. Inexperienced Candidates

When we first analyzed the HR election data, we found that both MP experience and campaign spending affected the vote share of electoral candidates. But is the effect of campaign spending the same for experienced and inexperienced candidates?

Let’s consider two simple regression models. In each model, the outcome is the vote share and the predictor is the campaign spending. One model contains only the inexperienced candidates, and the other only the experienced. Here is the code to graphically display the simple regressions using ggplot2.

HR09_imp %>% 
    filter(experience == 0) %>%
    ggplot(aes(x = exp_m, y = voteshare)) +
        geom_point() +
        geom_smooth(method = 'lm') +
        labs(x = 'expenditure (million yen)',
             y = 'vote share (%)',
             title = 'Inexperienced candidates')

HR09_imp %>% 
    filter(experience == 1) %>%
    ggplot(aes(x = exp_m, y = voteshare)) +
        geom_point() +
        geom_smooth(method = 'lm') +
        labs(x = 'expenditure (million yen)',
             y = 'vote share (%)',
             title = 'Experienced candidates')

As these figures reveal, money has a different effect on the vote share depending on candidates’ experience. We managed to find this by splitting the data set into two groups, but we should fit a single model to detect this conditional effect, because we would also like to know the uncertainty associated with the conditioning (and for some other reasons explained by McElreath).

How can we do that? We can estimate conditional effects by including interactions in our regression models.
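Before turning to the Bayesian models, a quick frequentist sketch may help build intuition. The code below uses simulated data with made-up coefficients (none of these numbers come from the HR data): a single lm() with an interaction term recovers exactly the same two slopes as the pair of separate group-wise regressions, while additionally estimating the difference between the slopes as a single coefficient whose uncertainty we can inspect.

```r
set.seed(123)
n <- 200
experience <- rbinom(n, 1, 0.5)     # 0 = inexperienced, 1 = experienced
spending   <- runif(n, 0, 10)       # hypothetical spending (million yen)
# true slopes (made up): 4 for inexperienced, 0.5 for experienced
voteshare  <- 20 + (4 - 3.5 * experience) * spending +
    15 * experience + rnorm(n, sd = 5)
d <- data.frame(voteshare, spending, experience)

# two separate regressions: one slope each, no shared uncertainty
fit_unexp <- lm(voteshare ~ spending, data = d, subset = experience == 0)
fit_exp   <- lm(voteshare ~ spending, data = d, subset = experience == 1)

# one model with an interaction: both slopes at once, plus the
# slope difference as the coefficient on spending:experience
fit_int <- lm(voteshare ~ spending * experience, data = d)
coef(fit_int)
```

With a fully interacted model, the slope for the inexperienced group is the `spending` coefficient, and the slope for the experienced group is `spending` plus `spending:experience`; these match the separate fits exactly.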

Models with and without Interactions

Before we fit models with interactions, let’s re-fit the model without any interactions.

First, the model with the expenditure only.

\[v_i \sim \mbox{Normal}(\mu_i, \sigma)\] \[\mu_i = \alpha + \beta_m m_i\] \[\alpha \sim \mbox{Normal}(25, 10)\] \[\beta_m \sim \mbox{Normal}(0, 3)\] \[\sigma \sim \mbox{Uniform}(0, 100)\]

model1 <- alist(
    voteshare ~ dnorm(mu, sigma),
    mu <- alpha + beta_m*exp_m_c,
    alpha ~ dnorm(25, 10),
    beta_m ~ dnorm(0, 3),
    sigma ~ dunif(0, 100)
)
fit1 <- map(model1, data = HR09_imp)
precis(fit1)
##         Mean StdDev  5.5% 94.5%
## alpha  26.34   0.48 25.57 27.10
## beta_m  3.07   0.10  2.91  3.22
## sigma  16.13   0.34 15.59 16.67

Next, add the second predictor experience to the model. \[v_i \sim \mbox{Normal}(\mu_i, \sigma)\] \[\mu_i = \alpha + \beta_m m_i + \beta_x x_i\] \[\alpha \sim \mbox{Normal}(25, 10)\] \[\beta_m \sim \mbox{Normal}(0, 3)\] \[\beta_x \sim \mbox{Normal}(0, 10)\] \[\sigma \sim \mbox{Uniform}(0, 100)\]

model2 <- alist(
    voteshare ~ dnorm(mu, sigma),
    mu <- alpha + beta_m*exp_m_c + beta_x*experience,
    alpha ~ dnorm(25, 10),
    beta_m ~ dnorm(0, 3),
    beta_x ~ dnorm(0, 10),
    sigma ~ dunif(0, 100)
)
fit2 <- map(model2, data = HR09_imp)
precis(fit2)
##         Mean StdDev  5.5% 94.5%
## alpha  19.25   0.65 18.20 20.29
## beta_m  1.87   0.12  1.68  2.07
## beta_x 17.74   1.21 15.80 19.68
## sigma  14.79   0.31 14.29 15.28

Let’s compare these models.

compare(fit1, fit2)
##      WAIC pWAIC dWAIC weight    SE  dSE
## fit2 9377   4.6   0.0      1 50.10   NA
## fit1 9573   3.2 195.9      0 45.59 33.2

Essentially all of the Akaike weight goes to the second model, so the second model seems much better than the first.
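As a sanity check, the weight column can be reproduced by hand from the WAIC values in the table, using the standard Akaike-weight formula \(w_j \propto \exp(-\Delta\mbox{WAIC}_j / 2)\):

```r
# WAIC values from compare(fit1, fit2) above
waic <- c(fit2 = 9377, fit1 = 9573)
d_waic <- waic - min(waic)                        # dWAIC
w <- exp(-0.5 * d_waic) / sum(exp(-0.5 * d_waic)) # Akaike weights
round(w, 3)
# fit2 gets essentially all of the weight; fit1's weight is
# astronomically small (on the order of 1e-43)
```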

Now, let’s include an interaction. What we would like to accomplish is to obtain different slopes of money for the different groups of candidates defined by their experience.

Here is a model to do that. \[v_i \sim \mbox{Normal}(\mu_i, \sigma)\] \[\mu_i = \alpha + \gamma_i m_i + \beta_x x_i\] \[\gamma_i = \beta_m + \beta_{mx} x_i\] \[\alpha \sim \mbox{Normal}(25, 10)\] \[\beta_m \sim \mbox{Normal}(0, 3)\] \[\beta_x \sim \mbox{Normal}(0, 10)\] \[\beta_{mx} \sim \mbox{Normal}(0, 10)\] \[\sigma \sim \mbox{Uniform}(0, 100)\]

model3 <- alist(
    voteshare ~ dnorm(mu, sigma),
    mu <- alpha + gamma*exp_m_c + beta_x*experience,
    gamma <- beta_m + beta_mx*experience,
    alpha ~ dnorm(25, 10),
    beta_m ~ dnorm(0, 3),
    beta_x ~ dnorm(0, 10),
    beta_mx ~ dnorm(0, 10),
    sigma ~ dunif(0, 100)
)
fit3 <- map(model3, data = HR09_imp)
precis(fit3)
##          Mean StdDev  5.5% 94.5%
## alpha   27.24   0.66 26.19 28.29
## beta_m   4.78   0.17  4.51  5.04
## beta_x  16.83   1.02 15.20 18.46
## beta_mx -4.60   0.21 -4.94 -4.27
## sigma   12.39   0.26 11.97 12.80

Let’s compare this model with the previous models using WAIC.

compare(fit1, fit2, fit3)
##        WAIC pWAIC dWAIC weight    SE   dSE
## fit3 8978.0   7.3   0.0      1 65.81    NA
## fit2 9377.3   4.7 399.3      0 50.09 43.51
## fit1 9573.1   3.3 595.1      0 45.54 57.01

Based on WAIC, the model with the interaction is superior to the models without it.

Interpreting Interaction

We have obtained the estimates for the model with interaction. Here are the parameter estimates.

precis(fit3)
##          Mean StdDev  5.5% 94.5%
## alpha   27.24   0.66 26.19 28.29
## beta_m   4.78   0.17  4.51  5.04
## beta_x  16.83   1.02 15.20 18.46
## beta_mx -4.60   0.21 -4.94 -4.27
## sigma   12.39   0.26 11.97 12.80

What does each parameter value (the values in the Mean column) mean?

Try giving your interpretation of each parameter value (except sigma)!

Is it easy? I don’t think so.

Where is the estimate of \(\gamma\)? Well, we didn’t estimate it directly. We need to calculate it now. \[\gamma_i = \beta_m + \beta_{mx} x_i = 4.78 - 4.6 x_i.\] Therefore, for the inexperienced candidates (\(x_i=0\)), \[\gamma_i = 4.78 - 4.6 \times 0 = 4.78.\] For the experienced candidates (\(x_i = 1\)), \[\gamma_i = 4.78 - 4.6 \times 1 = 0.18.\] So there is a positive relationship between campaign spending and vote share among the inexperienced candidates, but the relationship is much weaker (close to zero) for the experienced candidates.
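This plug-in calculation is easy to reproduce in R with the MAP estimates from precis(fit3):

```r
# MAP estimates copied from precis(fit3) above
beta_m  <-  4.78
beta_mx <- -4.60

# the slope of spending, conditional on experience x
gamma <- function(x) beta_m + beta_mx * x

gamma(0)  # 4.78: slope for inexperienced candidates
gamma(1)  # 0.18: slope for experienced candidates
```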

Uncertainty of Interaction Effect

Now we know the MAP estimate of the interaction. What about its distribution (or uncertainty)?

As usual, let’s sample from the posterior to show the uncertainty of our estimates.

post <- mvrnorm(10000, mu = coef(fit3), Sigma = vcov(fit3)) %>%
    as.data.frame()
gamma_unexper <- post$beta_m + post$beta_mx*0
gamma_exper <- post$beta_m + post$beta_mx*1

What are the means of the gammas?

mean(gamma_unexper)
## [1] 4.774424
mean(gamma_exper)
## [1] 0.1730139

Naturally, they are very close to their MAPs.

Now let’s visualize the distributions of the gammas.

gamma <- data.frame(gamma_unexper, gamma_exper)
gamma_long <- gamma %>% gather(experience, value)
ggplot(gamma_long, aes(x = value, color = experience)) +
    geom_density() +
    xlab(expression(gamma))

The gamma among the inexperienced candidates is positive for sure. The gamma among the experienced candidates, however, is around zero and sometimes takes a negative value. Moreover, the two distributions do not overlap. Though it seems pretty obvious that the two distributions differ, we can and should directly calculate the difference to show that they are in fact different.

diff <- gamma_unexper - gamma_exper
sum(diff < 0) / length(diff)
## [1] 0

Among the 10,000 samples, there is not a single pair in which the slope for the experienced candidates is steeper than that for the inexperienced.

Let’s visualize the distribution of the difference.

ggplot(as.data.frame(diff), aes(x = diff)) + geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Now we can say that, according to our model, the slopes differ between the two groups.

Exercise

What if you think that the effect of experience is conditional on the amount of campaign spending? Build a statistical model, fit the model, and interpret the results with visualization.

Continuous Interactions

Fitting Continuous Interactions

We have used a binary variable to gauge the candidates’ experience. However, we have a more detailed variable called previous, which counts the number of previous electoral victories (including victories in HC elections) of each candidate. Let’s build a model with this variable instead of experience.

First, let’s include it in a model without interaction.

(Below I’ll present some models only in R code, but you should be able to write down the statistical models yourself.)

model4 <- alist(
    voteshare ~ dnorm(mu, sigma),
    mu <- alpha + beta_m*exp_m_c + beta_x*previous,
    alpha ~ dnorm(25, 10),
    beta_m ~ dnorm(0, 3),
    beta_x ~ dnorm(0, 10),
    sigma ~ dunif(0, 100)
)
fit4 <- map(model4 , data = HR09_imp)
precis(fit4)
##         Mean StdDev  5.5% 94.5%
## alpha  21.28   0.58 20.35 22.22
## beta_m  2.12   0.11  1.94  2.31
## beta_x  2.90   0.22  2.55  3.25
## sigma  15.01   0.31 14.51 15.51

Let’s compare this model with the simple regression model whose only predictor is the spending.

compare(fit1, fit4)
##        WAIC pWAIC dWAIC weight    SE   dSE
## fit4 9411.8   4.8   0.0      1 51.25    NA
## fit1 9573.2   3.4 161.5      0 45.58 27.87

Based on WAIC, we should include previous in the model.

Now, let’s include the interaction.

model5 <- alist(
    voteshare ~ dnorm(mu, sigma),
    mu <- alpha + gamma*exp_m_c + beta_x*previous,
    gamma <- beta_m + beta_mx*previous,
    alpha ~ dnorm(25, 10),
    beta_m ~ dnorm(0, 3),
    beta_x ~ dnorm(0, 10),
    beta_mx ~ dnorm(0, 10),
    sigma ~ dunif(0, 100)
)
fit5 <- map(model5, data = HR09_imp)
precis(fit5)
##          Mean StdDev  5.5% 94.5%
## alpha   22.46   0.49 21.68 23.25
## beta_m   3.32   0.11  3.15  3.50
## beta_x   5.05   0.21  4.72  5.38
## beta_mx -0.61   0.03 -0.66 -0.57
## sigma   12.54   0.26 12.12 12.96

Did we improve our prediction by adding the interaction?

compare(fit4, fit5)
##        WAIC pWAIC dWAIC weight    SE   dSE
## fit5 9010.5  10.9   0.0      1 60.28    NA
## fit4 9411.7   4.8 401.3      0 51.14 35.72

Yup. The model with interaction seems better.

Now, let’s interpret the parameter estimates.

coeftab(fit4, fit5)
##         fit4    fit5   
## alpha     21.28   22.46
## beta_m     2.12    3.32
## beta_x     2.90    5.05
## sigma     15.01   12.54
## beta_mx      NA   -0.61
## nobs       1139    1139

So what can we say about the results? Try giving your interpretation of each value above.

Now you know that it is pretty hard to interpret results with interactions from tables of estimates. So what should we do? The answer must be obvious at this point.
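Before plotting, we can at least compute the implied slope of spending at a few values of previous, plugging the MAP estimates from precis(fit5) into \(\gamma_i = \beta_m + \beta_{mx} x_i\):

```r
# MAP estimates copied from precis(fit5) above
beta_m  <-  3.32
beta_mx <- -0.61

# the slope of spending, conditional on the number of previous wins
gamma <- function(previous) beta_m + beta_mx * previous

gamma(c(0, 3, 10))  # 3.32, 1.49, -2.78

# the slope crosses zero at previous = -beta_m / beta_mx
-beta_m / beta_mx   # about 5.4 previous wins
```

So the model implies that spending is associated with a higher vote share for newcomers, but the association shrinks with each previous win and turns negative after roughly five wins.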

Visualizing Continuous Interactions

Since the interaction shows the effect of an explanatory variable conditional on the values of another explanatory variable, we should draw some graphs in different conditions. Let’s follow McElreath’s advice (p. 233) and create a triptych.

The variable previous takes 15 different values, and I arbitrarily choose three of them: 0, 3, and 10.

First, create a triptych for the model without interaction.

par(mfrow = c(1, 3))
exp_seq <- seq(-6, 19, length.out = 100)
for (p in c(0, 3, 10)) {
    d <- HR09_imp %>% filter(previous == p)
    plot(voteshare ~ exp_m_c, data = d, col = rangi2,
         xlim = c(-6, 19), ylim = c(0, 100),
         main = paste('previous=', p), 
         xlab = 'expenditure (million yen; centered)')
    mu <- link(fit4, data = data.frame(previous = p, exp_m_c = exp_seq))
    mu_mean <- apply(mu, 2, mean)
    mu_PI <- apply(mu, 2, PI, prob = .89)
    lines(exp_seq, mu_mean)
    lines(exp_seq, mu_PI[1, ], lty = 2)
    lines(exp_seq, mu_PI[2, ], lty = 2)
}
## [ 100 / 1000 ] ... [ 1000 / 1000 ]  (link() progress messages, repeated for each of the three panels)

Because this model has no interaction, the slopes in the three panels are identical. As a result, the lines do not represent the data well.

Next, let’s create a similar set of graphs for the model with interaction.

par(mfrow = c(1, 3))
exp_seq <- seq(-6, 19, length.out = 100)
for (p in c(0, 3, 10)) {
    d <- HR09_imp %>% filter(previous == p)
    plot(voteshare ~ exp_m_c, data = d, col = rangi2,
         xlim = c(-6, 19), ylim = c(0, 100),
         main = paste('previous=', p), 
         xlab = 'expenditure (million yen; centered)')
    mu <- link(fit5, data = data.frame(previous = p, exp_m_c = exp_seq))
    mu_mean <- apply(mu, 2, mean)
    mu_PI <- apply(mu, 2, PI, prob = .89)
    lines(exp_seq, mu_mean)
    lines(exp_seq, mu_PI[1, ], lty = 2)
    lines(exp_seq, mu_PI[2, ], lty = 2)
}
## [ 100 / 1000 ] ... [ 1000 / 1000 ]  (link() progress messages, repeated for each of the three panels)

Now we have different slopes for different conditions. Money increases the vote share for new candidates, but the effect diminishes as the number of victories increases. Finally, once a candidate has won many elections, spending more money is associated with a lower vote share (Q: Is this causal?).

Once we create a set of figures representing different conditions, we can interpret the interaction. Without visual displays, interpreting interactions is onerous.

Exercise

Create some triptychs to interpret the interaction as the effect of previous conditional on the value of spending.


