準備

まず、必要なパッケージを読み込む。

pacman::p_load(tidyverse,
               broom,
               coefplot,
               AER,
               fontregisterer)
if (.Platform$OS.type == "windows") {# Windows
  my_font <- "Yu Gothic"
} else if (capabilities("aqua")) {   # macOS
  my_font <- "HiraginoSans-W3"
} else {                             # Unix/Linux
  my_font <- "IPAexGothic"
}

theme_set(theme_gray(base_size = 9,
                     base_family = my_font))

今回は、田中隆一 (2015)『計量経済学の第一歩』（有斐閣）で分析されている例をRを使って分析する。この教科書のサポートページからデータのzipファイルをダウンロードして展開し、8_income.csv をプロジェクト内の data フォルダに保存する。

保存できたら、データを読み込む。

myd <- read_csv("data/8_income.csv")

データの中身を確認する。

glimpse(myd)

## Rows: 734
## Columns: 8
## $ exper   <dbl> 19, 19, 19, 19, 23, 23, 23, 23, 9, 9, 9, 12, 12, 12, 17, 17, 1…
## $ exper2  <dbl> 361, 361, 361, 361, 529, 529, 529, 529, 81, 81, 81, 144, 144, …
## $ mbirth  <dbl> 6, 4, 1, 9, 8, 11, 3, 6, 10, 11, 11, 9, 6, 1, 10, 3, 11, 12, 8…
## $ moyeduc <dbl> 9, 12, 9, 9, 9, 12, 9, 9, 13, 12, 12, 12, 9, 13, 9, 12, 12, 12…
## $ payeduc <dbl> 9, 12, 9, 9, 9, 13, 9, 9, 13, 12, 12, 12, 9, 12, 9, 9, 12, 12,…
## $ sibs    <dbl> 1, 1, 1, 1, 3, 2, 2, 6, 1, 2, 0, 2, 4, 2, 1, 2, 1, 2, 2, 1, 1,…
## $ yeduc   <dbl> 9, 9, 9, 9, 9, 9, 9, 9, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11…
## $ lincome <dbl> 5.703783, 5.991465, 6.214608, 6.214608, 6.309918, 6.309918, 6.…

8つの変数が含まれていることがわかる。

単回帰モデルの操作変数法

田中 (2015) の例8.1 を実行してみよう。

年収の対数値を修学年数に回帰する以下の単回帰モデルを考える。 \[対数年収 = \beta_0 + \beta_1 修学年数 + U.\]

Rで推定する。

fit_ols <- lm(lincome ~ yeduc, data = myd)
fit_ols |> 
    tidy(conf.int = TRUE)

## # A tibble: 2 × 7
##   term        estimate std.error statistic   p.value conf.low conf.high
##   <chr>          <dbl>     <dbl>     <dbl>     <dbl>    <dbl>     <dbl>
## 1 (Intercept)   5.39     0.0870      61.9  3.52e-293   5.22      5.56  
## 2 yeduc         0.0554   0.00609      9.09 8.86e- 19   0.0434    0.0673

田中 (2015) と同じ結果が得られた。

修学年数は内生変数であることが疑われるので、操作変数法を使って因果効果を推定したい。田中 (2015) に倣い、操作変数として父親の修学年数 (payeduc) を使う。

操作変数には以下の2つの条件がある。

操作変数は、内生変数（ここでは修学年数）を通じてのみ結果変数に影響する
操作変数と内生変数に相関がある（実践上、強い相関が望まれる）

第1の条件（除外制約 [exclusion restriction] と呼ばれる）は、理論的に仮定するものである。この条件に見合う操作変数法を見つけることが、操作変数法を使う上での最大の難点である。ここでは、父親の修学年数は、子の修学年数を通じてのみ、その子の所得に影響を与えると仮定して分析を進めよう。

第2の条件は、検定によって確かめる。通常は、2段階回帰における第1段階の回帰から \(F\)検定の検定統計量を計算し、その値が10以上であれば第2の条件を満たすと考える。今考えている例における第1段階の回帰は、 \[修学年数 = \alpha_0 + \phi 父親の就学年数 + \epsilon\] である。

これを推定してみよう。

fit_1s_1 <- lm(yeduc ~ payeduc, data = myd)
fit_1s_1 |> 
    tidy(conf.int = TRUE)

## # A tibble: 2 × 7
##   term        estimate std.error statistic   p.value conf.low conf.high
##   <chr>          <dbl>     <dbl>     <dbl>     <dbl>    <dbl>     <dbl>
## 1 (Intercept)   10.5      0.350       30.0 7.29e-130    9.83     11.2  
## 2 payeduc        0.296    0.0280      10.5 2.66e- 24    0.241     0.351

結果をみると、\(\phi\) の推定値が約0.3であり、父親の修学年数が1年伸びると子の修学年数が平均約0.3年長くなることがわかる。この影響は、有意水準5%で統計的に有意であり、操作変数と内生変数の間に相関がありそうだということがわかる。

\(F\)検定の検定統計量も確かめてみよう。

summary(fit_1s_1)$fstatistic

##    value    numdf    dendf 
## 111.2046   1.0000 732.0000

検定統計量は約111 > 10 なので、第2の条件は満たしていると考える。

以下では、この操作変数を用いて、修学年数が所得に与える影響を推定する。

IVの公式

まず、操作変数の公式を用いて、修学年数が所得に与える影響を推定しよう。操作変数の公式を使うと、求める推定値は、 \[\frac{\mbox{Cov[対数所得, 父親の修学年数]}}{\mbox{Cov}[修学年数, 父親の修学年数]}\] となる。計算してみよう。

with(myd, cov(lincome, payeduc) / cov(yeduc, payeduc))

## [1] 0.02956078

これが操作変数法で得られる、修学年数が対数所得に与える因果効果の推定値である。

2段階最小二乗法

次に、2段階最小二乗法で、係数を推定してみよう。

まず、第1段階の回帰モデルは、 \[修学年数 = \alpha_1 + \phi 父親の修学年数 + e_1\] である。これは上で既に求めた。\(\phi\)の推定値 phi_hat は、

(phi_hat <- coef(fit_1s_1)[2])

##   payeduc 
## 0.2955396

である。

第2段階では、 \[対数所得 = \alpha_2 + \beta_{\mathrm{2SLS}} \hat{D} + e_2\] を推定する。\(\hat{D}\) は、第1段階の推定で得られる予測値なので、

myd$D_hat <- predict(fit_1s_1)

である。これを使って推定すると、

fit_2s <- lm(lincome ~ D_hat, data = myd)
fit_2s |> 
    tidy(conf.int = TRUE)

## # A tibble: 2 × 7
##   term        estimate std.error statistic  p.value conf.low conf.high
##   <chr>          <dbl>     <dbl>     <dbl>    <dbl>    <dbl>     <dbl>
## 1 (Intercept)   5.75      0.250      23.0  1.46e-88  5.26       6.24  
## 2 D_hat         0.0296    0.0177      1.67 9.46e- 2 -0.00511    0.0642

となる。係数の推定値は約0.0296 であり、IV公式で計算したのと同じ結果が得られた。

AER::ivreg() で推定する

係数の点推定値を得るだけなら、上の方法でもよい。しかし、上の方法だと、正しい標準誤差が求められない。そのため、区間推定や統計的検定が行えない。したがって、実際のデータ分析では、上で実行した方法は使わない。代わりに、AER::ivreg() 関数を使って、操作変数法を実行する。

今考えている例の場合、次のように実行できる。

fit_iv_1 <- ivreg(lincome ~ yeduc | payeduc, data = myd)
fit_iv_1 |> 
    tidy(conf.int = TRUE)

## # A tibble: 2 × 7
##   term        estimate std.error statistic  p.value conf.low conf.high
##   <chr>          <dbl>     <dbl>     <dbl>    <dbl>    <dbl>     <dbl>
## 1 (Intercept)   5.75      0.240      23.9  5.86e-94  5.28       6.22  
## 2 yeduc         0.0296    0.0170      1.74 8.21e- 2 -0.00371    0.0628

他の方法で推定したときと同じく、0.02956 という推定値が得られた。また、標準誤差は 0.01698 であることがわかる。操作変数を使わない回帰 (fit_ols) では、修学年数が対数所得を増やすように見えていた（係数の推定値が0.06 で \(p\)値 < 0.01）が、内生性により傾きパラメタを過大推定していた可能性が示唆される。

重回帰モデルの操作変数法

田中 (2015) の例8.2をRで実行してみよう。

操作変数を使わない以下の重回帰モデル（Mincer方程式）を考える。 \[対数所得 = \beta_0 + \beta_1 修学年数 + \beta_2 就業可能年数 + \beta_3 (就業可能年数)^2 + U.\]

このモデルを推定する。

fit_mincer <- lm(lincome ~ yeduc + exper + exper2, data = myd)
fit_mincer |> 
    tidy(conf.int = TRUE)

## # A tibble: 4 × 7
##   term         estimate std.error statistic   p.value conf.low conf.high
##   <chr>           <dbl>     <dbl>     <dbl>     <dbl>    <dbl>     <dbl>
## 1 (Intercept)  4.32      0.139        31.1  5.18e-136  4.05     4.59    
## 2 yeduc        0.0842    0.00624      13.5  3.14e- 37  0.0720   0.0965  
## 3 exper        0.0577    0.0153        3.77 1.75e-  4  0.0276   0.0877  
## 4 exper2      -0.000831  0.000526     -1.58 1.14e-  1 -0.00186  0.000201

ここで、修学年数が内生変数である（他の変数は外生変数である）と考え、父親の修学年数を操作変数として利用した推定を考える。

まず、除外制約については、先ほどと同様に考え、父親の修学年数は子の修学年数を通じてのみ、その子の所得に影響を及ぼすと仮定する。

次に、第1段階の回帰分析を使って、父親の修学年数と子の修学年数に相関があるかどうか確かめる。重回帰の場合、第1段階も第2段階も共変量を使った重回帰を行うので、第1段階の回帰モデルは、 \[修学年数 = \alpha_1 + \phi 父親の修学年数 + \lambda_1 就業可能年数 + \lambda_2 (就業可能年数)^2 + e_1\] である。これを推定する。

fit_1s_2 <- lm(yeduc ~ payeduc + exper + exper2, data = myd)
fit_1s_2 |> 
    tidy(conf.int = TRUE)

## # A tibble: 4 × 7
##   term        estimate std.error statistic  p.value conf.low conf.high
##   <chr>          <dbl>     <dbl>     <dbl>    <dbl>    <dbl>     <dbl>
## 1 (Intercept)  12.0      0.682       17.5  1.09e-57  10.6     13.3    
## 2 payeduc       0.205    0.0272       7.53 1.53e-13   0.151    0.258  
## 3 exper         0.161    0.0874       1.84 6.57e- 2  -0.0105   0.333  
## 4 exper2       -0.0112   0.00299     -3.75 1.88e- 4  -0.0171  -0.00536

結果をみると、\(\phi\) の推定値が約0.2であり、父親の修学年数が1年伸びると子の修学年数が平均約0.2年長くなることがわかる。この影響は、有意水準5%で統計的に有意であり、操作変数と内生変数の間に相関がありそうだということがわかる。

\(F\)検定の検定統計量も確かめてみよう。

summary(fit_1s_2)$fstatistic

##    value    numdf    dendf 
##  84.9145   3.0000 730.0000

検定統計量は約84.9 > 10 なので、第2の条件は満たしていると考える。

この操作変数を用いて、修学年数が所得に与える影響を推定しよう。共変量がある場合、次のようにする。すなわち、内生変数は | の左側に、操作変数は | の右側に、共変量は | の両側に書く。

fit_iv_mincer <- ivreg(lincome ~ yeduc + exper + exper2 | 
                           exper + exper2 + payeduc, 
                       data = myd)
fit_iv_mincer |> 
    tidy(conf.int = TRUE)

## # A tibble: 4 × 7
##   term         estimate std.error statistic  p.value conf.low conf.high
##   <chr>           <dbl>     <dbl>     <dbl>    <dbl>    <dbl>     <dbl>
## 1 (Intercept)  4.45      0.349        12.8  9.26e-34  3.76     5.13    
## 2 yeduc        0.0752    0.0233        3.23 1.30e- 3  0.0296   0.121   
## 3 exper        0.0597    0.0162        3.70 2.34e- 4  0.0281   0.0914  
## 4 exper2      -0.000964  0.000621     -1.55 1.21e- 1 -0.00218  0.000254

田中 (2015: 201) と同じ結果が得られた。

結果を図示しておこう。点推定値と95%信頼区間（厳密には、点推定値 \(\pm\) 2標準誤差）を示す。

cplt_iv_mincer <- coefplot(
    fit_iv_mincer, intercept = FALSE,
    outerCI = 2, lwdOuter = 1,
    newNames = c(yeduc = "修学年数",
                 exper = "就業可能年数",
                 exper2 = "就業可能年数の二乗"),
    xlab = "係数", ylab = "説明変数",
    title = "結果変数は対数所得、操作変数は父親の修学年数")
plot(cplt_iv_mincer)

修学年数が対数所得に与える影響の点推定値は0.075である。つまり、教育の収益率は約7.5%であり、この効果は有意水準5%で統計的に有意である。操作変数を使わない推定 (fit_mincer) では、収益率が約8.4%と推定されていたので、内生性によって効果を過大推定していた可能性が示唆される。

授業の内容に戻る

計量経済学応用

操作変数法

矢内　勇生

2018-05-22（改訂：2019-05-17, 2022-05-24）

準備

単回帰モデルの操作変数法

IVの公式

2段階最小二乗法

AER::ivreg() で推定する

重回帰モデルの操作変数法

計量経済学応用

操作変数法

矢内 勇生

2018-05-22（改訂：2019-05-17, 2022-05-24）

準備

単回帰モデルの操作変数法

IVの公式

2段階最小二乗法

AER::ivreg() で推定する

重回帰モデルの操作変数法

矢内　勇生