Práctica 3. Estimación modelos multinivel con lmer en R

Correspondiente a la sesión del viernes, 1 de septiembre de 2023

En esta práctica:

1. Cargar/instalar librerías

if (!require("pacman")) install.packages("pacman") # solo la primera vez
Loading required package: pacman
pacman::p_load(lme4,
               reghelper,
               haven,
               stargazer,
               ggplot2, # gráficos
               texreg, # tablas de regresion (screenreg)
               dplyr # manipulacion de datos
               ) # paquetes a cargar

Para esta práctica utilizaremos la librería lme4, en particular la función lmer, así como también la librería haven, para lectura de base de datos externa. Además stargazer para visualizar outputs.

2. Leer datos base High School and Beyond (HSB)**

Como en sesión anterior

mlm = read_dta("http://www.stata-press.com/data/mlmus3/hsb.dta")

“mlm”” es el nombre que le daremos a la base de datos “High School and Beyond”

Variables relevantes para ejercicios:

  1. Nivel 1:
  • minority, an indicator for student ethnicity (1 = minority, 0 = other)
  • female, an indicator for student gender (1 = female, 0 = male)
  • ses, a standardized scale constructed from variables measuring parental education, occupation, and income
  • mathach, a measure of mathematics achievement
  1. Nivel 2
  • size (school enrollment)
  • sector (1 = Catholic, 0 = public)
  • pracad (proportion of students in the academic track)
  • disclim (a scale measuring disciplinary climate)
  • himnty (1 = more than 40% minority enrollment, 0 = less than 40%)
  • mnses (mean of the SES values for the students in this school who are included in the level-1 file)
  1. Cluster variable: schoolid

Seleccionar variables

mlm=mlm %>% select(minority,female,ses,mathach,size,sector,pracad,disclim,himinty,mnses,schoolid) %>% as.data.frame()
names(mlm)
 [1] "minority" "female"   "ses"      "mathach"  "size"     "sector"  
 [7] "pracad"   "disclim"  "himinty"  "mnses"    "schoolid"
#Tabla estadisticos descriptivos
stargazer(mlm,title="Estadísticos descriptivos", type = "text")

Estadísticos descriptivos
=================================================
Statistic   N     Mean    St. Dev.   Min    Max  
-------------------------------------------------
minority  7,185   0.275     0.446     0      1   
female    7,185   0.528     0.499     0      1   
ses       7,185  0.0001     0.779   -3.758 2.692 
mathach   7,185  12.748     6.878   -2.832 24.993
size      7,185 1,056.862  604.172   100   2,713 
sector    7,185   0.493     0.500     0      1   
pracad    7,185   0.534     0.251   0.000  1.000 
disclim   7,185  -0.132     0.944   -2.416 2.756 
himinty   7,185   0.280     0.449     0      1   
mnses     7,185  0.0001     0.414   -1.194 0.825 
schoolid  7,185 5,277.898 2,499.578 1,224  9,586 
-------------------------------------------------

3. Modelos

Modelo 0: Null model

results_0 = lmer(mathach ~ 1 + (1 | schoolid), data = mlm)
summary(results_0)
Linear mixed model fit by REML ['lmerMod']
Formula: mathach ~ 1 + (1 | schoolid)
   Data: mlm

REML criterion at convergence: 47116.8

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-3.0631 -0.7539  0.0267  0.7606  2.7426 

Random effects:
 Groups   Name        Variance Std.Dev.
 schoolid (Intercept)  8.614   2.935   
 Residual             39.148   6.257   
Number of obs: 7185, groups:  schoolid, 160

Fixed effects:
            Estimate Std. Error t value
(Intercept)  12.6370     0.2444   51.71
screenreg(results_0) # de library texreg

========================================
                           Model 1      
----------------------------------------
(Intercept)                    12.64 ***
                               (0.24)   
----------------------------------------
AIC                         47122.79    
BIC                         47143.43    
Log Likelihood             -23558.40    
Num. obs.                    7185       
Num. groups: schoolid         160       
Var: schoolid (Intercept)       8.61    
Var: Residual                  39.15    
========================================
*** p < 0.001; ** p < 0.01; * p < 0.05

En este modelo es posible identificar \(\gamma_{00}\) (intercept - fixed effects), así como también los componentes la varianza:

  • \(\tau_{00}\) (Var:id)
  • \(\sigma^2\) (Var:Residual).

Con estos últimos coeficientes aleatorios es posible calcular la correlación intraclase (\(\rho\)).

Correlación intra-clase = ICC = \(\rho=\frac{\tau_{00}}{\tau_{00}+\sigma^2}\)

# Cálculo directo en R:
8.614/(8.614+39.148) # calculo ICC 
[1] 0.1803526
# O de manera directa con funcion ICC de reghelper
reghelper::ICC(results_0)
[1] 0.1803518

Modelo 1: Predictores de nivel individual

results_1 = lmer(mathach ~ 1 + ses + female + (1 | schoolid), data = mlm)
screenreg(results_1, naive=TRUE)

========================================
                           Model 1      
----------------------------------------
(Intercept)                    13.28 ***
                               (0.20)   
ses                             2.36 ***
                               (0.11)   
female                         -1.19 ***
                               (0.17)   
----------------------------------------
AIC                         46605.73    
BIC                         46640.13    
Log Likelihood             -23297.86    
Num. obs.                    7185       
Num. groups: schoolid         160       
Var: schoolid (Intercept)       4.49    
Var: Residual                  36.81    
========================================
*** p < 0.001; ** p < 0.01; * p < 0.05

Modelo 2: Predictores nivel 2

results_2 = lmer(mathach ~ 1 + sector + mnses + (1 | schoolid), data = mlm)
screenreg(results_2)

========================================
                           Model 1      
----------------------------------------
(Intercept)                    12.13 ***
                               (0.20)   
sector                          1.23 ***
                               (0.31)   
mnses                           5.33 ***
                               (0.37)   
----------------------------------------
AIC                         46956.50    
BIC                         46990.90    
Log Likelihood             -23473.25    
Num. obs.                    7185       
Num. groups: schoolid         160       
Var: schoolid (Intercept)       2.31    
Var: Residual                  39.16    
========================================
*** p < 0.001; ** p < 0.01; * p < 0.05

Modelo 3: Predictores individuales y grupales

results_3 = lmer(mathach ~ 1 + ses + female + sector + mnses + (1 | schoolid), data = mlm)
screenreg(results_3)

========================================
                           Model 1      
----------------------------------------
(Intercept)                    12.74 ***
                               (0.21)   
ses                             2.15 ***
                               (0.11)   
female                         -1.20 ***
                               (0.16)   
sector                          1.25 ***
                               (0.30)   
mnses                           3.07 ***
                               (0.37)   
----------------------------------------
AIC                         46515.61    
BIC                         46563.77    
Log Likelihood             -23250.80    
Num. obs.                    7185       
Num. groups: schoolid         160       
Var: schoolid (Intercept)       2.15    
Var: Residual                  36.80    
========================================
*** p < 0.001; ** p < 0.01; * p < 0.05

Modelo 4: Pendiente aleatoria

results_4 = lmer(mathach ~ 1 + ses + female + mnses + sector + (1 + ses | schoolid), data = mlm)
screenreg(results_4)

============================================
                               Model 1      
--------------------------------------------
(Intercept)                        12.65 ***
                                   (0.21)   
ses                                 2.16 ***
                                   (0.12)   
female                             -1.19 ***
                                   (0.16)   
mnses                               3.07 ***
                                   (0.38)   
sector                              1.43 ***
                                   (0.30)   
--------------------------------------------
AIC                             46514.33    
BIC                             46576.24    
Log Likelihood                 -23248.16    
Num. obs.                        7185       
Num. groups: schoolid             160       
Var: schoolid (Intercept)           2.22    
Var: schoolid ses                   0.42    
Cov: schoolid (Intercept) ses       0.27    
Var: Residual                      36.59    
============================================
*** p < 0.001; ** p < 0.01; * p < 0.05

Modelo 5: Interacción entre niveles

$mathach= {00}(intercepto)+{10}ses_{ij}+ \(\gamma_{01}sector_j+\gamma_{11}ses_{ij} *sector{j}+\) \(\mu_{0j}(intercepto)+\mu_{1j}ses_{ij}+ r_{ij}\)

results_5 = lmer(mathach ~ 1 + female + ses*sector + mnses + (1 + ses | schoolid), data = mlm)
screenreg(results_5)

============================================
                               Model 1      
--------------------------------------------
(Intercept)                        12.81 ***
                                   (0.21)   
female                             -1.18 ***
                                   (0.16)   
ses                                 2.73 ***
                                   (0.14)   
sector                              1.29 ***
                                   (0.29)   
mnses                               3.04 ***
                                   (0.37)   
ses:sector                         -1.31 ***
                                   (0.21)   
--------------------------------------------
AIC                             46482.91    
BIC                             46551.71    
Log Likelihood                 -23231.45    
Num. obs.                        7185       
Num. groups: schoolid             160       
Var: schoolid (Intercept)           2.11    
Var: schoolid ses                   0.03    
Cov: schoolid (Intercept) ses       0.17    
Var: Residual                      36.60    
============================================
*** p < 0.001; ** p < 0.01; * p < 0.05

Comparación individual, agregado y multinivel

Generar regresiones para comparar

reg_ind=lm(mathach ~ ses + female + mnses + sector, data=mlm)
agg_mlm=mlm %>% group_by(schoolid) %>% summarise_all(funs(mean))
Warning: `funs()` was deprecated in dplyr 0.8.0.
ℹ Please use a list of either functions or lambdas:

# Simple named list: list(mean = mean, median = median)

# Auto named with `tibble::lst()`: tibble::lst(mean, median)

# Using lambdas list(~ mean(., trim = .2), ~ median(., na.rm = TRUE))
reg_agg=lm(mathach ~ ses + female + mnses + sector, data=agg_mlm)

Observar: ¿Qué sucede con los coeficientes y errores estándar cuando se comparan los coeficientes y los errores estándar?

screenreg(list(reg_ind, reg_agg, results_3))

=================================================================
                           Model 1      Model 2     Model 3      
-----------------------------------------------------------------
(Intercept)                  12.80 ***   13.13 ***      12.74 ***
                             (0.13)      (0.35)         (0.21)   
ses                           2.15 ***    5.19 ***       2.15 ***
                             (0.11)      (0.37)         (0.11)   
female                       -1.34 ***   -1.97 ***      -1.20 ***
                             (0.15)      (0.56)         (0.16)   
mnses                         2.90 ***                   3.07 ***
                             (0.22)                     (0.37)   
sector                        1.32 ***    1.25 ***       1.25 ***
                             (0.16)      (0.31)         (0.30)   
-----------------------------------------------------------------
R^2                           0.18        0.67                   
Adj. R^2                      0.18        0.67                   
Num. obs.                  7185         160           7185       
AIC                                                  46515.61    
BIC                                                  46563.77    
Log Likelihood                                      -23250.80    
Num. groups: schoolid                                  160       
Var: schoolid (Intercept)                                2.15    
Var: Residual                                           36.80    
=================================================================
*** p < 0.001; ** p < 0.01; * p < 0.05

Generación de tabla para publicar en HTML

htmlreg(list(reg_ind, reg_agg, results_3), 
    custom.model.names = c("Individual","Agregado","Multinivel"),    
    custom.coef.names = c("Intercepto", "$SES_{ij}$","$Mujer_{ij}$", "$SES_{j}$", "$Sector_{j}$"), 
    custom.gof.names=c(NA,NA,NA,NA,NA,NA,NA, 
                   "Var:id ($\\tau_{00}$)","Var: Residual ($\\sigma^2$)"),
    custom.note = "%stars. Errores estándar en paréntesis",
    caption="Comparación de modelos Individual, Agregado y Multinivel",
    caption.above=TRUE,
    doctype = FALSE)
Comparación de modelos Individual, Agregado y Multinivel
  Individual Agregado Multinivel
Intercepto 12.80*** 13.13*** 12.74***
  (0.13) (0.35) (0.21)
\(SES_{ij}\) 2.15*** 5.19*** 2.15***
  (0.11) (0.37) (0.11)
\(Mujer_{ij}\) -1.34*** -1.97*** -1.20***
  (0.15) (0.56) (0.16)
\(SES_{j}\) 2.90***   3.07***
  (0.22)   (0.37)
\(Sector_{j}\) 1.32*** 1.25*** 1.25***
  (0.16) (0.31) (0.30)
R2 0.18 0.67  
Adj. R2 0.18 0.67  
Num. obs. 7185 160 7185
AIC     46515.61
BIC     46563.77
Log Likelihood     -23250.80
Num. groups: schoolid     160
Var:id (\(\tau_{00}\))     2.15
Var: Residual (\(\sigma^2\))     36.80
***p < 0.001; **p < 0.01; *p < 0.05. Errores estándar en paréntesis

Foro