The intrinsic conditional auto-regressive (ICAR) model for spatial count data. Options include the BYM model, the BYM2 model, and a solo ICAR term.
Besag, J. (1974). Spatial interaction and the statistical analysis of lattice systems. Journal of the Royal Statistical Society: Series B (Methodological), 36(2), 192-225.
Besag, J., York, J., & Mollié, A. (1991). Bayesian image restoration, with two applications in spatial statistics. Annals of the Institute of Statistical Mathematics, 43(1), 1-20.
Donegan, Connor. 2021. Flexible functions for ICAR, BYM, and BYM2 models in Stan. Code repository. https://github.com/ConnorDonegan/Stan-IAR
Donegan, Connor and Chun, Yongwan and Griffith, Daniel A. (2021). Modeling community health with areal data: Bayesian inference with survey standard errors and spatial structure. Int. J. Env. Res. and Public Health 18 (13): 6856. DOI: 10.3390/ijerph18136856 Data and code: https://github.com/ConnorDonegan/survey-HBM.
Donegan, Connor (2021). Spatial conditional autoregressive models in Stan. OSF Preprints. doi:10.31219/osf.io/3ey65 .
Freni-Sterrantino, Anna, Massimo Ventrucci, and Håvard Rue. 2018. A Note on Intrinsic Conditional Autoregressive Models for Disconnected Graphs. Spatial and Spatio-Temporal Epidemiology, 26: 25–34.
Morris, M., Wheeler-Martin, K., Simpson, D., Mooney, S. J., Gelman, A., & DiMaggio, C. (2019). Bayesian hierarchical spatial models: Implementing the Besag York Mollié model in stan. Spatial and spatio-temporal epidemiology, 31, 100301.
Riebler, A., Sørbye, S. H., Simpson, D., & Rue, H. (2016). An intuitive Bayesian spatial model for disease mapping that accounts for scaling. Statistical Methods in Medical Research, 25(4), 1145-1165.
A model formula, following the R formula syntax. Binomial models can be specified by setting the left hand side of the equation to a two-column matrix of successes and failures, as in cbind(successes, failures) ~ x
.
Formula to specify any spatially-lagged covariates. As in, ~ x1 + x2
(the intercept term will be removed internally). When setting priors for beta
, remember to include priors for any SLX terms.
To include a varying intercept (or "random effects") term, alpha_re
, specify the grouping variable here using formula syntax, as in ~ ID
. Then, alpha_re
is a vector of parameters added to the linear predictor of the model, and:
alpha_re ~ N(0, alpha_tau)
alpha_tau ~ Student_t(d.f., location, scale).
Before using this term, read the Details
section and the type
argument. Specifically, if you use type = "bym"
, then an observational-level re
term is already included in the model. (Similarly for type = "bym2"
.)
A data.frame
or an object coercible to a data frame by as.data.frame
containing the model data.
Spatial connectivity matrix which will be used to construct an edge list for the ICAR model, and to calculate residual spatial autocorrelation as well as any user specified slx
terms. It will automatically be row-standardized before calculating slx
terms. C
must be a binary symmetric n x n
matrix.
The likelihood function for the outcome variable. Current options are binomial(link = "logit")
and poisson(link = "log")
.
Defaults to "icar" (partial pooling of neighboring observations through parameter phi
); specify "bym" to add a second parameter vector theta
to perform partial pooling across all observations; specify "bym2" for the innovation introduced by Riebler et al. (2016). See Details
for more information.
For the BYM2 model, optional. If missing, this will be set to a vector of ones. See Details
.
A named list of parameters for prior distributions (see priors
):
The intercept is assigned a Gaussian prior distribution (see normal).
Regression coefficients are assigned Gaussian prior distributions. Variables must follow their order of appearance in the model formula
. Note that if you also use slx
terms (spatially lagged covariates), and you use custom priors for beta
, then you have to provide priors for the slx terms. Since slx terms are prepended to the design matrix, the prior for the slx term will be listed first.
For family = gaussian()
and family = student_t()
models, the scale parameter, sigma
, is assigned a (half-) Student's t prior distribution. The half-Student's t prior for sigma
is constrained to be positive.
nu
is the degrees of freedom parameter in the Student's t likelihood (only used when family = student_t()
). nu
is assigned a gamma prior distribution. The default prior is prior = list(nu = gamma(alpha = 3, beta = 0.2))
.
The scale parameter for random effects, or varying intercepts, terms. This scale parameter, tau
, is assigned a half-Student's t prior. To set this, use, e.g., prior = list(tau = student_t(df = 20, location = 0, scale = 20))
.
To model observational uncertainty (i.e. measurement or sampling error) in any or all of the covariates, provide a list of data as constructed by the prep_me_data
function.
To center predictors on their mean values, use centerx = TRUE
. If the ME argument is used, the modeled covariate (i.e., latent variable), rather than the raw observations, will be centered. When using the ME argument, this is the recommended method for centering the covariates.
Integer value indicating the maximum censored value; this argument is for modeling censored (suppressed) outcome data, typically disease case counts or deaths. For example, the US Centers for Disease Control and Prevention censors (does not report) death counts that are nine or fewer, so if you're using CDC WONDER mortality data you could provide censor_point = 9
.
Draw samples from the prior distributions of parameters only.
Number of MCMC chains to estimate.
Number of samples per chain.
Stan will print the progress of the sampler every refresh
number of samples; set refresh=0
to silence this.
If keep_all = TRUE
then samples for all parameters in the Stan model will be kept; this is necessary if you want to do model comparison with Bayes factors and the bridgesampling
package.
Optional; specify any additional parameters you'd like stored from the Stan model.
A named list of parameters to control the sampler's behavior. See stan
for details.
Other arguments passed to sampling. For multi-core processing, you can use cores = parallel::detectCores()
, or run options(mc.cores = parallel::detectCores())
first.
An object of class geostan_fit
(a list) containing:
Summaries of the main parameters of interest; a data frame
Widely Applicable Information Criteria (WAIC) with a measure of effective number of parameters (eff_pars
) and mean log pointwise predictive density (lpd
), and mean residual spatial autocorrelation as measured by the Moran coefficient.
an object of class stanfit
returned by rstan::stan
a data frame containing the model data
The edge list representing all unique sets of neighbors and the weight attached to each pair (i.e., their corresponding element in the connectivity matrix C).
Spatial connectivity matrix
the user-provided or default family
argument used to fit the model
The model formula provided by the user (not including ICAR component)
The slx
formula
A list with two named elements, formula
and Data
, containing the formula re
and a data frame with columns id
(the grouping variable) and idx
(the index values assigned to each group).
Prior specifications.
If covariates are centered internally (centerx = TRUE
), then x_center
is a numeric vector of the values on which covariates were centered.
A data frame with the name of the spatial parameter ("phi"
if type = "icar"
else "convolution"
) and method (toupper(type)
).
The intrinsic conditional autoregressive (ICAR) model for spatial data was introduced by Besag et al. (1991). The Stan code for the ICAR component of the model and the BYM2 option is from Morris et al. (2019) with adjustments to enable non-binary weights and disconnected graph structures (see Freni-Sterrantino et al. (2018) and Donegan (2021)).
The exact specification depends on the type
argument.
For Poisson models for count data, y, the basic model specification (type = "icar"
) is:
$$
y \sim Poisson(e^{O + \mu + \phi}) \\
\phi \sim ICAR(\tau_s) \\
\tau_s \sim Gauss(0, 1)
$$
where \(\mu\) contains an intercept and potentially covariates, and \(O\) is an optional offset term. The spatial trend \(\phi\) has a mean of zero and a single scale parameter \(\tau_s\) (which users will see printed as the parameter named spatial_scale
).
The ICAR prior model is a CAR model that has a spatial autocorrelation parameter \(\rho\) equal to 1 (see stan_car). Thus the ICAR prior places high probability on a very smooth spatially (or temporally) varying mean. This is rarely sufficient to model the amount of variation present in social and health data.
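As a rough illustration of why the ICAR prior favors smooth surfaces, the following sketch evaluates the unnormalized ICAR log-density as a sum of squared differences between neighboring values, and compares a smooth vector against a shuffled version of the same values. The toy matrix C, the two vectors, and the helper icar_lpdf_sketch are hypothetical; this is not the Stan code used by the package, and it ignores the scale parameter and the sum-to-zero constraint.

```r
# Hypothetical 4-node path graph: 1 - 2 - 3 - 4 (binary, symmetric)
C <- matrix(0, 4, 4)
C[cbind(1:3, 2:4)] <- 1
C <- C + t(C)

# Edge list of unique neighbor pairs (upper triangle only)
edges <- which(C == 1 & upper.tri(C), arr.ind = TRUE)

# Unnormalized ICAR log-density: -0.5 * sum over edges of (phi_i - phi_j)^2
icar_lpdf_sketch <- function(phi, edges) {
  -0.5 * sum((phi[edges[, 1]] - phi[edges[, 2]])^2)
}

smooth <- c(-0.3, -0.1, 0.1, 0.3)  # changes gradually from neighbor to neighbor
rough  <- c(-0.3, 0.3, -0.1, 0.1)  # same values, shuffled

lp_smooth <- icar_lpdf_sketch(smooth, edges)
lp_rough  <- icar_lpdf_sketch(rough, edges)

# The same quantity written as a quadratic form with Q = D - C
Q <- diag(rowSums(C)) - C
lp_smooth_q <- -0.5 * drop(t(smooth) %*% Q %*% smooth)
```

The smooth vector receives the higher log-density, which is the sense in which the prior places high probability on smoothly varying values.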
Often, an observational-level random effect term, theta
, is added to capture (heterogeneous or unstructured) deviations from \(\mu + \phi\). The combined term is referred to as a convolution term:
\(
convolution = \phi + \theta.
\)
This is known as the BYM model (Besag et al. 1991), and can be specified using type = "bym"
:
$$
y \sim Poisson(e^{O + \mu + \phi + \theta}) \\
\phi \sim ICAR(\tau_s) \\
\theta \sim Gaussian(0, \tau_{ns}) \\
\tau_s \sim Gaussian(0, 1) \\
\tau_{ns} \sim Gaussian(0, 1)
$$
Riebler et al. (2016) introduce a variation on the BYM model (type = "bym2"
). This specification combines \(\phi\) and \(\theta\) using a mixing parameter \(\rho\) that controls the proportion of the variation that is attributable to the spatially autocorrelated term \(\phi\) rather than the spatially unstructured term \(\theta\). The terms share a single scale parameter:
$$
convolution = [\sqrt{\rho \cdot scale\_factor} \cdot \tilde{\phi} + \sqrt{1 - \rho} \cdot \tilde{\theta}] \cdot \tau_s \\
\tilde{\phi} \sim Gaussian(0, 1) \\
\tilde{\theta} \sim Gaussian(0, 1) \\
\tau_s \sim Gaussian(0, 1)
$$
The terms \(\tilde{\phi}\), \(\tilde{\theta}\) are standard normal deviates, \(\rho\) is restricted to values between zero and one, and the 'scale_factor' is a constant term provided by the user. By default, the 'scale_factor' is equal to one, so that it does nothing. Riebler et al. (2016) argue that the interpretation or meaning of the scale of the ICAR model depends on the graph structure of the connectivity matrix \(C\). This implies that the same prior distribution assigned to \(\tau_s\) will differ in its implications if \(C\) is changed; in other words, the priors are not transportable across models, and models that use the same nominal prior actually have different priors assigned to \(\tau_s\).
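To make the role of \(\rho\) concrete, the sketch below simulates the convolution term exactly as written above, treating both terms as independent standard normal draws and using the default scale_factor of one. All values (rho, tau_s, the number of draws) are hypothetical, and the simulation is only an illustration of the variance split, not a fitted model.

```r
set.seed(1)
n <- 1e5
rho          <- 0.7  # share of variation assigned to the structured term
tau_s        <- 2    # shared scale parameter
scale_factor <- 1    # the default, so it has no effect here

phi_tilde   <- rnorm(n)
theta_tilde <- rnorm(n)

# Convolution term as given in the specification above
convolution <- (sqrt(rho * scale_factor) * phi_tilde +
                  sqrt(1 - rho) * theta_tilde) * tau_s

# Fraction of the total variance contributed by the structured part
structured_share <- var(sqrt(rho * scale_factor) * phi_tilde * tau_s) /
  var(convolution)
```

Under these assumptions the variance of the convolution term is close to tau_s^2 and the structured share is close to rho.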
Borrowing R
code from Morris (2017) and following Freni-Sterrantino et al. (2018), the following R
code can be used to create the 'scale_factor' for the BYM2 model (note, this requires the INLA R package), given a spatial adjacency matrix, \(C\):
## create a list of data for stan_icar
icar.data <- geostan::prep_icar_data(C)
## calculate scale_factor for each of the k connected groups of nodes
k <- icar.data$k
scale_factor <- vector(mode = "numeric", length = k)
for (j in 1:k) {
  g.idx <- which(icar.data$comp_id == j)
  if (length(g.idx) == 1) {
    scale_factor[j] <- 1
    next
  }
  Cg <- C[g.idx, g.idx]
  scale_factor[j] <- scale_c(Cg)
}
This code adjusts for 'islands' or areas with zero neighbors, and it also handles disconnected graph structures (see Donegan 2021). Following Freni-Sterrantino et al. (2018), disconnected components of the graph structure are given their own intercept term; however, this value is added to \(\phi\) automatically inside the Stan model. Therefore, the user never needs to make any adjustments for this term. (If you want to avoid complications from a disconnected graph structure, see stan_car
).
Note, the code above requires the scale_c
function; it has package dependencies that are not included in geostan
. To use scale_c
, you have to load the following R
function:
#' compute scaling factor for adjacency matrix, accounting for differences in spatial connectivity
#'
#' @param C connectivity matrix
#'
#' @details
#'
#' Requires the following packages:
#'
#' library(Matrix)
#' library(INLA);
#' library(spdep)
#' library(igraph)
#'
#' @source
#'
#' Morris, Mitzi (2017). Spatial Models in Stan: Intrinsic Auto-Regressive Models for Areal Data. <https://mc-stan.org/users/documentation/case-studies/icar_stan.html>
#'
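scale_c <- function(C) {
  geometric_mean <- function(x) exp(mean(log(x)))
  N <- dim(C)[1]
  # ICAR precision matrix: Q = D - C
  Q <- Matrix::Diagonal(N, Matrix::rowSums(C)) - C
  # add a small jitter to the diagonal for numerical stability
  Q_pert <- Q + Matrix::Diagonal(N) * max(Matrix::diag(Q)) * sqrt(.Machine$double.eps)
  # marginal variances under a sum-to-zero constraint
  Q_inv <- INLA::inla.qinv(Q_pert, constr = list(A = matrix(1, 1, N), e = 0))
  scaling_factor <- geometric_mean(Matrix::diag(Q_inv))
  return(scaling_factor)
}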
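If INLA is not available, a similar quantity can be sketched in base R. For a single connected component, the sum-to-zero constrained inverse used above should agree closely with the Moore-Penrose pseudo-inverse of Q = D - C, whose diagonal can be read off an eigendecomposition. The helper scale_c_sketch below is hypothetical, is not part of geostan, and assumes a dense, symmetric connectivity matrix for one fully connected group of nodes.

```r
# Hypothetical helper: geometric mean of the diagonal of the pseudo-inverse
# of Q = D - C, for a single fully connected graph component.
scale_c_sketch <- function(C) {
  Q <- diag(rowSums(C)) - C
  eig <- eigen(Q, symmetric = TRUE)
  # drop the (near-)zero eigenvalue that belongs to the constant vector
  keep <- eig$values > 1e-8 * max(eig$values)
  # diagonal of the pseudo-inverse: sum over k of v_jk^2 / lambda_k
  marginal_var <- as.vector(
    eig$vectors[, keep, drop = FALSE]^2 %*% (1 / eig$values[keep])
  )
  exp(mean(log(marginal_var)))
}

# Two nodes that neighbor each other
C2 <- matrix(c(0, 1,
               1, 0), nrow = 2, byrow = TRUE)
sf2 <- scale_c_sketch(C2)
```

For this two-node graph the pseudo-inverse has 0.25 on its diagonal, so the sketch returns 0.25. For sparse matrices or large graphs, the INLA-based function above remains the documented route.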
The slx
argument is a convenience for including SLX terms. For example,
$$
y = W X \gamma + X \beta + \epsilon
$$
where \(W\) is a row-standardized spatial weights matrix (see shape2mat), \(WX\) is the mean neighboring value of \(X\), and \(\gamma\) is a coefficient vector. This specifies a regression with spatially lagged covariates. SLX terms can be specified by providing a formula to the slx
argument:
stan_glm(y ~ x1 + x2, slx = ~ x1 + x2, ...),
which is a shortcut for
stan_glm(y ~ I(W %*% x1) + I(W %*% x2) + x1 + x2, ...)
SLX terms will always be prepended to the design matrix, as above, which is important to know when setting prior distributions for regression coefficients.
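The following base R sketch shows what an SLX term amounts to numerically: the connectivity matrix is row-standardized, each covariate is averaged over neighbors, and the lagged columns are placed ahead of the original covariates. The toy matrix, the covariate values, and the "w." column prefix are hypothetical and chosen only for illustration; they are not taken from the package internals.

```r
# Hypothetical 3-node graph where node 2 neighbors nodes 1 and 3
C <- matrix(c(0, 1, 0,
              1, 0, 1,
              0, 1, 0), nrow = 3, byrow = TRUE)
W <- C / rowSums(C)   # row-standardize: each row sums to one

x1 <- c(10, 20, 60)
x2 <- c(1, 0, 1)
X  <- cbind(x1 = x1, x2 = x2)

WX <- W %*% X         # mean of each covariate among neighbors
colnames(WX) <- paste0("w.", colnames(X))

# SLX terms first, then the covariates themselves
design <- cbind(WX, X)
```

With this column order, priors for the regression coefficients would be listed for the two lagged terms first and for x1 and x2 second.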
For measurement error (ME) models, the SLX argument is the only way to include spatially lagged covariates since the SLX term needs to be re-calculated on each iteration of the MCMC algorithm.
The ME models are designed for surveys with spatial sampling designs, such as the American Community Survey (ACS) estimates. Given estimates \(x\), their standard errors \(s\), and the target quantity of interest (i.e., the unknown true value) \(z\), the ME models have one of the following two specifications, depending on the user input. If a spatial CAR model is specified, then: $$ x \sim Gauss(z, s^2) \\ z \sim Gauss(\mu_z, \Sigma_z) \\ \Sigma_z = (I - \rho_z C)^{-1} M \\ \mu_z \sim Gauss(0, 100) \\ \tau_z \sim Student(10, 0, 40), \tau_z > 0 \\ \rho_z \sim uniform(l, u) $$ where \(\Sigma_z\) specifies a spatial conditional autoregressive model with scale parameter \(\tau_z\) (on the diagonal of \(M\)), and \(l\), \(u\) are the lower and upper bounds that \(\rho_z\) is permitted to take (which are determined by the extreme eigenvalues of the spatial connectivity matrix \(C\)).
For non-spatial ME models, the following is used instead: $$ x \sim Gauss(z, s^2) \\ z \sim student(\nu_z, \mu_z, \sigma_z) \\ \nu_z \sim gamma(3, 0.2) \\ \mu_z \sim Gauss(0, 100) \\ \sigma_z \sim student(10, 0, 40). $$
For strongly skewed variables, such as census tract poverty rates, it can be advantageous to apply a logit transformation to \(z\) before applying the CAR or Student-t prior model. When the logit
argument is used, the model becomes:
$$
x \sim Gauss(z, s^2) \\
logit(z) \sim Gauss(\mu_z, \Sigma_z) \\
...
$$
and similarly for the Student t model:
$$
x \sim Gauss(z, s^2) \\
logit(z) \sim student(\nu_z, \mu_z, \sigma_z) \\
...
$$
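The bounds \(l\) and \(u\) in the CAR-based ME specification can be illustrated with a small example. The sketch assumes a symmetric connectivity matrix and takes the bounds to be the reciprocals of the smallest and largest eigenvalues of \(C\), which is the usual condition for \(I - \rho C\) to be positive definite; the package's own calculation may differ in detail. The toy matrix is hypothetical.

```r
# Hypothetical 3-node path graph (binary, symmetric)
C <- matrix(c(0, 1, 0,
              1, 0, 1,
              0, 1, 0), nrow = 3, byrow = TRUE)

lambda <- eigen(C, symmetric = TRUE, only.values = TRUE)$values
l <- 1 / min(lambda)  # lower bound for rho
u <- 1 / max(lambda)  # upper bound for rho

# I - rho * C is positive definite only for rho strictly inside (l, u)
min_eig <- function(rho) {
  min(eigen(diag(3) - rho * C, symmetric = TRUE, only.values = TRUE)$values)
}
inside  <- min_eig(0.70)  # rho just below u
outside <- min_eig(0.71)  # rho just above u
```

Here the eigenvalues of C are \(\sqrt{2}\), 0, and \(-\sqrt{2}\), so the bounds are roughly -0.707 and 0.707; the smallest eigenvalue of \(I - \rho C\) is positive at 0.70 and negative at 0.71.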
Vital statistics systems and disease surveillance programs typically suppress case counts when they are smaller than a specific threshold value. In such cases, the observation of a censored count is not the same as a missing value; instead, you are informed that the value is an integer somewhere between zero and the threshold value. For Poisson models (family = poisson()), you can use the censor_point
argument to encode this information into your model.
Internally, geostan
will keep the index values of each censored observation, and the index value of each of the fully observed outcome values. For all observed counts, the likelihood statement will be:
$$
p(y_i | data, model) = poisson(y_i | \mu_i),
$$
as usual, where \(\mu_i\) may include whatever spatial terms are present in the model.
For each censored count, the likelihood statement will equal the cumulative Poisson distribution function for values zero through the censor point: $$ p(y_i | data, model) = \sum_{m=0}^{M} Poisson( m | \mu_i), $$ where \(M\) is the censor point and \(\mu_i\) again is the fitted value for the \(i^{th}\) observation.
For example, the US Centers for Disease Control and Prevention's CDC WONDER database censors all death counts between 0 and 9. To model CDC WONDER mortality data, you could provide censor_point = 9
and then the likelihood statement for censored counts would equal the summation of the Poisson probability mass function over each integer ranging from zero through 9 (inclusive), conditional on the fitted values (i.e., all model parameters). See Donegan (2021) for additional discussion, references, and Stan code.
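The censored likelihood term can be checked directly in base R. The sketch below uses a hypothetical fitted value mu_i and the censor point of 9 from the CDC WONDER example; it only illustrates the formula above and is not the Stan code used by the package.

```r
censor_point <- 9
mu_i <- 4.2  # hypothetical fitted value for one censored observation

# Sum of the Poisson mass function over 0, 1, ..., censor_point
lik_sum <- sum(dpois(0:censor_point, lambda = mu_i))

# The same quantity from the cumulative distribution function
lik_cdf <- ppois(censor_point, lambda = mu_i)

# Log-likelihood contribution of the censored observation
log_lik_censored <- ppois(censor_point, lambda = mu_i, log.p = TRUE)
```

Because the two computations agree, the cumulative distribution function is a convenient way to evaluate the summation for any censor point.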
# \donttest{
# for parallel processing of models:
#options(mc.cores = parallel::detectCores())
data(sentencing)
C <- shape2mat(sentencing, "B")
log_e <- log(sentencing$expected_sents)
fit.bym <- stan_icar(sents ~ offset(log_e),
                     family = poisson(),
                     data = sentencing,
                     type = "bym",
                     C = C,
                     chains = 2, iter = 800) # for speed only
# spatial diagnostics
sp_diag(fit.bym, sentencing)
# check effective sample size and convergence
library(rstan)
rstan::stan_ess(fit.bym$stanfit)
rstan::stan_rhat(fit.bym$stanfit)
# calculate log-standardized incidence ratios
# (observed/expected case counts)
library(ggplot2)
library(sf)
f <- fitted(fit.bym, rates = FALSE)$mean
SSR <- f / sentencing$expected_sents
log.SSR <- log( SSR, base = 2)
ggplot( st_as_sf(sentencing) ) +
  geom_sf(aes(fill = log.SSR)) +
  scale_fill_gradient2(
    low = "navy",
    high = "darkred"
  ) +
  labs(title = "Log-standardized sentencing ratios",
       subtitle = "log( Fitted/Expected), base 2") +
  theme_void() +
  theme(
    legend.position = "bottom",
    legend.key.height = unit(0.35, "cm"),
    legend.key.width = unit(1.5, "cm")
  )
# }