This vignette was written based on the following resources:
The Bookdown offers a theoretical background, explicates the \(Stan\) implementation, and contains several code examples for time series, geostatistical and areal data applications.
1 What and Why
1.1 Why go beyond Gaussian processes?
- Relax the assumption of Gaussianity: Gaussian processes can over-smooth in the presence of local spikes (Walder and Hanks (2020)) and sudden jumps in the data (Dhull and Kumar (2021))
- Predictions: One can obtain more accurate predictions with smaller standard deviations if non-Gaussian features are present in the data,
- Robustness: Accommodate possible outliers in the data, and reduce their impact on the inferences (West (1984))
1.2 What type of non-Gaussian models are you considering?
If \(\mathbf{x}^G\) follows a multivariate Gaussian with mean \(\mathbf{0}\) and precision matrix \(\mathbf{Q}= \mathbf{\mathbf{D}}^T\mathbf{D}\), it can be expressed through
\[ \mathbf{D}\mathbf{x}^G\overset{d}{=} \mathbf{Z}, \] where \(\mathbf{Z}\) is a vector of i.i.d. standard Gaussian variables. The non-Gaussian extension for \(\mathbf{x}^G\) consists in replacing the driving noise distribution: \[ \mathbf{D}\mathbf{x}\overset{d}{=} \mathbf{\Lambda}(\eta^\star,\zeta^\star), \] where \(\boldsymbol{\Lambda}\) is a vector of independent and standardized normal-inverse Gaussian (NIG) random variables that depend on the parameter \(\eta^\star\) and \(\zeta^\star\), which controls the leptokurtosis and skewness of the driving noise.
1.3 Setup
The Gaussian model can be declared in Stan as:
x ~ multi_normal_prec(rep_vector(0,N), D'*D)
The non-Gaussian model declaration is:
x ~ nig_model(D, etas, zetas, h, 1)
The argument h
should be a vector of ones for
discrete-space models. For models defined in continuous space, for
instance, a continuous random walk of order 1, h
contains
the distance between locations. The last argument is an integer with
value 1 if the log-determinant of \(\mathbf{D}\) should be computed (if \(\mathbf{D}\) depends on parameters), or 0
otherwise.
The nig_model
and other Stan functions can be found in
nigstan\files\functions.stan
on github.com/rafaelcabral96/nigstan.
The nig_model
uses a collapsed representation of \(\mathbf{x}\), and also has built-in
within-chain parallelizations to improve speed. Other available
functions are:
nig_model_2
: Similar tonig_model
but leverages on the sparsity of \(\mathbf{D}\) using Stan’s built-in functions for sparse matrix operationsnig_multi
: independent NIG model (useful for a non-centered parameterization)
1.4 Why not just transform Gaussian processes?
For non-Gaussian datasets, a common approach is to start with a Gaussian process and then apply a non-linear transformation, such as the logarithm or the square root. However, Wallin and Bolin (2015) showed that the mean and covariance function of the transformed process could change in non-intuitive ways. For the square root transformation, if the mean of the Gaussian process contains covariates, this will induce a non-stationary covariance function for the data.
The non-Gaussian processes driven by NIG noise that we are considering allow modeling the 3 model components separately without confounding: the mean, the covariance structure, and the non-Gaussianity. Changing one component does not affect the others.
1.5 What models fit into this framework?
We list some models that can be extended to non-Gaussianity in this way:
- i.i.d. random effects;
- Random walk (RW) and autoregressive processes (Ghasami, Khodadadi, and Maleki (2020)) for time series
- Simultaneous autoregressive (Walder and Hanks (2020)) and conditional autoregressive processes for graphical models and areal data
- Matérn processes (Wallin and Bolin (2015)), which can be used in a variety of applications, such as in geostatistics and spatial point processes
- More generally, several stochastic processes driven by Gaussian white noise defined via stochastic partial differential equations (SPDEs) \(\mathcal{D}X(t) = \mathcal{W}(t)\) can be extended to non-Gaussianity by replacing \(\mathcal{W}(t)\) by a NIG white noise distribution. The finite element method can be used to approximate \(X(t)\) to discrete space, leading to the system \(\mathbf{D}\mathbf{x} = \mathbf{\Lambda}\).
1.6 What sample paths do these models produce?
The first row of the following figure shows Gaussian (left) and NIG (right) white noise processes. The rows below show several processes built from these driving noises, including RW1, RW2, Ornstein–Uhlenbeck (OU), and Matérn (\(\alpha=2\)) processes. Whenever the NIG noise takes an extreme value (for instance, near location 0.8), the CRW1 and OU processes will exhibit a distinct jump, and the RW2 and Matérn processes will exhibit a kink (discontinuity in the first derivative).
2 Example (Columbus dataset)
The dataset consists of crime rates in thousands (\(y_i\)) in 49 counties of Columbus, Ohio,
and can be found in the spdep
R package. We will fit the
following model: \[
\mathbf{y}_{i}= \beta_0 + \beta_1 \text{HV}_i + \beta_2 \text{HI}_i
+ \sigma_{\mathbf{x}}\mathbf{x}_i,
\] where \(\text{HV}_i\) and
\(\text{HI}_i\) are the average
household value and household income for county \(i\), \(\mathbf{x}\) is a spatial effects SAR
model.
data(columbus)
<- columbus[,c("CRIME","HOVAL","INC")] # data
data <- nrow(data) # number of counties
N <- readOGR(system.file("shapes/columbus.shp", package="spData")[1]) # shape file containing the polygons map
## OGR data source with driver: ESRI Shapefile
## Source: "C:\R\library\spData\shapes\columbus.shp", layer: "columbus"
## with 49 features
## It has 20 fields
## Integer64 fields read as strings: COLUMBUS_ COLUMBUS_I POLYID
In the next Leaflet widget, we show the crime rates and the two covariates.
We consider a simultaneous autoregressive (SAR) model for the spatially structured effects \(\mathbf{x}\). The Gaussian version of this model can be built from the following system \(\mathbf{D}_{SAR}\mathbf{x} = \sigma_{\mathbf{x}}\mathbf{Z}\), where \(\mathbf{D}_{SAR}=\mathbf{I}-\rho\mathbf{W}\). The matrix \(\mathbf{W}\) is the row standardized adjacency matrix and \(-1<\rho<1\). The equivalent model driven by NIG noise is then \(\mathbf{D}_{SAR}\mathbf{x} = \sigma_{\mathbf{x}}\mathbf{\Lambda}\), where \(\mathbf{\Lambda}\) is i.i.d. standardized NIG noise.
We obtain next the adjacency matrix \(\mathbf{W}\).
<- poly2nb(map) # Construct neighbours list from polygon list
nb_q <- nb2listw(nb_q, style="B", zero.policy=TRUE) # Spatial weights for neighbours lists
nb_W <- as.matrix(as(nb_W, "sparseMatrix")) # Adjacency matrix W
W <- diag(1/rowSums(W))%*%W # Row standardize adjacency matrix W
We build the design matrix \(\mathbf{B}\) and the list to be passed to Stan next:
<- cbind(rep(1,N),data[,c(2,3)]) #Design matrix
B
<- list(N = N, #Number of observations
dat1 N_covariates = 3, #Number of covariates
y = data$CRIME, #Observations
B = B, #Design matrix
W = W, #Row standardized adjacency matrix
thetaetas = 1, #Rate parameter of the exponential prior for etas
thetazetas = 2) #Rate parameter of the exponential prior for zetas
2.1 Gaussian fit
An (inefficient) implementation of the Gaussian SAR model is:
transformed parameters{
vector[N] x = (y - B*beta)/sigma; // Spatial effects
}
model{
matrix[N,N] D = add_diag(-rho*W, 1); // D = I - rho W;
x ~ multi_normal_prec(rep_vector(0,N), D'*D);
...
}
<- cmdstan_model('files/GaussSAR.stan')
model_stan_Gauss <- model_stan_Gauss$sample(data = dat1,
fit_Gauss chains = 4,
iter_warmup = 2000,
iter_sampling = 3000)
$save_object("files/Gauss_fit.rds")
fit_Gauss$cmdstan_diagnose() #No warnings fit_Gauss
Let us look at the summary:
<- readRDS("files/Gauss_fit.rds")
fit_Gauss ::kable(head(fit_Gauss$summary(),6), "simple", row.names = NA, digits=2) knitr
variable | mean | median | sd | mad | q5 | q95 | rhat | ess_bulk | ess_tail |
---|---|---|---|---|---|---|---|---|---|
lp__ | -16.37 | -16.01 | 1.74 | 1.55 | -19.72 | -14.26 | 1 | 4425.99 | 6070.21 |
sigma | 20.46 | 20.31 | 2.70 | 2.69 | 16.32 | 25.12 | 1 | 7762.05 | 7477.02 |
rho | 0.36 | 0.36 | 0.19 | 0.21 | 0.06 | 0.68 | 1 | 5743.25 | 4273.28 |
beta[1] | 63.10 | 63.39 | 10.73 | 10.23 | 45.01 | 80.31 | 1 | 6695.01 | 6506.59 |
beta[2] | -0.30 | -0.30 | 0.19 | 0.19 | -0.61 | 0.02 | 1 | 7333.82 | 6778.85 |
beta[3] | -1.17 | -1.18 | 0.70 | 0.69 | -2.34 | -0.02 | 1 | 5403.83 | 6703.51 |
2.2 NIG fit
The SAR model driven with NIG noise can be declared in Stan by just changing one line of code of the Gaussian Stan model.
transformed parameters{
vector[N] X = (y - B*beta)/sigma; // Spatial effects
}
model{
matrix[N,N] D = add_diag(-rho*W, 1); // D = I - rho W;
X ~ nig_model(D, etas, zetas, h, 1);
.,.
<- cmdstan_model('files/NIGSAR.stan')
model_stan_NIG <- model_stan_NIG$sample(data = dat1,
fit_NIG chains = 4,
iter_warmup = 2000,
iter_sampling = 3000)
$save_object("files/NIG_columbus.rds")
fit_NIG$cmdstan_diagnose() #No warnings fit_NIG
The posterior distributions for the regression coefficients were similar, and the posterior distributions of \(\eta^\star\) and \(\zeta^\star\) suggest heavy-tailedness although no asymmetry.
<- readRDS("files/NIG_columbus.rds")
fit_NIG ::kable(head(fit_NIG$summary(),8), "simple", row.names = NA, digits=2) knitr
variable | mean | median | sd | mad | q5 | q95 | rhat | ess_bulk | ess_tail |
---|---|---|---|---|---|---|---|---|---|
lp__ | -50.82 | -50.60 | 2.42 | 2.37 | -55.19 | -47.31 | 1 | 2268.76 | 4986.52 |
sigma | 27.41 | 27.32 | 3.34 | 3.34 | 22.10 | 33.10 | 1 | 4578.12 | 5636.57 |
rho | 0.51 | 0.52 | 0.19 | 0.20 | 0.15 | 0.80 | 1 | 3505.46 | 3361.02 |
etas | 6.51 | 6.26 | 2.45 | 2.38 | 2.95 | 10.94 | 1 | 5081.12 | 5061.76 |
zetas | -0.11 | -0.01 | 0.76 | 0.23 | -1.53 | 1.27 | 1 | 1895.06 | 2635.10 |
beta[1] | 59.30 | 60.42 | 16.32 | 12.13 | 33.02 | 81.45 | 1 | 2091.19 | 2467.55 |
beta[2] | -0.15 | -0.14 | 0.14 | 0.13 | -0.39 | 0.07 | 1 | 3542.21 | 3231.01 |
beta[3] | -1.45 | -1.45 | 0.57 | 0.57 | -2.38 | -0.50 | 1 | 2892.18 | 4458.28 |
The NIG noise \(\Lambda_i\) can be
seen as a normal variance mixture. In the symmetric case, \(\Lambda_i|V_i \sim N(0,V_i)\), where \(V_i\) are inverse-Gaussian mixing
variables. The posterior draws for \(\mathbf{V} = [V_1,\dotsc,V_N]^T\) can be
generated based from the posterior draws of \(\mathbf{x}\), \(\eta^\star\), \(\zeta^\star\) and \(\rho\) as done next. We utilized the
function VposteriorSAR
located at
files/utils.R
. High values for \(V_i\) indicate that region \(i\) had a spike in the spatial effects that
was not captured by the covariates, which are usually too large to be
modeled by a Gaussian model.
<- as.matrix(as_draws_df(fit_NIG$draws("x")))[,1:N]
X <- as.matrix(as_draws_df(fit_NIG$draws("etas")))[,1]
etas <- as.matrix(as_draws_df(fit_NIG$draws("zetas")))[,1]
zetas <- as.matrix(as_draws_df(fit_NIG$draws("rho")))[,1]
rho <- rep(1,N)
h
<- VposteriorSAR(X, rho, W, etas, zetas, h) V_post
In the next leaflet widget, we show the posterior mean and standard deviation of the spatial effects as well as the posterior mean of \(\mathbf{V}\).