Abstract

This vignette introduces a generic class of non-Gaussian models and a small code example illustrating how to use the suite of functions we developed to implement these models in Stan. The models have the same mean and covariance structure as Gaussian models but have more flexible sample path behaviors and skewed marginals with longer tails.

This vignette was written based on the following resources:

The Bookdown offers a theoretical background, explicates the \(Stan\) implementation, and contains several code examples for time series, geostatistical and areal data applications.

1 What and Why

1.1 Why go beyond Gaussian processes?

Relax the assumption of Gaussianity: Gaussian processes can over-smooth in the presence of local spikes (Walder and Hanks (2020)) and sudden jumps in the data (Dhull and Kumar (2021))
Predictions: One can obtain more accurate predictions with smaller standard deviations if non-Gaussian features are present in the data,
Robustness: Accommodate possible outliers in the data, and reduce their impact on the inferences (West (1984))

1.2 What type of non-Gaussian models are you considering?

If \(\mathbf{x}^G\) follows a multivariate Gaussian with mean \(\mathbf{0}\) and precision matrix \(\mathbf{Q}= \mathbf{\mathbf{D}}^T\mathbf{D}\), it can be expressed through

\[ \mathbf{D}\mathbf{x}^G\overset{d}{=} \mathbf{Z}, \] where \(\mathbf{Z}\) is a vector of i.i.d. standard Gaussian variables. The non-Gaussian extension for \(\mathbf{x}^G\) consists in replacing the driving noise distribution: \[ \mathbf{D}\mathbf{x}\overset{d}{=} \mathbf{\Lambda}(\eta^\star,\zeta^\star), \] where \(\boldsymbol{\Lambda}\) is a vector of independent and standardized normal-inverse Gaussian (NIG) random variables that depend on the parameter \(\eta^\star\) and \(\zeta^\star\), which controls the leptokurtosis and skewness of the driving noise.

1.3 Setup

The Gaussian model can be declared in Stan as:

x ~ multi_normal_prec(rep_vector(0,N), D'*D)

The non-Gaussian model declaration is:

x ~ nig_model(D, etas, zetas, h, 1)

The argument h should be a vector of ones for discrete-space models. For models defined in continuous space, for instance, a continuous random walk of order 1, h contains the distance between locations. The last argument is an integer with value 1 if the log-determinant of \(\mathbf{D}\) should be computed (if \(\mathbf{D}\) depends on parameters), or 0 otherwise.

The nig_model and other Stan functions can be found in nigstan\files\functions.stan on github.com/rafaelcabral96/nigstan. The nig_model uses a collapsed representation of \(\mathbf{x}\), and also has built-in within-chain parallelizations to improve speed. Other available functions are:

nig_model_2: Similar to nig_model but leverages on the sparsity of \(\mathbf{D}\) using Stan’s built-in functions for sparse matrix operations
nig_multi: independent NIG model (useful for a non-centered parameterization)

1.4 Why not just transform Gaussian processes?

For non-Gaussian datasets, a common approach is to start with a Gaussian process and then apply a non-linear transformation, such as the logarithm or the square root. However, Wallin and Bolin (2015) showed that the mean and covariance function of the transformed process could change in non-intuitive ways. For the square root transformation, if the mean of the Gaussian process contains covariates, this will induce a non-stationary covariance function for the data.

The non-Gaussian processes driven by NIG noise that we are considering allow modeling the 3 model components separately without confounding: the mean, the covariance structure, and the non-Gaussianity. Changing one component does not affect the others.

1.5 What models fit into this framework?

We list some models that can be extended to non-Gaussianity in this way:

i.i.d. random effects;
Random walk (RW) and autoregressive processes (Ghasami, Khodadadi, and Maleki (2020)) for time series
Simultaneous autoregressive (Walder and Hanks (2020)) and conditional autoregressive processes for graphical models and areal data
Matérn processes (Wallin and Bolin (2015)), which can be used in a variety of applications, such as in geostatistics and spatial point processes
More generally, several stochastic processes driven by Gaussian white noise defined via stochastic partial differential equations (SPDEs) \(\mathcal{D}X(t) = \mathcal{W}(t)\) can be extended to non-Gaussianity by replacing \(\mathcal{W}(t)\) by a NIG white noise distribution. The finite element method can be used to approximate \(X(t)\) to discrete space, leading to the system \(\mathbf{D}\mathbf{x} = \mathbf{\Lambda}\).

1.6 What sample paths do these models produce?

The first row of the following figure shows Gaussian (left) and NIG (right) white noise processes. The rows below show several processes built from these driving noises, including RW1, RW2, Ornstein–Uhlenbeck (OU), and Matérn (\(\alpha=2\)) processes. Whenever the NIG noise takes an extreme value (for instance, near location 0.8), the CRW1 and OU processes will exhibit a distinct jump, and the RW2 and Matérn processes will exhibit a kink (discontinuity in the first derivative).

Noise and sample paths of several models for \(\eta=10^{-6}\) (left) and \(\eta=1\) (right), for \(\sigma=1\) and \(\zeta=0\).

Sample paths of a Matérn model in 2D (smoothness parameter \(\alpha=2\)), driven by NIG noise for several values of \(\eta^\star\).

2 Example (Columbus dataset)

The dataset consists of crime rates in thousands (\(y_i\)) in 49 counties of Columbus, Ohio, and can be found in the spdep R package. We will fit the following model: \[ \mathbf{y}_{i}= \beta_0 + \beta_1 \text{HV}_i + \beta_2 \text{HI}_i + \sigma_{\mathbf{x}}\mathbf{x}_i, \] where \(\text{HV}_i\) and \(\text{HI}_i\) are the average household value and household income for county \(i\), \(\mathbf{x}\) is a spatial effects SAR model.

data(columbus)
data <- columbus[,c("CRIME","HOVAL","INC")]                              # data
N    <- nrow(data)                                                       # number of counties
map  <- readOGR(system.file("shapes/columbus.shp", package="spData")[1]) # shape file containing the polygons

## OGR data source with driver: ESRI Shapefile 
## Source: "C:\R\library\spData\shapes\columbus.shp", layer: "columbus"
## with 49 features
## It has 20 fields
## Integer64 fields read as strings:  COLUMBUS_ COLUMBUS_I POLYID

In the next Leaflet widget, we show the crime rates and the two covariates.

We consider a simultaneous autoregressive (SAR) model for the spatially structured effects \(\mathbf{x}\). The Gaussian version of this model can be built from the following system \(\mathbf{D}_{SAR}\mathbf{x} = \sigma_{\mathbf{x}}\mathbf{Z}\), where \(\mathbf{D}_{SAR}=\mathbf{I}-\rho\mathbf{W}\). The matrix \(\mathbf{W}\) is the row standardized adjacency matrix and \(-1<\rho<1\). The equivalent model driven by NIG noise is then \(\mathbf{D}_{SAR}\mathbf{x} = \sigma_{\mathbf{x}}\mathbf{\Lambda}\), where \(\mathbf{\Lambda}\) is i.i.d. standardized NIG noise.

We obtain next the adjacency matrix \(\mathbf{W}\).

nb_q <- poly2nb(map)                                   # Construct neighbours list from polygon list
nb_W <- nb2listw(nb_q, style="B", zero.policy=TRUE)    # Spatial weights for neighbours lists
W    <- as.matrix(as(nb_W, "sparseMatrix"))            # Adjacency matrix W
W    <- diag(1/rowSums(W))%*%W                         # Row standardize adjacency matrix

We build the design matrix \(\mathbf{B}\) and the list to be passed to Stan next:

B <- cbind(rep(1,N),data[,c(2,3)])                      #Design matrix

dat1 <- list(N              = N,                        #Number of observations
             N_covariates   = 3,                        #Number of covariates
             y              = data$CRIME,               #Observations
             B              = B,                        #Design matrix
             W              = W,                        #Row standardized adjacency matrix
             thetaetas      = 1,                        #Rate parameter of the exponential prior for etas
             thetazetas     = 2)                        #Rate parameter of the exponential prior for zetas

2.1 Gaussian fit

An (inefficient) implementation of the Gaussian SAR model is:

transformed parameters{
  vector[N] x = (y - B*beta)/sigma;                   // Spatial effects
}

model{
  matrix[N,N] D = add_diag(-rho*W, 1);                // D = I - rho W;
  x ~ multi_normal_prec(rep_vector(0,N), D'*D);       
  ...
}

model_stan_Gauss <- cmdstan_model('files/GaussSAR.stan')
fit_Gauss <- model_stan_Gauss$sample(data = dat1, 
                                     chains = 4, 
                                     iter_warmup   = 2000, 
                                     iter_sampling = 3000)

fit_Gauss$save_object("files/Gauss_fit.rds")
fit_Gauss$cmdstan_diagnose() #No warnings

Let us look at the summary:

fit_Gauss <- readRDS("files/Gauss_fit.rds")
knitr::kable(head(fit_Gauss$summary(),6), "simple", row.names = NA, digits=2)

variable	mean	median	sd	mad	q5	q95	rhat	ess_bulk	ess_tail
lp__	-16.37	-16.01	1.74	1.55	-19.72	-14.26	1	4425.99	6070.21
sigma	20.46	20.31	2.70	2.69	16.32	25.12	1	7762.05	7477.02
rho	0.36	0.36	0.19	0.21	0.06	0.68	1	5743.25	4273.28
beta[1]	63.10	63.39	10.73	10.23	45.01	80.31	1	6695.01	6506.59
beta[2]	-0.30	-0.30	0.19	0.19	-0.61	0.02	1	7333.82	6778.85
beta[3]	-1.17	-1.18	0.70	0.69	-2.34	-0.02	1	5403.83	6703.51

2.2 NIG fit

The SAR model driven with NIG noise can be declared in Stan by just changing one line of code of the Gaussian Stan model.

transformed parameters{
  vector[N] X = (y - B*beta)/sigma;                       // Spatial effects
}

model{

  matrix[N,N] D = add_diag(-rho*W, 1);                    // D = I - rho W;
  X ~ nig_model(D, etas, zetas, h, 1);
.,.

model_stan_NIG <- cmdstan_model('files/NIGSAR.stan')
fit_NIG <- model_stan_NIG$sample(data = dat1, 
                                 chains = 4, 
                                 iter_warmup = 2000, 
                                 iter_sampling = 3000)

fit_NIG$save_object("files/NIG_columbus.rds")
fit_NIG$cmdstan_diagnose() #No warnings

The posterior distributions for the regression coefficients were similar, and the posterior distributions of \(\eta^\star\) and \(\zeta^\star\) suggest heavy-tailedness although no asymmetry.

fit_NIG <- readRDS("files/NIG_columbus.rds")
knitr::kable(head(fit_NIG$summary(),8), "simple", row.names = NA, digits=2)

variable	mean	median	sd	mad	q5	q95	rhat	ess_bulk	ess_tail
lp__	-50.82	-50.60	2.42	2.37	-55.19	-47.31	1	2268.76	4986.52
sigma	27.41	27.32	3.34	3.34	22.10	33.10	1	4578.12	5636.57
rho	0.51	0.52	0.19	0.20	0.15	0.80	1	3505.46	3361.02
etas	6.51	6.26	2.45	2.38	2.95	10.94	1	5081.12	5061.76
zetas	-0.11	-0.01	0.76	0.23	-1.53	1.27	1	1895.06	2635.10
beta[1]	59.30	60.42	16.32	12.13	33.02	81.45	1	2091.19	2467.55
beta[2]	-0.15	-0.14	0.14	0.13	-0.39	0.07	1	3542.21	3231.01
beta[3]	-1.45	-1.45	0.57	0.57	-2.38	-0.50	1	2892.18	4458.28

The NIG noise \(\Lambda_i\) can be seen as a normal variance mixture. In the symmetric case, \(\Lambda_i|V_i \sim N(0,V_i)\), where \(V_i\) are inverse-Gaussian mixing variables. The posterior draws for \(\mathbf{V} = [V_1,\dotsc,V_N]^T\) can be generated based from the posterior draws of \(\mathbf{x}\), \(\eta^\star\), \(\zeta^\star\) and \(\rho\) as done next. We utilized the function VposteriorSAR located at files/utils.R. High values for \(V_i\) indicate that region \(i\) had a spike in the spatial effects that was not captured by the covariates, which are usually too large to be modeled by a Gaussian model.

X     <- as.matrix(as_draws_df(fit_NIG$draws("x")))[,1:N]
etas  <- as.matrix(as_draws_df(fit_NIG$draws("etas")))[,1]
zetas <- as.matrix(as_draws_df(fit_NIG$draws("zetas")))[,1]
rho   <- as.matrix(as_draws_df(fit_NIG$draws("rho")))[,1]
h     <- rep(1,N)

V_post <- VposteriorSAR(X, rho, W, etas, zetas, h)

In the next leaflet widget, we show the posterior mean and standard deviation of the spatial effects as well as the posterior mean of \(\mathbf{V}\).

References

Dhull, Monika S., and Arun Kumar. 2021. “Normal inverse Gaussian autoregressive model using EM algorithm.” International Journal of Advances in Engineering Sciences and Applied Mathematics.

Ghasami, Safdar, Zahra Khodadadi, and Mohsen Maleki. 2020. “Autoregressive Processes with Generalized Hyperbolic Innovations.” Communications in Statistics-Simulation and Computation 49 (12): 3080–92.

Walder, Adam, and Ephraim M Hanks. 2020. “Bayesian Analysis of Spatial Generalized Linear Mixed Models with Laplace Moving Average Random Fields.” Computational Statistics & Data Analysis 144: 106861.

Wallin, Jonas, and David Bolin. 2015. “Geostatistical Modelling Using Non-Gaussian Matérn Fields.” Scandinavian Journal of Statistics 42 (3): 872–90.

West, Mike. 1984. “Outlier Models and Prior Distributions in Bayesian Linear Regression.” Journal of the Royal Statistical Society: Series B (Methodological) 46 (3): 431–39.

Fitting robust non-Gaussian models in Stan

Rafael Cabral

2022-10-24