Technical Overview • spStack

Introduction

Geostatistics refers to the study of a spatially distributed variable of interest, which in theory is defined at every point over a bounded study region of interest. Statistical modelling and analysis for spatially oriented point-referenced outcomes play a crucial role in diverse scientific applications such as earth and environmental sciences, ecology, epidemiology, and economics. With the advent of Markov chain Monte Carlo (MCMC) algorithms, Bayesian hierarchical models have gained massive popularity in analyzing such point-referenced or, geostatistical data. These models involve latent spatial processes characterized by spatial process parameters, which besides lacking substantive relevance in scientific contexts, are also weakly identified and hence, impedes convergence of MCMC algorithms. Thus, even for moderately large datasets (~ $10^3$ or higher), the computation for MCMC becomes too onerous for practical use.

We introduce the R package spStack that implements Bayesian inference for a class of geostatistical models, where we obviate the issues mentioned by sampling from analytically available posterior distributions conditional upon some candidate values of the spatial process parameters and, subsequently assimilate inference from these individual posterior distributions using Bayesian predictive stacking. Besides delivering competitive predictive performance as compared to fully Bayesian inference using MCMC, our proposed algorithm is embarrassingly parallel, thus drastically improves runtime and elevating the utility of the package for a diverse group of practitioners with limited computational resources at their disposal. This package, to the best of our knowledge, is the first to implement stacking for Bayesian analysis of spatial data.

Technical details surrounding the methodology can be found in the articles Zhang, Tang, and Banerjee (2025) which discuss the case where the distribution of the point-referenced outcomes are Gaussian, and, in Pan et al. (2025) where the case of non-Gaussian outcomes is explored. The code for this package is written primarily in C/C++ with additional calls to FORTRAN routines for optimized linear algebra operations. We leverage the F77_NAME macro to interface with legacy FORTRAN functions in conjunction with efficient matrix computation libraries such as BLAS (Basic Linear Algebra Subprograms) and LAPACK (Linear Algebra Package) to implement our stacking algorithm.

The remainder of the vignette evolves as follows - the next two sections discuss Bayesian hierarchical spatial models for Gaussian and non-Gaussian outcomes, which is followed by a section providing brief details on predictive stacking and a section dedicated for illustration of functions in the package.

Bayesian Gaussian spatial regression models

Let

\chi = \{s_1, \ldots, s_n\} \in \mathcal{D}

be a be a set of

n

spatial locations yielding measurements

y = (y_1, \ldots, y_n)^{ \scriptstyle \top }

with known values of predictors at these locations collected in the

n \times p

full rank matrix

X = [x(s_1), \ldots, x(s_n)]^{ \scriptstyle \top }

. A customary geostatistical model is

\begin{equation} y_i = x(s_i)^{ \scriptstyle \top }\beta + z(s_i) + \epsilon_i, \quad i = 1, \ldots, n, \end{equation}

where

\beta

is the

p \times 1

vector of slopes,

z(s) \sim \mathsf{GP}(0, R(\cdot, \cdot; {\theta_{\text{sp}}}))

is a zero-centered spatial Gaussian process on

\mathcal{D}

with spatial correlation function

R(\cdot, \cdot; {\theta_{\text{sp}}})

characterized by process parameters

{\theta_{\text{sp}}}

,

\sigma^2

is the spatial variance parameter (“partial sill”) and

\epsilon_i \sim \mathsf{N}(0, \tau^2), i = 1, \ldots, n

are i.i.d. with variance

\tau^2

(“nugget”) capturing measurement error. The spatial process

z(\cdot)

is assumed to be independent of the measurement errors

\{\epsilon_i, i = 1, \ldots, n\}

. Let

z = (z(s_1), \ldots, z(s_n))^{ \scriptstyle \top }

denotes the realization of the spatial process on

\chi

and

n \times n

correlation matrix

R(\chi; {\theta_{\text{sp}}}) = (R(s_i, s_j {\theta_{\text{sp}}}))_{1 \leq i,j \leq n}

. We build a conjugate Bayesian hierarchical spatial model,

\begin{aligned} \begin{split} y \mid z, \beta, \sigma^2 &\sim \mathsf{N}(X\beta + z, \delta^2 \sigma^2 I_n), \\ z \mid \sigma^2 &\sim \mathsf{N}(0, \sigma^2 R(\chi; {\theta_{\text{sp}}})), \\ \beta \mid \sigma^2 &\sim \mathsf{N}(\mu_\beta, \sigma^2 V_\beta), \quad \sigma^2 \sim \mathsf{IG}(a_\sigma, b_\sigma), \end{split} \end{aligned}

where we fix the noise-to-spatial variance ratio $\delta^2 = \tau^2 / \sigma^2$ , the process parameters ${\theta_{\text{sp}}}$ and the hyperparameters $\mu_\beta$ , $V_\beta$ , $a_\sigma$ and $b_\sigma$ . In this package, we use the Matern covariogram specified by spatial decay parameter $\phi$ and smoothness parameter $\nu$ i.e., ${\theta_{\text{sp}}}= \{\phi, \nu\}$ , given by $\begin{equation} R(s, s'; {\theta_{\text{sp}}}) = \frac{(\phi \lvert s - s' \rvert)^\nu}{2^{\nu - 1} \Gamma(\nu)} K_\nu (\phi \lvert s - s' \rvert)). \end{equation}$ We utilize a composition sampling strategy to sample the model parameters from their joint posterior distribution which can be written as $\begin{equation} p(\sigma^2, \beta, z \mid y) = p(\sigma^2 \mid y) \times p(\beta \mid \sigma^2, y) \times p(z \mid \beta, \sigma^2, y). \end{equation}$ We proceed by first sampling $\sigma^2$ from its marginal posterior, then given the samples of $\sigma^2$ , we sample $\beta$ and subsequently, we sample $z$ conditioned on the posterior samples of $\beta$ and $\sigma^2$ (Banerjee 2020). More details can be found in Zhang, Tang, and Banerjee (2025).

The function spLMexact() delivers samples from this posterior distribution conditional on fixed hyperparameters. For predictive stacking, use the function spLMstack().

Bayesian non-Gaussian spatial regression models

Analyzing non-Gaussian spatial data typically requires introducing spatial dependence in generalized linear models through the link function of an exponential family distribution. Let $y(s)$ be the outcome at location $s \in \mathcal{D}$ endowed with a probability law from the natural exponential family, which we denote by $\begin{equation} y(s) \sim \mathsf{EF}(x(s)^{ \scriptstyle \top }\beta + z(s); b, \psi_y) \end{equation}$ for some positive parameter $b > 0$ and unit log partition function $\psi_y$ . Fixed effects regression and spatial dependence, e.g., $x(s)^{{ \scriptstyle \top }}\beta + z(s)$ , is introduced in the natural parameter, where $x(s)$ is a $p \times 1$ vector of predictors referenced with respect to $s$ , $\beta$ is a $p \times 1$ vector of slopes measuring the trend, $z(s)$ is a zero-centered spatial process on $\mathcal{D}$ specified by a scale parameter $\sigma_z$ and a spatial correlation function $R(\cdot, \cdot ; {\theta_{\text{sp}}})$ with ${\theta_{\text{sp}}}$ consisting of spatial-temporal decay and smoothness parameters.

Unlike in Gaussian likelihoods, inference is considerably encumbered by the inability to analytically integrate out the random effects and reduce the dimension of the parameter space. Iterative algorithms such as Markov Chain Monte Carlo (MCMC), thus attempt to sample from a very high-dimensional posterior distribution, and convergence is often hampered by high auto-correlations and weakly identified spatial process parameters ${\theta_{\text{sp}}}$ .

This model is implemented using the function spGLMexact() when using fixed hyperparameters, and spGLMstack() when using predictive stacking.

We consider the following three point-referenced data -

Poisson count data: Here $b = 1$ and $\psi_y(t) = e^t$ . $\begin{equation} \begin{split} y(s_i) &\sim \mathsf{Poisson}(\lambda(s_i)), \quad i = 1, \dots, n.\\ \lambda(s_i) & = \exp(x(s_i)^{ \scriptstyle \top }\beta + z(s_i)) \end{split} \end{equation}$ This is accessed by setting family = "poisson" in the above functions.
Binomial count data: Here $b = m(s_i)$ for each $i$ and $\psi_y(t) = \log(1 + e^t)$ . $\begin{equation} \begin{split} y(s_i) &\sim \mathsf{Binomial}(m(s_i), \pi(s_i)), \quad i = 1, \dots, n.\\ \pi(s_i) & = \mathrm{ilogit}(x(s_i)^{ \scriptstyle \top }\beta + z(s_i)) \end{split} \end{equation}$ This is accessed by setting family = "binomial" in the above functions.
Binary data: Here $b = 1$ and $\psi_y(t) = \log(1 + e^t)$ . $\begin{equation} \begin{split} y(s_i) &\sim \mathsf{Bernoulli}(\pi(s_i)), \quad i = 1, \dots, n.\\ \pi(s_i) & = \mathrm{ilogit}(x(s_i)^{ \scriptstyle \top }\beta + z(s_i)) \end{split} \end{equation}$ This is accessed by setting family = "binary" in the above functions.

Following Bradley and Clinch (2024), we introduce a Bayesian hierarchical spatial model as $\begin{equation} \begin{split} y(s_i) \mid \beta, z, \xi & \sim \mathsf{EF}\left(x(s_i)^{ \scriptstyle \top }\beta + z(s_i) + \xi_i - \mu_i; b_i, \psi_y\right), i = 1, \ldots, n\\ \beta \mid \sigma^2_\beta &\sim \mathsf{N}(0, \sigma^2_\beta V_\beta), \quad \sigma^2_\beta \sim \mathsf{IG}(\nu_\beta/2, \nu_\beta/2)\\ z \mid \sigma^2_z &\sim \mathsf{N}\left(0, \sigma^2_z R(\chi; {\theta_{\text{sp}}})\right), \quad \sigma^2_z \sim \mathsf{IG}(\nu_z/2, \nu_z/2),\\ \xi \mid \beta, z, \sigma^2_\xi, \alpha_\epsilon &\sim \mathsf{GCM_c}\left(\tilde{\mu}_\xi, H_\xi, \epsilon, \kappa_\xi; \psi_\xi\right), \end{split} \end{equation}$ where $\mu = (\mu_1, \ldots, \mu_n)^{ \scriptstyle \top }$ denotes the discrepancy parameter. We fix the spatial process parameters ${\theta_{\text{sp}}}$ , the boundary adjustment parameter $\epsilon$ and the hyperparameters $V_\beta$ , $\nu_\beta$ , $\nu_z$ and $\sigma^2_\xi$ . The term $\xi$ is known as the fine-scale variation term which is given a conditional generalized conjugate multivariate distribution ( $\mathrm{GCM_c}$ ) as prior. For details, see Pan et al. (2025).

Bayesian non-Gaussian spatial-temporal regression model

We consider a rich family of Bayesian spatial-temporal model with spatially-temporally varying regression coefficients. Suppose $\ell = (s, t)$ , with location $s \in \mathcal{D}$ and time $t \in \mathcal{T}$ , denote a spatial-temporal coordinate in $\mathcal{L} = \mathcal{D} \times \mathcal{T}$ .

Let $\mathcal{L} = \{\ell_1, \ldots, \ell_n\}$ be a fixed set of $n$ distinct space-time coordinates in $\mathcal{D}$ , where $y(\mathcal{L}) = (y(\ell_1), \dots, y(\ell_n))^\top$ , which we simply denote by $y$ , is the vector of observed outcomes, each distributed as a member of the natural exponential family with log partition function $\psi_y$ . Suppose, $x(\ell_i)$ is a $p\times 1$ vector of predictors, $\beta$ is the corresponding $p \times 1$ vector of slopes (fixed effects), $\tilde{x}(\ell_i)$ is $r\times 1$ ( $r \leq p$ ) consisting of predictors in $x(\ell_i)$ that are posited to have spatially-temporally varying regression coefficients $z(\ell_i) = (z_1(\ell_i), \ldots, z_r(\ell_i))^\top$ , where each $z_j(\ell_i)$ is a spatially-temporally varying coefficient for the predictor $\tilde{x}_j(\ell_i)$ , $\xi_i$ is a fine-scale variation term and $\mu_i$ is the discrepancy parameter (see above). We introduce spatially-temporally varying coefficients in $\eta(\ell)$ as $\begin{equation} \begin{split} y(\ell_i) &\mid \beta, z(\ell_i), \xi_i, \mu_i \overset{\text{ind}}{\sim} \mathsf{EF}\left(\eta(\ell_i) + \xi_i - \mu_i; b_i, \psi_y \right), \ i=1,\ldots,n\;,\\ \eta(\ell) &= x(\ell)^{ \scriptstyle \top }\beta + \tilde{x}(\ell)^{{ \scriptstyle \top }}z(\ell), \quad \beta \mid \sigma^2_\beta, \mu_\beta, V_\beta \sim \mathsf{N}(\mu_\beta, \sigma^2_\beta V_\beta), \quad \sigma^2_\beta \sim \pi_\beta(\sigma^2_\beta) \;,\\ z(\ell) &\mid \theta_z, {\theta_{\text{sp}}}\sim \mathsf{GP}(0, C_z(\cdot, \cdot; {\theta_{\text{sp}}}, \theta_z))\;,\quad \theta_z \sim \pi_{z}(\theta_z)\;, \\ \xi &\mid \beta, z, \mu, \alpha_\epsilon, \kappa_\epsilon, \sigma^2_\xi \sim \mathsf{GCM_c}(\tilde{\mu}_\xi, H_\xi, \alpha_\xi, \kappa_\xi, D_\xi, \pi_\xi; \psi_\xi), \ \sigma^2_\xi \sim \pi_\xi(\sigma^2_\xi), \ p(\mu) \propto 1 \;, \end{split} \end{equation}$ where $z(\ell) = (z_1(\ell), \ldots, z_r(\ell))^{ \scriptstyle \top }$ is a multivariate Gaussian process with a separable cross-covariance function $C_z(\cdot, \cdot; {\theta_{\text{sp}}}, \theta_z)$ , characterized by process parameters ${\theta_{\text{sp}}}$ which controls the within-process spatial-temporal correlation, and $\theta_z$ which controls the between-process covariance matrix . Given $\theta_z$ , the $nr \times 1$ vector $z = (z_1^{ \scriptstyle \top }, \ldots, z_r^{ \scriptstyle \top })^{ \scriptstyle \top }$ , where $z_j = (z_j(\ell_1), \ldots, z_j(\ell_n))^{ \scriptstyle \top }$ for $j = 1, \ldots, r$ , follows a multivariate Gaussian distribution with mean $0$ and $nr \times nr$ covariance matrix $C_z(\mathcal{L}; {\theta_{\text{sp}}}, \theta_z)$ .

This model is implemented by the function stvcGLMexact() under fixed hyperparameters, and stvcGLMstack() when using predictive stacking. We also implement different specifications for $C_z$ in this package as follows.

Independent spatial-temporal process: We consider $r$ Gaussian spatial-temporal processes $\begin{equation}\label{eq:z_ind} \begin{split} \text{Independent processes: } z_j(\ell) \mid\sigma_{z_j}^2, {\theta_{\text{sp}}}_j & \overset{\text{ind}}{\sim} \mathsf{GP}(0, \sigma_{z_j}^2 R_j(\cdot, \cdot; {\theta_{\text{sp}}}_j)),\\ \sigma_{z_j}^2 & \sim \mathsf{IG}(\nu_{z_j}/2, \nu_{z_j}/2), \quad j = 1, \ldots, r, \end{split} \end{equation}$ where $\sigma_{z_j}^2$ is the variance parameter corresponding to process $z_j(\ell)$ . This corresponds to the covariance matrix $C_z(\mathcal{L}; {\theta_{\text{sp}}}, \theta_z) = \oplus_{j = 1}^r \sigma_{z_j}^2 R_j({\theta_{\text{sp}}}_j)$ with ${\theta_{\text{sp}}}= \{ {\theta_{\text{sp}}}_j : j = 1, \ldots, r\}$ , where ${\theta_{\text{sp}}}_j$ denotes covariance kernel parameters for the $j$ th process, and $\theta_z = \{\sigma^2_{z_1}, \ldots, \sigma^2_{z_r}\}$ . This is accessed by setting the option process.type = "independent" in the above functions.
Independent shared spatial-temporal process: This corresponds to the above with ${\theta_{\text{sp}}}_j = {\theta_{\text{sp}}}$ and $\sigma^2_{zj} = \sigma^2_z$ for all $j = 1, \ldots, r$ . This is accessed by setting the option process.type = "independent.shared" in the above functions.
Multivariate spatial-temporal process: We can introduce dependence among the elements of the $r\times 1$ vector $z(\ell)$ using $\begin{equation}\label{eq:multi_z} \text{Multivariate process: }z(\ell) \mid\Sigma \sim \mathsf{GP}(0, R(\cdot, \cdot; {\theta_{\text{sp}}})\Sigma), \quad \Sigma \sim \mathsf{IW}(\nu_z + 2r, \Psi)\;, \end{equation}$ where $\mathcal{GP} (0, R(\cdot, \cdot; {\theta_{\text{sp}}})\Sigma)$ is an $r\times 1$ multivariate Gaussian process with matrix-valued cross-covariance function $R(\cdot, \cdot; {\theta_{\text{sp}}})\Sigma$ and $\Sigma$ is an $r \times r$ positive definite random matrix. This corresponds to the spatial-temporal covariance matrix $C_z(\mathcal{L}; {\theta_{\text{sp}}}, \theta_z) = \Sigma \otimes R({\theta_{\text{sp}}})$ with $\theta_z = \Sigma$ . We place an inverse-Wishart prior on the scale parameter with shape $\nu_z + 2r$ and $r\times r$ positive definite scale matrix $\Psi$ , given by $\pi(\theta_z) = \mathsf{IW}(\Sigma \mid\nu_z + 2r, \Psi)$ . This is accessed by setting the option process.type = "multivariate" in the above functions.

Predictive stacking

Following Yao et al. (2018), we consider a set of candidate models based on a grid of values of the parameters in $\{ {\theta_{\text{sp}}}, \delta^2 \}$ for the Gaussian case, and $\{ {\theta_{\text{sp}}}, \epsilon \}$ for the non-Gaussian case, as will be supplied by the user. We build a set of candidate models based on the Cartesian product of the collection of values for each individual parameter as $\mathcal{M} = \{M_1, \ldots, M_G\}$ . Then, for each $g = 1, \ldots, G$ , we sample from the posterior distribution $p(\sigma^2, \beta, z \mid y, M_g)$ under the model $M_g$ and find leave-one-out predictive densities $p(y_i \mid y_{-i}, M_g)$ . Then we solve the optimization problem $\begin{equation} \begin{split} \max_{w_1, \ldots, w_G}& \, \frac{1}{n} \sum_{i = 1}^n \log \sum_{g = 1}^G w_g p(y_i \mid y_{-i}, M_g) \\ \text{subject to} & \quad w_g \geq 0, \sum_{g = 1}^G w_g = 1 \end{split} \end{equation}$ to find the optimal stacking weights $\hat{w}_1, \ldots, \hat{w}_G$ . After obtaining the optimal stacking weights, posterior inference of any quantity of interest subsequently proceed from the stacked posterior, $\begin{equation} \tilde{p}(\cdot \mid y) = \sum_{g = 1}^G \hat{w}_g p(\cdot \mid y, M_g). \end{equation}$

References

Banerjee, Sudipto. 2020. “Modeling Massive Spatial Datasets Using a Conjugate Bayesian Linear Modeling Framework.” Spatial Statistics 37: 100417. https://doi.org/10.1016/j.spasta.2020.100417.

Bradley, Jonathan R., and Madelyn Clinch. 2024. “Generating Independent Replicates Directly from the Posterior Distribution for a Class of Spatial Hierarchical Models.” Journal of Computational and Graphical Statistics 0 (0): 1–17. https://doi.org/10.1080/10618600.2024.2365728.

Pan, Soumyakanti, Lu Zhang, Jonathan R. Bradley, and Sudipto Banerjee. 2025. “Bayesian Inference for Spatial-Temporal Non-Gaussian Data Using Predictive Stacking.” https://doi.org/10.48550/arXiv.2406.04655.

Yao, Yuling, Aki Vehtari, Daniel Simpson, and Andrew Gelman. 2018. “Using Stacking to Average Bayesian Predictive Distributions (with Discussion).” Bayesian Analysis 13 (3): 917–1007. https://doi.org/10.1214/17-BA1091.

Zhang, Lu, Wenpin Tang, and Sudipto Banerjee. 2025. “Bayesian Geostatistics Using Predictive Stacking.” https://doi.org/10.48550/arXiv.2304.12414.