Skip to contents

Introduction

Geostatistics refers to the study of a spatially distributed variable of interest, which in theory is defined at every point over a bounded study region of interest. Statistical modelling and analysis for spatially oriented point-referenced outcomes play a crucial role in diverse scientific applications such as earth and environmental sciences, ecology, epidemiology, and economics. With the advent of Markov chain Monte Carlo (MCMC) algorithms, Bayesian hierarchical models have gained massive popularity in analyzing such point-referenced or, geostatistical data. These models involve latent spatial processes characterized by spatial process parameters, which besides lacking substantive relevance in scientific contexts, are also weakly identified and hence, impedes convergence of MCMC algorithms. Thus, even for moderately large datasets (~10310^3 or higher), the computation for MCMC becomes too onerous for practical use.

We introduce the R package spStack that implements Bayesian inference for a class of geostatistical models, where we obviate the issues mentioned by sampling from analytically available posterior distributions conditional upon some candidate values of the spatial process parameters and, subsequently assimilate inference from these individual posterior distributions using Bayesian predictive stacking. Besides delivering competitive predictive performance as compared to fully Bayesian inference using MCMC, our proposed algorithm is embarrassingly parallel, thus drastically improves runtime and elevating the utility of the package for a diverse group of practitioners with limited computational resources at their disposal. This package, to the best of our knowledge, is the first to implement stacking for Bayesian analysis of spatial data.

Technical details surrounding the methodology can be found in the articles Zhang, Tang, and Banerjee (2024) which discuss the case where the distribution of the point-referenced outcomes are Gaussian, and, in Pan et al. (2024) where the case of non-Gaussian outcomes is explored. The code for this package is written primarily in C/C++ with additional calls to FORTRAN routines for optimized linear algebra operations. We leverage the F77_NAME macro to interface with legacy FORTRAN functions in conjunction with efficient matrix computation libraries such as BLAS (Basic Linear Algebra Subprograms) and LAPACK (Linear Algebra Package) to implement our stacking algorithm.

The remainder of the vignette evolves as follows - the next two sections discuss Bayesian hierarchical spatial models for Gaussian and non-Gaussian outcomes, which is followed by a section providing brief details on predictive stacking and a section dedicated for illustration of functions in the package.

Bayesian Gaussian spatial regression models

Let χ={s1,,sn}𝒟\chi = \{s_1, \ldots, s_n\} \in \mathcal{D} be a be a set of nn spatial locations yielding measurements y=(y1,,yn)y = (y_1, \ldots, y_n)^{ \scriptstyle \top } with known values of predictors at these locations collected in the n×pn \times p full rank matrix X=[x(s1),,x(sn)]X = [x(s_1), \ldots, x(s_n)]^{ \scriptstyle \top }. A customary geostatistical model is yi=x(si)β+z(si)+ϵi,i=1,,n,\begin{equation} y_i = x(s_i)^{ \scriptstyle \top }\beta + z(s_i) + \epsilon_i, \quad i = 1, \ldots, n, \end{equation} where β\beta is the p×1p \times 1 vector of slopes, z(s)GP(0,R(,;θsp))z(s) \sim \mathrm{GP}(0, R(\cdot, \cdot; {\theta_{\text{sp}}})) is a zero-centered spatial Gaussian process on 𝒟\mathcal{D} with spatial correlation function R(,;θsp)R(\cdot, \cdot; {\theta_{\text{sp}}}) characterized by process parameters θsp{\theta_{\text{sp}}}, σ2\sigma^2 is the spatial variance parameter (“partial sill”) and ϵi𝒩(0,τ2),i=1,,n\epsilon_i \sim \mathcal{N}(0, \tau^2), i = 1, \ldots, n are i.i.d. with variance τ2\tau^2 (“nugget”) capturing measurement error. The spatial process z()z(\cdot) is assumed to be independent of the measurement errors {ϵi,i=1,,n}\{\epsilon_i, i = 1, \ldots, n\}. Let z=(z(s1),,z(sn))z = (z(s_1), \ldots, z(s_n))^{ \scriptstyle \top } denotes the realization of the spatial process on χ\chi and n×nn \times n correlation matrix R(χ;θsp)=(R(si,sjθsp))1i,jnR(\chi; {\theta_{\text{sp}}}) = (R(s_i, s_j {\theta_{\text{sp}}}))_{1 \leq i,j \leq n}. We build a conjugate Bayesian hierarchical spatial model, yz,β,σ2N(Xβ+z,δ2σ2In),zσ2N(0,σ2R(χ;θsp)),βσ2N(μβ,σ2Vβ),σ2IG(aσ,bσ),\begin{equation} \begin{split} y \mid z, \beta, \sigma^2 &\sim N(X\beta + z, \delta^2 \sigma^2 I_n), \\ z \mid \sigma^2 &\sim N(0, \sigma^2 R(\chi; {\theta_{\text{sp}}})), \\ \beta \mid \sigma^2 &\sim N(\mu_\beta, \sigma^2 V_\beta), \quad \sigma^2 \sim \mathrm{IG}(a_\sigma, b_\sigma), \end{split} \end{equation} where we fix the noise-to-spatial variance ratio δ2=τ2/σ2\delta^2 = \tau^2 / \sigma^2, the process parameters θsp{\theta_{\text{sp}}} and the hyperparameters μβ\mu_\beta, VβV_\beta, aσa_\sigma and bσb_\sigma. In this package, we use the Matern covariogram specified by spatial decay parameter ϕ\phi and smoothness parameter ν\nu i.e., θsp={ϕ,ν}{\theta_{\text{sp}}}= \{\phi, \nu\}, given by R(s,s;θsp)=(ϕ|ss|)ν2ν1Γ(ν)Kν(ϕ|ss|)).\begin{equation} R(s, s'; {\theta_{\text{sp}}}) = \frac{(\phi \lvert s - s' \rvert)^\nu}{2^{\nu - 1} \Gamma(\nu)} K_\nu (\phi \lvert s - s' \rvert)). \end{equation} We utilize a composition sampling strategy to sample the model parameters from their joint posterior distribution which can be written as p(σ2,β,zy)=p(σ2y)×p(βσ2,y)×p(zβ,σ2,y).\begin{equation} p(\sigma^2, \beta, z \mid y) = p(\sigma^2 \mid y) \times p(\beta \mid \sigma^2, y) \times p(z \mid \beta, \sigma^2, y). \end{equation} We proceed by first sampling σ2\sigma^2 from its marginal posterior, then given the samples of σ2\sigma^2, we sample β\beta and subsequently, we sample zz conditioned on the posterior samples of β\beta and σ2\sigma^2(Banerjee 2020). More details can be found in Zhang, Tang, and Banerjee (2024).

The function spLMexact() delivers samples from this posterior distribution conditional on fixed hyperparameters. For predictive stacking, use the function spLMstack().

Bayesian non-Gaussian spatial regression models

Analyzing non-Gaussian spatial data typically requires introducing spatial dependence in generalized linear models through the link function of an exponential family distribution. Let y(s)y(s) be the outcome at location s𝒟s \in \mathcal{D} endowed with a probability law from the natural exponential family, which we denote by y(s)EF(x(s)β+z(s);b,ψy)\begin{equation} y(s) \sim \mathrm{EF}(x(s)^{ \scriptstyle \top }\beta + z(s); b, \psi_y) \end{equation} for some positive parameter b>0b > 0 and unit log partition function ψy\psi_y. Fixed effects regression and spatial dependence, e.g., x(s)β+z(s)x(s)^{{ \scriptstyle \top }}\beta + z(s), is introduced in the natural parameter, where x(s)x(s) is a p×1p \times 1 vector of predictors referenced with respect to ss, β\beta is a p×1p \times 1 vector of slopes measuring the trend, z(s)z(s) is a zero-centered spatial process on 𝒟\mathcal{D} specified by a scale parameter σz\sigma_z and a spatial correlation function R(,;θsp)R(\cdot, \cdot ; {\theta_{\text{sp}}}) with θsp{\theta_{\text{sp}}} consisting of spatial-temporal decay and smoothness parameters.

Unlike in Gaussian likelihoods, inference is considerably encumbered by the inability to analytically integrate out the random effects and reduce the dimension of the parameter space. Iterative algorithms such as Markov Chain Monte Carlo (MCMC), thus attempt to sample from a very high-dimensional posterior distribution, and convergence is often hampered by high auto-correlations and weakly identified spatial process parameters θsp{\theta_{\text{sp}}}.

This model is implemented using the function spGLMexact() when using fixed hyperparameters, and spGLMstack() when using predictive stacking.

We consider the following three cases -

  1. Point-referenced Poisson count data: Here b=1b = 1 and ψy(t)=et\psi_y(t) = e^t. y(si)Poisson(λ(si)),i=1,,n.λ(si)=exp(x(si)β+z(si))\begin{equation} \begin{split} y(s_i) &\sim \mathrm{Poisson}(\lambda(s_i)), \quad i = 1, \dots, n.\\ \lambda(s_i) & = \exp(x(s_i)^{ \scriptstyle \top }\beta + z(s_i)) \end{split} \end{equation} This is accessed by setting family = "poisson" in the above functions.

  2. Point-referenced binomial count data: Here b=m(si)b = m(s_i) for each ii and ψy(t)=log(1+et)\psi_y(t) = \log(1 + e^t). y(si)Binomial(m(si),π(si)),i=1,,n.π(si)=ilogit(x(si)β+z(si))\begin{equation} \begin{split} y(s_i) &\sim \mathrm{Binomial}(m(s_i), \pi(s_i)), \quad i = 1, \dots, n.\\ \pi(s_i) & = \mathrm{ilogit}(x(s_i)^{ \scriptstyle \top }\beta + z(s_i)) \end{split} \end{equation} This is accessed by setting family = "binomial" in the above functions.

  3. Point-referenced binary data: Here b=1b = 1 and ψy(t)=log(1+et)\psi_y(t) = \log(1 + e^t). y(si)Bernoulli(π(si)),i=1,,n.π(si)=ilogit(x(si)β+z(si))\begin{equation} \begin{split} y(s_i) &\sim \mathrm{Bernoulli}(\pi(s_i)), \quad i = 1, \dots, n.\\ \pi(s_i) & = \mathrm{ilogit}(x(s_i)^{ \scriptstyle \top }\beta + z(s_i)) \end{split} \end{equation} This is accessed by setting family = "binary" in the above functions.

Following Bradley and Clinch (2024), we introduce a Bayesian hierarchical spatial model as y(si)β,z,ξEF(x(si)β+z(si)+ξiμi;bi,ψy),i=1,,nβσβ2N(0,σβ2Vβ),σβ2IG(νβ/2,νβ/2)zσz2N(0,σz2R(χ;θsp)),σz2IG(νz/2,νz/2),ξβ,z,σξ2,αϵGCMc(μ̃ξ,Hξ,ϵ,κξ;ψξ),\begin{equation} \begin{split} y(s_i) \mid \beta, z, \xi & \sim \mathrm{EF}\left(x(s_i)^{ \scriptstyle \top }\beta + z(s_i) + \xi_i - \mu_i; b_i, \psi_y\right), i = 1, \ldots, n\\ \beta \mid \sigma^2_\beta &\sim N(0, \sigma^2_\beta V_\beta), \quad \sigma^2_\beta \sim \mathrm{IG}(\nu_\beta/2, \nu_\beta/2)\\ z \mid \sigma^2_z &\sim N\left(0, \sigma^2_z R(\chi; {\theta_{\text{sp}}})\right), \quad \sigma^2_z \sim \mathrm{IG}(\nu_z/2, \nu_z/2),\\ \xi \mid \beta, z, \sigma^2_\xi, \alpha_\epsilon &\sim \mathrm{GCM_c}\left(\tilde{\mu}_\xi, H_\xi, \epsilon, \kappa_\xi; \psi_\xi\right), \end{split} \end{equation} where μ=(μ1,,μn)\mu = (\mu_1, \ldots, \mu_n)^{ \scriptstyle \top } denotes the discrepancy parameter. We fix the spatial process parameters θsp{\theta_{\text{sp}}}, the boundary adjustment parameter ϵ\epsilon and the hyperparameters VβV_\beta, νβ\nu_\beta, νz\nu_z and σξ2\sigma^2_\xi. The term ξ\xi is known as the fine-scale variation term which is given a conditional generalized conjugate multivariate distribution (GCMc\mathrm{GCM_c}) as prior. For details, see Pan et al. (2024).

Bayesian non-Gaussian spatial-temporal regression model

We consider a rich family of Bayesian spatial-temporal model with spatially-temporally varying regression coefficients. Suppose =(s,t)\ell = (s, t), with location s𝒟s \in \mathcal{D} and time t𝒯t \in \mathcal{T}, denote a spatial-temporal coordinate in =𝒟×𝒯\mathcal{L} = \mathcal{D} \times \mathcal{T}.

Let ={1,,n}\mathcal{L} = \{\ell_1, \ldots, \ell_n\} be a fixed set of nn distinct space-time coordinates in 𝒟\mathcal{D}, where y()=(y(1),,y(n))y(\mathcal{L}) = (y(\ell_1), \dots, y(\ell_n))^\top, which we simply denote by yy, is the vector of observed outcomes, each distributed as a member of the natural exponential family with log partition function ψy\psi_y. Suppose, x(i)x(\ell_i) is a p×1p\times 1 vector of predictors, β\beta is the corresponding p×1p \times 1 vector of slopes (fixed effects), x̃(i)\tilde{x}(\ell_i) is r×1r\times 1 (rpr \leq p) consisting of predictors in x(i)x(\ell_i) that are posited to have spatially-temporally varying regression coefficients z(i)=(z1(i),,zr(i))z(\ell_i) = (z_1(\ell_i), \ldots, z_r(\ell_i))^\top, where each zj(i)z_j(\ell_i) is a spatially-temporally varying coefficient for the predictor x̃j(i)\tilde{x}_j(\ell_i), ξi\xi_i is a fine-scale variation term and μi\mu_i is the discrepancy parameter (see above). We introduce spatially-temporally varying coefficients in η()\eta(\ell) as y(i)β,z(i),ξi,μiindEF(η(i)+ξiμi;bi,ψy),i=1,,n,η()=x()β+x̃()z(),βσβ2,μβ,VβN(μβ,σβ2Vβ),σβ2πβ(σβ2),z()θz,θspGP(0,Cz(,;θsp,θz)),θzπz(θz),ξβ,z,μ,αϵ,κϵ,σξ2GCMc(μ̃ξ,Hξ,αξ,κξ,Dξ,πξ;ψξ),σξ2πξ(σξ2),p(μ)1,\begin{equation} \begin{split} y(\ell_i) &\mid \beta, z(\ell_i), \xi_i, \mu_i \overset{\text{ind}}{\sim} \mathrm{EF}\left(\eta(\ell_i) + \xi_i - \mu_i; b_i, \psi_y \right), \ i=1,\ldots,n\;,\\ \eta(\ell) &= x(\ell)^{ \scriptstyle \top }\beta + \tilde{x}(\ell)^{{ \scriptstyle \top }}z(\ell), \quad \beta \mid \sigma^2_\beta, \mu_\beta, V_\beta \sim \mathrm{N}(\mu_\beta, \sigma^2_\beta V_\beta), \quad \sigma^2_\beta \sim \pi_\beta(\sigma^2_\beta) \;,\\ z(\ell) &\mid \theta_z, {\theta_{\text{sp}}}\sim \mathrm{GP}(0, C_z(\cdot, \cdot; {\theta_{\text{sp}}}, \theta_z))\;,\quad \theta_z \sim \pi_{z}(\theta_z)\;, \\ \xi &\mid \beta, z, \mu, \alpha_\epsilon, \kappa_\epsilon, \sigma^2_\xi \sim \mathrm{GCM}_c(\tilde{\mu}_\xi, H_\xi, \alpha_\xi, \kappa_\xi, D_\xi, \pi_\xi; \psi_\xi), \ \sigma^2_\xi \sim \pi_\xi(\sigma^2_\xi), \ p(\mu) \propto 1 \;, \end{split} \end{equation} where z()=(z1(),,zr())z(\ell) = (z_1(\ell), \ldots, z_r(\ell))^{ \scriptstyle \top } is a multivariate Gaussian process with a separable cross-covariance function Cz(,;θsp,θz)C_z(\cdot, \cdot; {\theta_{\text{sp}}}, \theta_z), characterized by process parameters θsp{\theta_{\text{sp}}} which controls the within-process spatial-temporal correlation, and θz\theta_z which controls the between-process covariance matrix . Given θz\theta_z, the nr×1nr \times 1 vector z=(z1,,zr)z = (z_1^{ \scriptstyle \top }, \ldots, z_r^{ \scriptstyle \top })^{ \scriptstyle \top }, where zj=(zj(1),,zj(n))z_j = (z_j(\ell_1), \ldots, z_j(\ell_n))^{ \scriptstyle \top } for j=1,,rj = 1, \ldots, r, follows a multivariate Gaussian distribution with mean 00 and nr×nrnr \times nr covariance matrix Cz(;θsp,θz)C_z(\mathcal{L}; {\theta_{\text{sp}}}, \theta_z).

This model is implemented by the function stvcGLMexact() under fixed hyperparameters, and stvcGLMstack() when using predictive stacking. We also implement different specifications for CzC_z in this package as follows.

  1. Independent spatial-temporal process: We consider rr Gaussian spatial-temporal processes Independent processes: zj()σzj2,θspjindGP(0,σzj2Rj(,;θspj)),σzj2IG(νzj/2,νzj/2),j=1,,r,\begin{equation}\label{eq:z_ind} \begin{split} \text{Independent processes: } z_j(\ell) \mid\sigma_{z_j}^2, {\theta_{\text{sp}}}_j & \overset{\text{ind}}{\sim} \mathrm{GP}(0, \sigma_{z_j}^2 R_j(\cdot, \cdot; {\theta_{\text{sp}}}_j)),\\ \sigma_{z_j}^2 & \sim \mathrm{IG}(\nu_{z_j}/2, \nu_{z_j}/2), \quad j = 1, \ldots, r, \end{split} \end{equation} where σzj2\sigma_{z_j}^2 is the variance parameter corresponding to process zj()z_j(\ell). This corresponds to the covariance matrix Cz(;θsp,θz)=j=1rσzj2Rj(θspj)C_z(\mathcal{L}; {\theta_{\text{sp}}}, \theta_z) = \oplus_{j = 1}^r \sigma_{z_j}^2 R_j({\theta_{\text{sp}}}_j) with θsp={θspj:j=1,,r}{\theta_{\text{sp}}}= \{ {\theta_{\text{sp}}}_j : j = 1, \ldots, r\}, where θspj{\theta_{\text{sp}}}_j denotes covariance kernel parameters for the jjth process, and θz={σz12,,σzr2}\theta_z = \{\sigma^2_{z_1}, \ldots, \sigma^2_{z_r}\}. This is accessed by setting the option process.type = "independent" in the above functions.

  2. Independent shared spatial-temporal process: This corresponds to the above with θspj=θsp{\theta_{\text{sp}}}_j = {\theta_{\text{sp}}} and σzj2=σz2\sigma^2_{zj} = \sigma^2_z for all j=1,,rj = 1, \ldots, r. This is accessed by setting the option process.type = "independent.shared" in the above functions.

  3. Multivariate spatial-temporal process: We can introduce dependence among the elements of the r×1r\times 1 vector z()z(\ell) using Multivariate process: z()ΣGP(0,R(,;θsp)Σ),ΣIW(νz+2r,Ψ),\begin{equation}\label{eq:multi_z} \text{Multivariate process: }z(\ell) \mid\Sigma \sim \mathrm{GP}(0, R(\cdot, \cdot; {\theta_{\text{sp}}})\Sigma), \quad \Sigma \sim \mathrm{IW}(\nu_z + 2r, \Psi)\;, \end{equation} where 𝒢𝒫(0,R(,;θsp)Σ)\mathcal{GP} (0, R(\cdot, \cdot; {\theta_{\text{sp}}})\Sigma) is an r×1r\times 1 multivariate Gaussian process with matrix-valued cross-covariance function R(,;θsp)ΣR(\cdot, \cdot; {\theta_{\text{sp}}})\Sigma and Σ\Sigma is an r×rr \times r positive definite random matrix. This corresponds to the spatial-temporal covariance matrix Cz(;θsp,θz)=ΣR(θsp)C_z(\mathcal{L}; {\theta_{\text{sp}}}, \theta_z) = \Sigma \otimes R({\theta_{\text{sp}}}) with θz=Σ\theta_z = \Sigma. We place an inverse-Wishart prior on the scale parameter with shape νz+2r\nu_z + 2r and r×rr\times r positive definite scale matrix Ψ\Psi, given by π(θz)=IW(Σνz+2r,Ψ)\pi(\theta_z) = \mathrm{IW}(\Sigma \mid\nu_z + 2r, \Psi). This is accessed by setting the option process.type = "multivariate" in the above functions.

Predictive stacking

Following Yao et al. (2018), we consider a set of candidate models based on a grid of values of the parameters in {θsp,δ2}\{ {\theta_{\text{sp}}}, \delta^2 \} for the Gaussian case, and {θsp,ϵ}\{ {\theta_{\text{sp}}}, \epsilon \} for the non-Gaussian case, as will be supplied by the user. We build a set of candidate models based on the Cartesian product of the collection of values for each individual parameter as ={M1,,MG}\mathcal{M} = \{M_1, \ldots, M_G\}. Then, for each g=1,,Gg = 1, \ldots, G, we sample from the posterior distribution p(σ2,β,zy,Mg)p(\sigma^2, \beta, z \mid y, M_g) under the model MgM_g and find leave-one-out predictive densities p(yiyi,Mg)p(y_i \mid y_{-i}, M_g). Then we solve the optimization problem maxw1,,wG1ni=1nlogg=1Gwgp(yiyi,Mg)subject towg0,g=1Gwg=1\begin{equation} \begin{split} \max_{w_1, \ldots, w_G}& \, \frac{1}{n} \sum_{i = 1}^n \log \sum_{g = 1}^G w_g p(y_i \mid y_{-i}, M_g) \\ \text{subject to} & \quad w_g \geq 0, \sum_{g = 1}^G w_g = 1 \end{split} \end{equation} to find the optimal stacking weights ŵ1,,ŵG\hat{w}_1, \ldots, \hat{w}_G. After obtaining the optimal stacking weights, posterior inference of any quantity of interest subsequently proceed from the stacked posterior, p̃(y)=g=1Gŵgp(y,Mg).\begin{equation} \tilde{p}(\cdot \mid y) = \sum_{g = 1}^G \hat{w}_g p(\cdot \mid y, M_g). \end{equation}

References

Banerjee, Sudipto. 2020. “Modeling Massive Spatial Datasets Using a Conjugate Bayesian Linear Modeling Framework.” Spatial Statistics 37: 100417. https://doi.org/10.1016/j.spasta.2020.100417.
Bradley, Jonathan R., and Madelyn Clinch. 2024. “Generating Independent Replicates Directly from the Posterior Distribution for a Class of Spatial Hierarchical Models.” Journal of Computational and Graphical Statistics 0 (0): 1–17. https://doi.org/10.1080/10618600.2024.2365728.
Pan, Soumyakanti, Lu Zhang, Jonathan R. Bradley, and Sudipto Banerjee. 2024. “Bayesian Inference for Spatial-Temporal Non-Gaussian Data Using Predictive Stacking.” https://arxiv.org/abs/2406.04655.
Yao, Yuling, Aki Vehtari, Daniel Simpson, and Andrew Gelman. 2018. Using Stacking to Average Bayesian Predictive Distributions (with Discussion).” Bayesian Analysis 13 (3): 917–1007. https://doi.org/10.1214/17-BA1091.
Zhang, Lu, Wenpin Tang, and Sudipto Banerjee. 2024. “Bayesian Geostatistics Using Predictive Stacking.” https://arxiv.org/abs/2304.12414.