Publications

Parallel space-time likelihood optimization for air pollution prediction on large-scale systems

Published in Proceedings of the Platform for Advanced Scientific Computing Conference (PASC), 2022

Gaussian geostatistical space-time modeling is an effective tool for performing statistical inference of field data evolving in space and time, generalizing spatial modeling alone at the cost of greater complexity of operations and storage, and pushing geostatistical modeling even further into the arms of high performance computing. It makes inferences for missing data by leveraging space-time measurements of one or more fields. We propose a high-performance implementation of a widely applied space-time modeling method for large-scale systems, using a two-level parallelization technique. At the inner level, we rely on state-of-the-art dense linear algebra libraries and parallel runtime systems to perform complex matrix operations required in modeling and prediction operations using maximum likelihood estimation (MLE), i.e., the Cholesky factorization of the Gaussian space-time covariance matrix. At the outer level, we parallelize the optimization process using a distributed implementation of the particle swarm optimization (PSO) algorithm. At this level, parallelization is accomplished using MPI sub-communicators, where nodes in each sub-communicator perform a single MLE iteration at a time. We evaluate the performance and the accuracy of the proposed implementation using synthetic datasets and a real particulate matter (PM) dataset illustrating the application of the technique to air pollution. We achieve up to 24.45, 49.70, 100.06, 189.67, 369.22, and 757.16 TFLOPS using 32, 64, 128, 256, 512, and 1024 nodes, respectively, of a Cray XC40 system, with an average of 60% of the peak performance on 1024 nodes for a problem size of 490K.
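As a rough illustration of the inner-level computation mentioned in the abstract, the sketch below evaluates a zero-mean Gaussian log-likelihood through a Cholesky factorization. It is not the paper's implementation: the function names, the exponential covariance, the treatment of time as a third coordinate, and the small grid scan standing in for PSO are all assumptions made for this example, and it uses plain Python rather than a dense linear algebra library.

```python
import math

def exp_covariance(points, variance, range_):
    """Toy covariance matrix (assumed model): exponential decay in the
    Euclidean distance between (x, y, t) points, time treated as a coordinate."""
    n = len(points)
    return [[variance * math.exp(-math.dist(points[i], points[j]) / range_)
             for j in range(n)] for i in range(n)]

def cholesky(a):
    """Return the lower-triangular factor L with a = L L^T."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(l[i][k] * l[j][k] for k in range(j))
            if i == j:
                l[i][j] = math.sqrt(a[i][i] - s)
            else:
                l[i][j] = (a[i][j] - s) / l[j][j]
    return l

def forward_solve(l, b):
    """Solve L y = b for lower-triangular L."""
    y = []
    for i in range(len(b)):
        s = sum(l[i][k] * y[k] for k in range(i))
        y.append((b[i] - s) / l[i][i])
    return y

def gaussian_loglik(z, cov):
    """Zero-mean Gaussian log-likelihood of observations z with covariance cov."""
    n = len(z)
    l = cholesky(cov)
    y = forward_solve(l, z)                       # y = L^{-1} z
    logdet = 2.0 * sum(math.log(l[i][i]) for i in range(n))
    quad = sum(v * v for v in y)                  # z^T cov^{-1} z
    return -0.5 * (n * math.log(2.0 * math.pi) + logdet + quad)

# Hypothetical data: four space-time points and their measurements.
points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 1.0), (1.0, 1.0, 1.0)]
z = [0.3, -0.1, 0.4, 0.2]

# A plain grid scan over the range parameter, used here in place of PSO.
candidates = [0.5, 1.0, 2.0]
best_range = max(candidates,
                 key=lambda r: gaussian_loglik(z, exp_covariance(points, 1.0, r)))
```

Each call to `gaussian_loglik` corresponds to one likelihood evaluation for a candidate parameter value. The evaluations for different candidates do not depend on one another, which is the kind of independence an outer-level parallel optimizer can exploit.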

Recommended citation: Salvaña, M. L. O., Abdulah, S., Ltaief, H., Sun, Y., Genton, M. G., & Keyes, D. E. (2022). "Parallel space-time likelihood optimization for air pollution prediction on large-scale systems." Proceedings of the Platform for Advanced Scientific Computing Conference (to appear). http://marysalvana.github.io/files/pasc.pdf

Spatio-temporal cross-covariance functions under the Lagrangian framework with multiple advections

Published in Journal of the American Statistical Association, 2022

When analyzing the spatio-temporal dependence in most environmental and earth sciences variables such as pollutant concentrations at different levels of the atmosphere, a special property is observed: the covariances and cross-covariances are stronger in certain directions. This property is attributed to the presence of natural forces, such as wind, which cause the transport and dispersion of these variables. These spatio-temporal dynamics prompt the integration of the Lagrangian reference frame into any Gaussian spatio-temporal geostatistical model. Under this modeling framework, a whole new class has emerged, known as the class of spatio-temporal covariance functions under the Lagrangian framework, with several developments already established in the univariate setting, in both stationary and nonstationary formulations, but less so in the multivariate case. Despite the many advances in this modeling approach, efforts have yet to be directed to probing the case for the use of multiple advections, especially when several variables are involved. Accounting for multiple advections makes the Lagrangian framework a more viable approach in modeling realistic multivariate transport scenarios. In this work, we establish a class of Lagrangian spatio-temporal cross-covariance functions with multiple advections, study its properties, and demonstrate its use on a bivariate pollutant dataset of particulate matter in Saudi Arabia.

Recommended citation: Salvaña, M. L. O., Lenzi, A., & Genton, M. G. (2022). "Spatio-temporal cross-covariance functions under the Lagrangian framework with multiple advections." Journal of the American Statistical Association (to appear). http://marysalvana.github.io/files/jasa_arxiv.pdf

Lagrangian spatio-temporal nonstationary covariance functions

Published in Advances in Contemporary Statistics and Econometrics, 2021

The Lagrangian reference frame has been used to model spatio-temporal dependence of purely spatial second-order stationary random fields that are being transported. This modeling paradigm involves transforming a purely spatial process to spatio-temporal by introducing a transformation in the spatial coordinates. Recently, it has been used to capture dependence in space and time of transported purely spatial random fields with second-order nonstationarity. However, under this modeling framework, the presence of mechanisms enforcing second-order nonstationary behavior introduces considerable challenges in parameter estimation. To address these, we propose a new estimation methodology which includes modeling the second-order nonstationarity parameters by means of thin plate splines and estimating all the parameters via two-step maximum likelihood estimation. In addition, through numerical experiments, we tackle the consequences of model misspecification. That is, we discuss the implications, both in the stationary and nonstationary cases, of fitting Lagrangian spatio-temporal covariance functions to data generated from non-Lagrangian models, and vice versa. Lastly, we apply the Lagrangian models and the new estimation technique to analyze particulate matter concentrations over Saudi Arabia.
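To make the coordinate transformation described in the abstract concrete, the sketch below covers the simplest special case, a field transported at a constant, known velocity, where the space-time covariance is the spatial covariance evaluated at the shifted lag `h - v * u`. The exponential spatial covariance, the function names, and the parameter values are assumptions for illustration only; the chapter itself treats nonstationary fields and a more general estimation procedure.

```python
import math

def spatial_cov(h, variance=1.0, range_=1.0):
    """Exponential purely spatial covariance of the lag vector h (assumed model)."""
    return variance * math.exp(-math.hypot(*h) / range_)

def lagrangian_cov(h, u, velocity, **kwargs):
    """Covariance at spatial lag h and temporal lag u for a field
    transported at a constant velocity: C(h, u) = C_S(h - v * u)."""
    shifted = tuple(hk - vk * u for hk, vk in zip(h, velocity))
    return spatial_cov(shifted, **kwargs)

# Hypothetical wind velocity and lags.
velocity = (1.0, 0.5)
h = (1.0, 0.5)

same_time = lagrangian_cov(h, 0.0, velocity)   # reduces to the spatial covariance
along_flow = lagrangian_cov(h, 1.0, velocity)  # lag matches the transport exactly
```

In this sketch the dependence is strongest when the spatial lag equals the displacement caused by the transport over the temporal lag, so `along_flow` equals the variance while `same_time` is smaller.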

Recommended citation: Salvaña, M. L. O., & Genton, M. G. (2021). "Lagrangian spatio-temporal nonstationary covariance functions." Book Chapter in Advances in Contemporary Statistics and Econometrics, Festschrift for Prof. C. Thomas-Agnan. 427-447. http://marysalvana.github.io/files/2021_Book_AdvancesInContemporaryStatisti.pdf

High performance multivariate spatial modeling for geostatistical data on manycore systems

Published in IEEE Transactions on Parallel and Distributed Systems, 2021

Modeling and inferring spatial relationships and predicting missing values of environmental data are some of the main tasks of geospatial statisticians. These routine tasks are accomplished using multivariate geospatial models and the cokriging technique. The latter requires the evaluation of the expensive Gaussian log-likelihood function, which has impeded the adoption of multivariate geospatial models for large multivariate spatial datasets. However, this large-scale cokriging challenge provides a fertile ground for supercomputing implementations for the geospatial statistics community as it is paramount to scale computational capability to match the growth in environmental data coming from the widespread use of different data collection technologies. In this article, we develop and deploy large-scale multivariate spatial modeling and inference on parallel hardware architectures. To tackle the increasing complexity in matrix operations and the massive concurrency in parallel systems, we leverage low-rank matrix approximation techniques with task-based programming models and schedule the asynchronous computational tasks using a dynamic runtime system. The proposed framework provides both the dense and the approximated computations of the Gaussian log-likelihood function. It demonstrates robust accuracy and scalable performance on a variety of computer systems. Using both synthetic and real datasets, the low-rank matrix approximation shows better performance than the exact computation, while preserving the application requirements in both parameter estimation and prediction accuracy. We also propose a novel algorithm to assess the prediction accuracy after the online parameter estimation. The algorithm quantifies prediction performance and provides a benchmark for measuring the efficiency and accuracy of several approximation techniques in multivariate spatial modeling.

Recommended citation: Salvaña, M. L. O., Abdulah, S., Huang, H., Ltaief, H., Sun, Y., Genton, M. G., & Keyes, D. E. (2021). "High performance multivariate spatial modeling for geostatistical data on manycore systems." IEEE Transactions on Parallel and Distributed Systems. 32(11), 2719-2733. http://marysalvana.github.io/files/tpds.pdf

Nonstationary cross-covariance functions for multivariate spatio-temporal random fields

Published in Spatial Statistics, 2020

In multivariate spatio-temporal analysis, we are faced with the formidable challenge of specifying a valid spatio-temporal cross-covariance function, either directly or through the construction of processes. This task is difficult as these functions should yield positive definite covariance matrices. In recent years, we have seen a flourishing of methods and theories on constructing spatio-temporal cross-covariance functions satisfying the positive definiteness requirement. A subset of those techniques produced spatio-temporal cross-covariance functions possessing the additional feature of nonstationarity. Here we provide a review of the state-of-the-art methods and technical progress regarding model construction. In addition, we introduce a rich class of multivariate spatio-temporal asymmetric nonstationary models stemming from the Lagrangian framework. We demonstrate the capabilities of the proposed models on a bivariate reanalysis climate model output dataset previously analyzed using purely spatial models. Furthermore, we carry out a cross-validation study to examine the advantages of using spatio-temporal models over purely spatial models. Finally, we outline future research directions and open problems.

Recommended citation: Salvaña, M. L. O., & Genton, M. G. (2020). "Nonstationary cross-covariance functions for multivariate spatio-temporal random fields." Spatial Statistics. 37, 100411. http://marysalvana.github.io/files/spatial-statistics.pdf