Discretization, Solution, and Inversion for Large Systems of PDEs
Computer Science Department, University of Colorado Boulder, March 3, 2016
We are often forced to make important decisions with imperfect and incomplete data. In model-based inference, our efforts to extract useful information from data are aided by models of what occurs where we have no observations: examples range from climate prediction to patient-specific medicine. In many cases, these models can take the form of systems of PDEs with critical-yet-unknown parameter fields, such as initial conditions or material coefficients of heterogeneous media. A concrete example that I will present is to make predictions about the Antarctic ice sheet from satellite observations, when we model the ice sheet using a system of nonlinear Stokes equations with a Robin-type boundary condition, governed by a critical, spatially varying coefficient. This talk will present three aspects of the computational stack used to efficiently estimate statistics for this kind of inference problem. At the top is an posterior-distribution approximation for Bayesian inference, that combines Laplace's method with randomized calculations to compute an optimal low-rank representation. Below that, the performance of this approach to inference is highly dependent on the efficient and scalable solution of the underlying model equation, and its first- and second- adjoint equations. A high-level description of a problem (in this case, a nonlinear Stokes boundary value problem) may suggest an approach to designing an optimal solver, but this is just the jumping-off point: differences in geometry, boundary conditions, and other considerations will significantly affect performance. I will discuss how the peculiarities of the ice sheet dynamics problem lead to the development of an anisotropic multigrid method (available as a plugin to the PETSc library for scientific computing) that improves on standard approaches. At the bottom, to increase the accuracy per degree of freedom of discretized PDEs, I develop adaptive mesh refinement (AMR) techniques for large-scale problems. I will present my algorithmic contributions to the p4est library for parallel AMR that enable it to scale to concurrencies of O(10^6), as well as recent work commoditizing AMR techniques in PETSc.


Uniting Performance and Extensibility in Adaptive Finite Element Computations
CAAM Colloqium, Rice University, Houston, Texas, September 14, 2015
This talk will begin with a presentation of recent work that applies p4est—a library for parallel adaptive mesh refinement—to large scale computations, and ice sheet dynamics with highly anisotropic meshs and global mantle convection simulations with billions of degrees of freedom. The former is an example where extension of the datatypes and algorithms in a library can help it to address unanticipated problems; the latter is an example of how extensibility is crucial for the same library that performs well on one process to peform well on a million. Time permitting, this will lead into a discussion of work to expand the reach of adaptive methods in PETSc, including work to make DMPlex—already a flexible interface for unstructured meshes—even for flexible by adding support for the non-conformal meshes that arise in AMR applications.
Multilevel Methods for Forward and Inverse Ice Sheet Modeling
SIAM Conference on Computational Science and Engineering, Salt Lake City, Utah, March 14, 2015
We present our work on recovering ice sheet modeling parameter fields (such as basal slipperiness) from observations, using both deterministic and Bayesian inversion. The scalability of our adjoint- and Hessian-based methods is determined by the scalability of two sub-problems: the solution of the state PDEs (Stokes equations of ice sheet dynamics), and the approximation and preconditioning of the parameter-to-observation Hessian. For the former problem, we compare the effectiveness of geometric and algebraic multigrid within the solution of the state PDEs; for the latter, we discuss the use of multilevel approximations to improve on Hessian approximation by low-rank updates. The scalability of our work is tested on full-scale models of the Antarctic ice sheet.


Statistical Inversion for Basal Parameters for the Antarctic Ice Sheet
3rd SIAM Conference on Uncertainty Quantification, Savannah, Georgia, April 3, 2014
We formulate a Bayesian inference problem for the friction field at the base of the Antarctic ice sheet from distributions for the observed surface velocities and for the prior knowledge of the basal friction. The dimension of the parameter space is large, and the map from parameters to observations requires the solution of a system of implicit nonlinear 3D PDEs. We approximate the posterior distribution with a Gaussian centered at the maximum a posteriori point, with covariance given by the inverse Hessian of the log posterior. By using a low-rank approximation of the log likelihood, we are able to scale up to the problem size of interest.
Hybrid Quadtree/Octree AMR for Anisotropic Domains
16th SIAM Conference on Parallel Processing for Scientific Computing, Portland, Oregon, February 19, 2014
Many problems in geosciences, such as ice sheets, oceans, and the atmosphere, involve thin, anisotropic domains. Researchers in these areas are using 3D models where once they used 2D approximations. The p4est library for parallel adaptive mesh refinement (AMR), which uses a forest-of-octrees approach to AMR, only natively supports isotropic mesh refinement, which is often insufficient for such problems. We present an extension to this library with two refinement modes for these anisotropic problems.

<2014 (Select)

Discretization and Solvers for the Stokes Equations of Ice Sheet Dynamics
12th SIAM Conference on Mathematical and Computational Issues in the Geosciences, Padua, Italy, June 18, 2013
Ice sheets exhibit incompressible creeping flow with shear-thinning rheology. On a continental scale, the flow is characterized by localized regions of fast flow that are separated from vast slow regions by thin transition zones. We use a parallel, adaptive mesh, higher-order finite element discretization to model in 3D the equations for ice sheet dynamics on a continental scale. We will address issues related to the scalable solution of the resulting equations, with emphasis on the nonlinear rheology and addressing the effects of a highly anisotropic discretization.
My thanks to the Bavarian Graduate School of Computational Engineering for inviting me to present a version of this talk to research groups in Erlangen and Munich.
Low-Cost Parallel Algorithms for 2:1 Octree Balance
26th IEEE International Parallel and Distributed Processing Symposium, Shanghai, China, May 22, 2012
The logical structure of a forest of octrees can be used to create scalable algorithms for parallel adaptive mesh refinement (AMR), which has recently been demonstrated for several petascale applications. Among various frequently used octree- based mesh operations, including refinement, coarsening, partitioning, and enumerating nodes, ensuring a 2:1 size balance between neighboring elements has historically been the most expensive in terms of CPU time and communication volume. The 2:1 balance operation is thus a primary target to optimize. One important component of a parallel balance algorithm is the ability to determine whether any two given octants have a consistent distance/size relation. Based on new logical concepts we propose fast algorithms for making this decision for all types of 2:1 balance conditions in 2D and 3D. Since we are able to achieve this without constructing any parent nodes in the tree that would otherwise need to be sorted and communicated, we can significantly reduce the required memory and communication volume. In addition, we propose a lightweight col- lective algorithm for reversing the asymmetric communication pattern induced by non-local octant interactions. We have implemented our improvements as part of the open- source “p4est” software. Benchmarking this code with both synthetic and simulation-driven adapted meshes we are able to demonstrate much reduced runtime and excellent weak and strong scalability. On our largest benchmark problem with 5.13 x 1011 octants the new 2:1 balance algorithm executes in less than 8 seconds on 112,128 CPU cores of the Jaguar Cray XT5 supercomputer.