GeoPython2019

Bayesian modeling with spatial data using PyMC3
2019-06-25, 15:00–15:30, Room 1

This talk will be a dive into the field of spatial statistical modeling using Bayesian models. We'll learn how to define the Bayesian model, how to sample from a posterior distribution and then evaluate our results using an ecological application.


In this talk we will be learning how to define a Bayesian model for spatial data for a simple ecological application using PyMC3. We'll also be going over some diagnostics to check our model.

Markov chain Monte Carlo (MCMC) methods draw samples from complex probability distributions that are hard to sample from directly. In this talk, we'll primarily go over two MCMC techniques: the Gibbs sampler and the random-walk Metropolis-Hastings sampler.
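As a rough illustration of the random-walk Metropolis-Hastings idea, here is a minimal sketch in plain Python (standard library only, not PyMC3). It assumes a hypothetical set of species counts at a few survey sites, a Poisson likelihood and a Gamma(2, 1) prior on the rate; the data, prior values and function names are illustrative assumptions, not taken from the talk. This conjugate setup is chosen only because the exact posterior is known, so the sampled mean can be checked against it.

```python
import math
import random


def log_posterior(lam, counts, a=2.0, b=1.0):
    """Unnormalised log posterior of a Poisson rate under a Gamma(a, b) prior (b is a rate)."""
    if lam <= 0.0:
        return -math.inf  # the rate must be positive
    log_prior = (a - 1.0) * math.log(lam) - b * lam
    log_lik = sum(counts) * math.log(lam) - len(counts) * lam
    return log_prior + log_lik


def random_walk_metropolis(counts, n_draws=20000, step=0.5, start=1.0, seed=42):
    """Random-walk Metropolis-Hastings with a symmetric Gaussian proposal."""
    rng = random.Random(seed)
    current = start
    current_lp = log_posterior(current, counts)
    draws = []
    accepted = 0
    for _ in range(n_draws):
        proposal = current + rng.gauss(0.0, step)
        proposal_lp = log_posterior(proposal, counts)
        # Symmetric proposal: accept with probability min(1, posterior ratio).
        accept_prob = math.exp(min(0.0, proposal_lp - current_lp))
        if rng.random() < accept_prob:
            current, current_lp = proposal, proposal_lp
            accepted += 1
        draws.append(current)  # a rejected proposal repeats the current value
    return draws, accepted / n_draws


# Hypothetical counts of a species at eight survey sites.
counts = [3, 5, 4, 6, 2, 4, 5, 3]
draws, accept_rate = random_walk_metropolis(counts)
kept = draws[2000:]  # discard the first draws as burn-in

sampled_mean = sum(kept) / len(kept)
exact_mean = (2.0 + sum(counts)) / (1.0 + len(counts))  # Gamma posterior mean
print("sampled mean:", round(sampled_mean, 3))
print("exact mean:  ", round(exact_mean, 3))
print("acceptance rate:", round(accept_rate, 3))
```

The proposal width `step` is a tuning choice: too small and the chain moves slowly, too large and most proposals are rejected.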

Hierarchical Bayesian models split a complicated model into three basic components. The spatial data model occupies one level of the hierarchy, while the process model resides below it. Typically, a third hierarchical level contains statistical models, also called priors, for the unknown parameters; these priors can carry additional physical information.
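One common way to write this three-level structure is sketched below. The symbols are assumptions introduced here for illustration: $Y$ for the observed spatial data, $S$ for the underlying spatial process, and $\theta_D$, $\theta_P$ for the parameters of the data and process models. Square brackets denote probability distributions.

```latex
\begin{align*}
\text{Data model:} &\quad [\,Y \mid S, \theta_D\,] \\
\text{Process model:} &\quad [\,S \mid \theta_P\,] \\
\text{Parameter model (priors):} &\quad [\,\theta_D, \theta_P\,] \\[4pt]
\text{Posterior:} &\quad [\,S, \theta_D, \theta_P \mid Y\,] \;\propto\;
  [\,Y \mid S, \theta_D\,]\,[\,S \mid \theta_P\,]\,[\,\theta_D, \theta_P\,]
\end{align*}
```

The last line is Bayes' rule applied to the three levels: the posterior is proportional to the product of the data model, the process model and the priors.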

Statistical jargon aside, all we're doing is building a model by assuming certain priors and then making some further assumptions to explain the spatial data we see, whether that is population, the probability of a disease or census data. MCMC sampling techniques help us approximate the resulting posterior distributions, and we'll use the PyMC3 library for this. PyMC3 is a popular Python library for probabilistic programming.
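To make the "assume priors, then sample the posterior" workflow concrete, here is a minimal Gibbs sampler sketch in plain Python (standard library only, not PyMC3). It assumes hypothetical measurements modelled as normal with unknown mean `mu` and precision `tau`, a normal prior on `mu` and a Gamma prior on `tau`. The data, prior values and names are illustrative assumptions; this setup is used only because both full conditional distributions have a known form.

```python
import math
import random


def gibbs_normal(y, n_draws=5000, m0=0.0, p0=0.01, a0=1.0, b0=1.0, seed=1):
    """Gibbs sampler for y_i ~ Normal(mu, 1/tau).

    Assumed priors: mu ~ Normal(m0, 1/p0) and tau ~ Gamma(a0, rate=b0).
    """
    rng = random.Random(seed)
    n = len(y)
    total = sum(y)
    mu = total / n  # initial values
    tau = 1.0
    mu_draws, tau_draws = [], []
    for _ in range(n_draws):
        # Draw mu from its full conditional given tau (a normal distribution).
        precision = p0 + n * tau
        mean = (p0 * m0 + tau * total) / precision
        mu = rng.gauss(mean, 1.0 / math.sqrt(precision))
        # Draw tau from its full conditional given mu (a Gamma distribution).
        shape = a0 + 0.5 * n
        rate = b0 + 0.5 * sum((yi - mu) ** 2 for yi in y)
        tau = rng.gammavariate(shape, 1.0 / rate)  # second argument is a scale
        mu_draws.append(mu)
        tau_draws.append(tau)
    return mu_draws, tau_draws


# Hypothetical measurements at ten sites.
y = [2.1, 1.8, 2.5, 2.9, 2.2, 1.7, 2.6, 2.4, 2.0, 2.3]
mu_draws, tau_draws = gibbs_normal(y)
burn_in = 500
posterior_mean_mu = sum(mu_draws[burn_in:]) / len(mu_draws[burn_in:])
print("posterior mean of mu:", round(posterior_mean_mu, 3))
```

Unlike Metropolis-Hastings, every Gibbs draw is accepted, but the approach relies on being able to sample each full conditional directly.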

By the end of this talk, the audience will have:
1. Learnt how to define a Bayesian model for spatial data in Python
2. Learnt the basics of using two MCMC sampling techniques in PyMC3: Gibbs and Metropolis-Hastings
3. Learnt how to diagnose the model using tools such as autocorrelation plots, standard errors and histograms
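The diagnostics named in the list above can be sketched in plain Python (standard library only, not PyMC3). The example below uses a synthetic autocorrelated chain as a stand-in for real sampler output; the chain, the batch-means estimator of the standard error and all function names are assumptions chosen for illustration.

```python
import math
import random


def autocorrelation(chain, lag):
    """Sample autocorrelation of a chain at the given lag."""
    if lag == 0:
        return 1.0
    n = len(chain)
    mean = sum(chain) / n
    var = sum((x - mean) ** 2 for x in chain) / n
    cov = sum((chain[i] - mean) * (chain[i + lag] - mean) for i in range(n - lag)) / n
    return cov / var


def batch_means_se(chain, n_batches=20):
    """Monte Carlo standard error of the chain mean, estimated from batch means."""
    size = len(chain) // n_batches
    means = [sum(chain[i * size:(i + 1) * size]) / size for i in range(n_batches)]
    grand = sum(means) / n_batches
    var_of_means = sum((m - grand) ** 2 for m in means) / (n_batches - 1)
    return math.sqrt(var_of_means / n_batches)


def histogram_counts(chain, n_bins=10):
    """Counts per equal-width bin, the numbers behind a histogram plot."""
    lo, hi = min(chain), max(chain)
    width = (hi - lo) / n_bins or 1.0  # avoid a zero width for a constant chain
    counts = [0] * n_bins
    for x in chain:
        idx = min(int((x - lo) / width), n_bins - 1)
        counts[idx] += 1
    return counts


# Synthetic stand-in for sampler output: an AR(1) chain with strong correlation.
rng = random.Random(0)
chain = [0.0]
for _ in range(19999):
    chain.append(0.9 * chain[-1] + rng.gauss(0.0, 1.0))

n = len(chain)
mean = sum(chain) / n
sd = math.sqrt(sum((x - mean) ** 2 for x in chain) / (n - 1))
naive_se = sd / math.sqrt(n)  # ignores autocorrelation, so it is too optimistic

print("lag-1 autocorrelation:", round(autocorrelation(chain, 1), 3))
print("naive standard error: ", round(naive_se, 4))
print("batch-means std error:", round(batch_means_se(chain), 4))
print("histogram counts:     ", histogram_counts(chain))
```

When draws are strongly autocorrelated, the naive standard error understates the uncertainty in the estimated mean, which is why the autocorrelation is worth checking alongside it.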

Audience level:
Python: Beginner
Computational skills: Intermediate

Outline
1. Introduction (10 mins)
   * Bayesian models - priors, conjugate posteriors (5 mins)
   * MCMC sampling techniques in PyMC3 (5 mins)
2. Building the model (10 mins)
   * Defining the model for our ecological application (5 mins)
   * Model hyperparameters - initial values and priors (5 mins)
3. Results and Diagnostics (10 mins)
   * Diagnostic check of model using metrics mentioned above (5 mins)
   * Comparing the probability distribution sampled with the true distribution (5 mins)