Scalable Bayesian inference for the generalized linear mixed model

Abstract

The generalized linear mixed model (GLMM) is a popular statistical approach for handling correlated data, and is used extensively in application areas where big data is common, including biomedical data settings. The focus of this paper is scalable statistical inference for the GLMM, where we define statistical inference as: (i) estimation of population parameters, and (ii) evaluation of scientific hypotheses in the presence of uncertainty. Artificial intelligence (AI) learning algorithms excel at scalable statistical estimation, but rarely include uncertainty quantification. In contrast, Bayesian inference provides full statistical inference, since uncertainty quantification results automatically from the posterior distribution. Unfortunately, Bayesian inference algorithms, including Markov chain Monte Carlo (MCMC), become computationally intractable in big data settings. In this paper, we introduce a statistical inference algorithm at the intersection of AI and Bayesian inference that combines the scalability of modern AI algorithms with the guaranteed uncertainty quantification that accompanies Bayesian inference. Our algorithm is an extension of stochastic gradient MCMC with novel contributions that address the treatment of correlated data (i.e., intractable marginal likelihood) and proper posterior variance estimation. Through theoretical and empirical results we establish our algorithm's statistical inference properties, and apply the method in a large electronic health records database.
