- Coresets for scalable Bayesian logistic regression.
The use of Bayesian methods in large-scale data settings is attractive because of the rich hierarchical models, uncertainty quantification, and prior specification they provide. Standard Bayesian inference algorithms are computationally expensive, however, making their direct application to large datasets difficult or infeasible. Recent work on scaling Bayesian inference has focused on modifying the underlying algorithms to, for example, use only a random data subsample at each iteration. We leverage the insight that data is often redundant to instead obtain a weighted subset of the data (called a coreset) that is much smaller than the original dataset. We can then use this small coreset in any number of existing posterior inference algorithms without modification. In this paper, we develop an efficient coreset construction algorithm for Bayesian logistic regression models. We provide theoretical guarantees on the size and approximation quality of the coreset---both for fixed, known datasets, and in expectation for a wide class of data generative models. Crucially, the proposed approach also permits efficient construction of the coreset in both streaming and parallel settings, with minimal additional effort. We demonstrate the efficacy of our approach on a number of synthetic and real-world datasets, and find that, in practice, the size of the coreset is independent of the original dataset size. Furthermore, constructing the coreset takes a negligible amount of time compared to that required to run MCMC on it.
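To make the idea of a weighted subset concrete, the following minimal sketch compares the full-data logistic regression log-likelihood with a weighted log-likelihood over a small subset. It assumes labels in {-1, +1} and synthetic Gaussian covariates, and it uses a uniform subsample with weights N/M purely as a stand-in: this is not the coreset construction developed in the paper. All function names are hypothetical.

```python
import math
import random

def log_sigmoid(m):
    """Numerically stable log(1 / (1 + exp(-m)))."""
    return -math.log1p(math.exp(-m)) if m >= 0 else m - math.log1p(math.exp(m))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def log_likelihood(theta, X, y, weights=None):
    """Weighted logistic regression log-likelihood; labels y are in {-1, +1}."""
    if weights is None:
        weights = [1.0] * len(y)
    return sum(w_n * log_sigmoid(y_n * dot(theta, x_n))
               for x_n, y_n, w_n in zip(X, y, weights))

def uniform_weighted_subset(n_points, subset_size, rng):
    """Stand-in for a coreset: uniform subsample, each point weighted N / M."""
    idx = rng.sample(range(n_points), subset_size)
    return idx, [n_points / subset_size] * subset_size

# Synthetic data (an assumption made only for this illustration).
rng = random.Random(0)
N, D, M = 5000, 3, 500
theta_true = [1.0, -2.0, 0.5]
X = [[rng.gauss(0, 1) for _ in range(D)] for _ in range(N)]
y = [1.0 if rng.random() < math.exp(log_sigmoid(dot(theta_true, x_n))) else -1.0
     for x_n in X]

idx, w = uniform_weighted_subset(N, M, rng)
theta = [0.5, -1.0, 0.0]  # an arbitrary parameter value at which to compare
full = log_likelihood(theta, X, y)
approx = log_likelihood(theta, [X[i] for i in idx], [y[i] for i in idx], w)
print(f"full: {full:.1f}  weighted subset: {approx:.1f}")
```

Because the weighted subset enters only through `log_likelihood`, any inference routine that accepts per-point weights could, in principle, consume it unchanged.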
- Fast Quantification of Uncertainty and Robustness with Variational Bayes.
In Bayesian analysis, the posterior follows from the data and a choice
of a prior and a likelihood. These choices may be somewhat subjective
and reasonably vary over some range. Thus, we wish to measure the
sensitivity of posterior estimates to variation in these choices.
While the field of robust Bayes has been formed to address this
problem, its tools are not commonly used in practice.
We demonstrate that variational
Bayes (VB) techniques are readily amenable to robustness analysis.
Since VB casts posterior inference as an optimization problem, its
methodology is built on the ability to calculate derivatives of
posterior quantities with respect to model parameters. We use this
insight to develop local prior robustness measures for mean-field
variational Bayes (MFVB), a particularly popular form of VB due to its
fast runtime on large data sets. MFVB has a well-known major failing,
however: it can severely underestimate uncertainty and provides no
information about covariance. We
generalize linear response methods from statistical physics to deliver
accurate uncertainty estimates for MFVB---both for individual
variables and coherently across variables. We call our method linear
response variational Bayes (LRVB).
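As background for the term "linear response", the standard identity below shows how a covariance can be read off as a derivative of a mean. The notation (the perturbation vector t and the tilted density p_t) is introduced here for illustration only and is not taken from the paper; one plausible reading of the abstract is that the same derivative is applied to the MFVB mean rather than the exact posterior mean.

```latex
% Tilt a density p(\theta) by a log-linear perturbation t:
p_t(\theta) \;=\; \frac{p(\theta)\, e^{t^\top \theta}}
                       {\int p(\theta')\, e^{t^\top \theta'}\, d\theta'} .

% Differentiating the perturbed mean at t = 0 recovers the covariance:
\left. \frac{d}{d t^\top}\, \mathbb{E}_{p_t}[\theta] \right|_{t=0}
  \;=\; \mathbb{E}_{p}\!\left[\theta \theta^\top\right]
        - \mathbb{E}_{p}[\theta]\, \mathbb{E}_{p}[\theta]^\top
  \;=\; \operatorname{Cov}_{p}(\theta).
```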
- Posteriors, conjugacy, and exponential families for completely random measures.
We demonstrate how to calculate posteriors for general
Bayesian nonparametric priors and likelihoods based on completely
random measures (CRMs). We further show how to represent Bayesian
nonparametric priors as a sequence of finite draws using a
size-biasing approach---and how to represent full Bayesian
nonparametric models via finite marginals. Motivated by conjugate
priors based on exponential family representations of likelihoods, we
introduce a notion of exponential families for CRMs, which we call
exponential CRMs. This construction allows us to specify automatic
Bayesian nonparametric conjugate priors for exponential CRM
likelihoods. We demonstrate that our exponential CRMs allow
particularly straightforward recipes for size-biased and marginal
representations of Bayesian nonparametric models. Along the way, we
prove that the gamma process is a conjugate prior for the Poisson
likelihood process and the beta prime process is a conjugate prior for
a process we call the odds Bernoulli process. We deliver a size-biased
representation of the gamma process and a marginal representation of
the gamma process coupled with a Poisson likelihood process.
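For orientation, the familiar finite-dimensional analogue of the gamma-Poisson result is the scalar conjugate update below. It is a textbook identity, stated under the assumption of a shape-rate parameterization, and is not the process-level statement proved in the paper.

```latex
% Prior and likelihood (shape a > 0, rate b > 0):
\lambda \sim \mathrm{Gamma}(a, b), \qquad
x_1, \dots, x_n \mid \lambda \;\overset{\text{iid}}{\sim}\; \mathrm{Poisson}(\lambda).

% Posterior: multiply prior and likelihood and collect powers of \lambda.
p(\lambda \mid x_{1:n})
  \;\propto\; \lambda^{a-1} e^{-b\lambda} \prod_{i=1}^{n} \lambda^{x_i} e^{-\lambda}
  \;=\; \lambda^{\,a + \sum_{i} x_i - 1}\, e^{-(b+n)\lambda},
\qquad\text{so}\qquad
\lambda \mid x_{1:n} \sim \mathrm{Gamma}\!\Big(a + \sum_{i=1}^{n} x_i,\; b + n\Big).
```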
- Feature allocations, probability functions, and paintboxes.
The problem of inferring a clustering of a data set has been the subject of much research in Bayesian analysis, and there currently exists a solid mathematical foundation for Bayesian approaches to clustering. In particular, the class of probability distributions over partitions of a data set has been characterized in a number of ways, including via exchangeable partition probability functions (EPPFs) and the Kingman paintbox. Here, we develop a generalization of the clustering problem, called feature allocation, where we allow each data point to belong to an arbitrary, non-negative integer number of groups, now called features or topics. We define and study an "exchangeable feature probability function" (EFPF)---analogous to the EPPF in the clustering setting---for certain types of feature models. Moreover, we introduce a "feature paintbox" characterization---analogous to the Kingman paintbox for clustering---of the class of exchangeable feature models. We use this feature paintbox construction to provide a further characterization of the subclass of feature allocations that have EFPF representations.
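The difference between a clustering and a feature allocation can be illustrated with a small sketch. The representation below (a list of index sets over data points 0..N-1) and the function names are assumptions made for illustration; they are not notation from the paper.

```python
from collections import Counter

def membership_counts(n_points, groups):
    """Number of groups that each data index 0..n_points-1 belongs to."""
    counts = Counter(i for g in groups for i in g)
    return [counts[i] for i in range(n_points)]

def is_partition(n_points, groups):
    """Clustering: every group is non-empty and every point lies in exactly one."""
    return all(groups) and all(c == 1 for c in membership_counts(n_points, groups))

# A clustering of five points: each point is in exactly one block.
clustering = [{0, 1}, {2}, {3, 4}]

# A feature allocation of the same points: point 1 has three features,
# point 3 has two, and points 2 and 4 have none.
features = [{0, 1, 3}, {1, 3}, {1}]

print(membership_counts(5, clustering), is_partition(5, clustering))
print(membership_counts(5, features), is_partition(5, features))
```

Every partition is a feature allocation under this representation, but not conversely, which is the sense in which feature allocation generalizes clustering.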
- Streaming variational Bayes.
We present SDA-Bayes, a framework for (S)treaming, (D)istributed, (A)synchronous computation of a Bayesian posterior. The framework makes streaming updates to the estimated posterior according to a user-specified approximation batch primitive. We demonstrate the usefulness of our framework, with variational Bayes (VB) as the primitive, by fitting the latent Dirichlet allocation model to two large-scale document collections. We demonstrate the advantages of our algorithm over stochastic variational inference (SVI) by comparing the two after a single pass through a known amount of data---a case where SVI may be applied---and in the streaming setting, where SVI does not apply.
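One natural reading of "streaming updates to the estimated posterior" is that the posterior computed after each batch serves as the prior for the next batch. The sketch below illustrates only that pattern, under strong simplifying assumptions: it uses an exact Beta-Bernoulli conjugate update as the batch primitive (not variational Bayes), a toy data stream (not latent Dirichlet allocation), and a single sequential worker (nothing distributed or asynchronous). The function names are hypothetical.

```python
from functools import reduce

def beta_bernoulli_update(prior, batch):
    """Exact posterior for a Beta(a, b) prior and a batch of 0/1 observations."""
    a, b = prior
    ones = sum(batch)
    return (a + ones, b + len(batch) - ones)

def streaming_posterior(prior, batches, update):
    """Treat the posterior after each batch as the prior for the next batch."""
    return reduce(update, batches, prior)

prior = (1, 1)  # uniform Beta(1, 1) prior
batches = [[1, 0, 1], [1, 1], [0, 0, 1, 0]]

streamed = streaming_posterior(prior, batches, beta_bernoulli_update)
all_at_once = beta_bernoulli_update(prior, [x for b in batches for x in b])
print(streamed, all_at_once)
```

With an exact conjugate primitive, the streamed and all-at-once posteriors coincide; with an approximate primitive such as VB, the streamed result is itself only an approximation.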
- Clusters and features from combinatorial stochastic processes.
In partitioning---a.k.a. clustering---data, we associate each data
point with one and only one of some collection of groups called
clusters or partition blocks. Here, we formally establish an analogous
problem, called feature allocation, for associating data points with
arbitrary non-negative integer numbers of groups, now called features
or topics. Just as the exchangeable partition probability function
(EPPF) can be used to describe the distribution of cluster membership
under an exchangeable clustering model, we examine an analogous
"exchangeable feature probability function" (EFPF) for certain types
of feature models. Moreover, recalling Kingman's paintbox theorem as a
characterization of the class of exchangeable clustering models, we
develop a similar "feature paintbox" characterization of the class of
exchangeable feature models. We use this feature paintbox construction
to provide a further characterization of the subclass of feature
allocations that have EFPF representations. We examine models such as
the Bayesian nonparametric Indian buffet process as examples within
these broader classes.
- [video: 2012 September 20]. Bayesian Nonparametrics, ICERM Semester Program on Computational Challenges in Probability, Brown University, Providence, Rhode Island, USA.
- MAD-Bayes: MAP-based asymptotic derivations from Bayes.
The classical mixture of Gaussians model is related to K-means via small-variance asymptotics: as the covariances of the Gaussians tend to zero, the negative log-likelihood of the mixture of Gaussians model approaches the K-means objective, and the EM algorithm approaches the K-means algorithm. Kulis & Jordan (2012) used this observation to obtain a novel K-means-like algorithm from a Gibbs sampler for the Dirichlet process (DP) mixture. We instead consider applying small-variance asymptotics directly to the posterior in Bayesian nonparametric models. This framework is independent of any specific Bayesian inference algorithm, and it has the major advantage that it generalizes immediately to a range of models beyond the DP mixture. To illustrate, we apply our framework to the feature learning setting, where the beta process and Indian buffet process provide an appropriate Bayesian nonparametric prior. We obtain a novel objective function that goes beyond clustering to learn (and penalize new) groupings for which we relax the mutual exclusivity and exhaustivity assumptions of clustering. We demonstrate several other algorithms, all of which are scalable and simple to implement. Empirical results demonstrate the benefits of the new framework.
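The classical small-variance limit mentioned in the first sentence can be seen numerically. The sketch below assumes a one-dimensional, equal-weight Gaussian mixture with a shared variance and hypothetical function names; it shows the soft cluster responsibilities from the EM E-step hardening into the nearest-mean assignment of K-means as the variance shrinks. It illustrates only this classical fact, not the MAD-Bayes derivation for Bayesian nonparametric posteriors.

```python
import math

def responsibilities(x, means, variance):
    """Posterior cluster probabilities for a 1-D point under an
    equal-weight Gaussian mixture with a shared variance."""
    log_w = [-(x - m) ** 2 / (2.0 * variance) for m in means]
    top = max(log_w)  # subtract the max before exponentiating, for stability
    unnorm = [math.exp(lw - top) for lw in log_w]
    total = sum(unnorm)
    return [u / total for u in unnorm]

def nearest_mean(x, means):
    """K-means hard assignment: index of the closest mean."""
    return min(range(len(means)), key=lambda k: (x - means[k]) ** 2)

means = [0.0, 2.0, 5.0]
x = 1.2
for variance in (10.0, 1.0, 0.1, 0.001):
    r = responsibilities(x, means, variance)
    print(variance, [round(p, 4) for p in r])
print("K-means assignment:", nearest_mean(x, means))
```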
- User-friendly conjugacy for completely random measures.
- [slides pdf] Joint Statistical Meetings (JSM) 2013, Montreal, Canada.
- Fast and flexible selection with a single switch.
- [video: 2009 December 10]. Mini-Symposia on Assistive Machine Learning for People with Disabilities, Neural Information Processing Systems (NIPS) 2009, Vancouver, British Columbia, Canada.
- [video] Nomon keyboard tutorial.
- [video] Example sentence written using Nomon.