Selected Presentations
Upcoming
- 2018 March 5. Women in Data Science (WiDS), Cambridge, MA.
- 2018 March 26--28. BayesComp 2018, Barcelona, Spain.
- 2018 April 5. Plenary lecture, Google StatFoo Conference, CA, USA.
- 2018 April 6. Facebook, CA, USA.
- 2018 April 18. "Uncertainty quantification in complex, nonparametric statistical models", Lorentz Center, Leiden, Netherlands.
- 2018 May 19. International Indian Statistical Association (IISA) Conference, Gainesville, FL, USA.
- 2018 May 31--June 1. "Statistics when the model is wrong", Radcliffe Institute for Advanced Study, Harvard University, USA.
- 2018 June 4--6. Centre for Computational and Statistical Machine Learning (CSML) Master Class, UCL, London, UK.
- 2018 June 18. R-Ladies Buenos Aires and Buenos Aires Women in Machine Learning & Data Science (WiMLDS), Buenos Aires, Argentina.
- 2018 June 19--20. Machine Learning Summer School (MLSS), Buenos Aires, Argentina.
- 2018 June 24--29. ISBA World Meeting, Edinburgh, UK.
- 2018 July 2. BNPSI 2018: Workshop on Bayesian nonparametrics for signal and image processing, Bordeaux, France.
- 2018 July 5. IMS Annual Meeting on Probability and Statistics, Vilnius, Lithuania.
- 2018 July 10. Tutorial at ICML, Stockholm, Sweden.
- 2018 July 28--August 2. Joint Statistical Meetings (JSM), Vancouver, Canada.
Research presentations
- Coresets for automated, scalable Bayesian inference.
[abstract]
The use of Bayesian methods in large-scale data settings is attractive because of the rich hierarchical relationships, uncertainty quantification, and prior specification these methods provide. Standard Bayesian inference algorithms are often computationally expensive, however, so their direct application to large datasets can be difficult or infeasible. Other standard algorithms sacrifice accuracy in the pursuit of scalability. We take a new approach. Namely, we leverage the insight that data often exhibits approximate redundancies to instead obtain a weighted subset of the data (called a "coreset") that is much smaller than the original dataset. We can then use this small coreset in existing Bayesian inference algorithms without modification. We provide theoretical guarantees on the size and approximation quality of the coreset. In particular, we show that our method provides geometric decay in posterior approximation error as a function of coreset size. We validate on both synthetic and real datasets, demonstrating that our method reduces posterior approximation error by orders of magnitude relative to uniform random subsampling.
- Coresets for scalable Bayesian logistic regression.
[abstract]
The use of Bayesian methods in large-scale data settings is attractive because of the rich hierarchical models, uncertainty quantification, and prior specification they provide. Standard Bayesian inference algorithms are computationally expensive, however, making their direct application to large datasets difficult or infeasible. Recent work on scaling Bayesian inference has focused on modifying the underlying algorithms to, for example, use only a random data subsample at each iteration. We leverage the insight that data is often redundant to instead obtain a weighted subset of the data (called a coreset) that is much smaller than the original dataset. We can then use this small coreset in any number of existing posterior inference algorithms without modification. In this paper, we develop an efficient coreset construction algorithm for Bayesian logistic regression models. We provide theoretical guarantees on the size and approximation quality of the coreset---both for fixed, known datasets, and in expectation for a wide class of data generative models. Crucially, the proposed approach also permits efficient construction of the coreset in both streaming and parallel settings, with minimal additional effort. We demonstrate the efficacy of our approach on a number of synthetic and real-world datasets, and find that, in practice, the size of the coreset is independent of the original dataset size. Furthermore, constructing the coreset takes a negligible amount of time compared to that required to run MCMC on it.
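The central idea in the two coreset abstracts above, replacing the full dataset with a small weighted subset whose weighted log-likelihood stands in for the full-data log-likelihood, can be sketched in a few lines. The sketch below uses uniform random subsampling with weights N/m, which is the baseline the papers improve upon, not the sensitivity-based construction developed in the work itself; the model, data, and variable names are all illustrative.

```python
import numpy as np

def log_lik(theta, X, y, w=None):
    # Weighted Bernoulli logistic log-likelihood with labels y in {-1, +1};
    # w are coreset weights (default: all ones, i.e., the full dataset).
    if w is None:
        w = np.ones(len(y))
    z = y * (X @ theta)
    return float(np.sum(-w * np.log1p(np.exp(-z))))

rng = np.random.default_rng(0)
N, d = 1000, 2
X = rng.normal(size=(N, d))
y = rng.choice([-1.0, 1.0], size=N)
theta = np.array([0.5, -0.3])

# Uniform random "coreset": sample m points and weight each by N/m, so the
# weighted log-likelihood is an unbiased estimate of the full-data one.
m = 200
idx = rng.choice(N, size=m, replace=False)
w = np.full(m, N / m)

full = log_lik(theta, X, y)
approx = log_lik(theta, X[idx], y[idx], w)
print(abs(approx - full) / abs(full))  # small relative error
```

Any posterior inference algorithm that consumes a log-likelihood can then be run on the m weighted points instead of the N original ones; the contribution of the papers is a non-uniform construction whose approximation error decays far faster in m than this uniform baseline.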
- Fast Quantification of Uncertainty and Robustness with Variational Bayes.
[abstract]
In Bayesian analysis, the posterior follows from the data and a choice of a prior and a likelihood. These choices may be somewhat subjective and reasonably vary over some range. Thus, we wish to measure the sensitivity of posterior estimates to variation in these choices. While the field of robust Bayes has been formed to address this problem, its tools are not commonly used in practice. We demonstrate that variational Bayes (VB) techniques are readily amenable to robustness analysis. Since VB casts posterior inference as an optimization problem, its methodology is built on the ability to calculate derivatives of posterior quantities with respect to model parameters. We use this insight to develop local prior robustness measures for mean-field variational Bayes (MFVB), a particularly popular form of VB due to its fast runtime on large data sets. However, MFVB has a well-known major failing: it can severely underestimate uncertainty and provides no information about covariance. We generalize linear response methods from statistical physics to deliver accurate uncertainty estimates for MFVB---both for individual variables and coherently across variables. We call our method linear response variational Bayes (LRVB).
- [slides pdf]
- [video: 2017 August 23]. Microsoft Research New England.
- [video: 2016 October 17]. Biostatistics-Biomedical Informatics Big Data (B3D) Seminar, Harvard University, USA.
- [video: 2016 August 23]. 2016 Big Data Conference & Workshop, Harvard, Cambridge, Massachusetts, USA.
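Both the covariance failing of MFVB and the linear response fix described in the abstract can be seen in the simplest possible setting: a correlated multivariate Gaussian target, for which the linear-response-corrected covariance turns out to be exact. The formula below is the specialization of the correction to this Gaussian case (a matrix identity), offered as a sketch of the idea rather than a general LRVB implementation.

```python
import numpy as np

# Target: bivariate Gaussian with precision matrix Lam (correlated components).
Lam = np.array([[2.0, 1.2],
                [1.2, 1.5]])
Sigma_exact = np.linalg.inv(Lam)

# MFVB fits a fully factorized Gaussian; its optimal variances are 1 / Lam_ii,
# which understate the true marginal variances and report zero covariance.
V = np.diag(1.0 / np.diag(Lam))

# Linear response correction, specialized to the Gaussian case:
# Sigma_lr = (I + V O)^{-1} V, where O is the off-diagonal part of Lam.
O = Lam - np.diag(np.diag(Lam))
Sigma_lr = np.linalg.solve(np.eye(2) + V @ O, V)

print(np.allclose(Sigma_lr, Sigma_exact))  # True: exact for Gaussian targets
```

Here V[0, 0] = 0.5 while the true marginal variance Sigma_exact[0, 0] is about 0.96, illustrating the underestimation; the corrected matrix recovers both the marginals and the cross-covariance.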
- Posteriors, conjugacy, and exponential families for completely random measures.
[abstract]
We demonstrate how to calculate posteriors for general Bayesian nonparametric priors and likelihoods based on completely random measures (CRMs). We further show how to represent Bayesian nonparametric priors as a sequence of finite draws using a size-biasing approach---and how to represent full Bayesian nonparametric models via finite marginals. Motivated by conjugate priors based on exponential family representations of likelihoods, we introduce a notion of exponential families for CRMs, which we call exponential CRMs. This construction allows us to specify automatic Bayesian nonparametric conjugate priors for exponential CRM likelihoods. We demonstrate that our exponential CRMs allow particularly straightforward recipes for size-biased and marginal representations of Bayesian nonparametric models. Along the way, we prove that the gamma process is a conjugate prior for the Poisson likelihood process and the beta prime process is a conjugate prior for a process we call the odds Bernoulli process. We deliver a size-biased representation of the gamma process and a marginal representation of the gamma process coupled with a Poisson likelihood process.
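The gamma-process/Poisson conjugacy stated in the abstract mirrors the familiar finite-dimensional gamma-Poisson pair. As a sketch of the flavor of the result (the CRM statement in the work applies, roughly, atom-by-atom, together with an updated Lévy measure), the scalar version reads:

```latex
\lambda \sim \mathrm{Gamma}(a, b),
\qquad x \mid \lambda \sim \mathrm{Poisson}(\lambda)
\quad\Longrightarrow\quad
\lambda \mid x \sim \mathrm{Gamma}(a + x,\; b + 1).
```

The exponential CRM construction in the talk automates the analogous bookkeeping for a broad class of such prior/likelihood process pairs.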
- Feature allocations, probability functions, and paintboxes.
[abstract]
The problem of inferring a clustering of a data set has been the subject of much research in Bayesian analysis, and there currently exists a solid mathematical foundation for Bayesian approaches to clustering. In particular, the class of probability distributions over partitions of a data set has been characterized in a number of ways, including via exchangeable partition probability functions (EPPFs) and the Kingman paintbox. Here, we develop a generalization of the clustering problem, called feature allocation, where we allow each data point to belong to an arbitrary, non-negative integer number of groups, now called features or topics. We define and study an "exchangeable feature probability function" (EFPF)---analogous to the EPPF in the clustering setting---for certain types of feature models. Moreover, we introduce a "feature paintbox" characterization---analogous to the Kingman paintbox for clustering---of the class of exchangeable feature models. We use this feature paintbox construction to provide a further characterization of the subclass of feature allocations that have EFPF representations.
- [slides pdf]
- [video: 2015 February 25]. Harvard Applied Statistics Workshop, Harvard, Cambridge, Massachusetts, USA.
- [video: 2014 October 15]. Redwood Center for Theoretical Neuroscience, UC Berkeley, Berkeley, California, USA.
- Streaming variational Bayes.
[abstract]
We present SDA-Bayes, a framework for (S)treaming, (D)istributed, (A)synchronous computation of a Bayesian posterior. The framework makes streaming updates to the estimated posterior according to a user-specified approximation batch primitive. We demonstrate the usefulness of our framework, with variational Bayes (VB) as the primitive, by fitting the latent Dirichlet allocation model to two large-scale document collections. We demonstrate the advantages of our algorithm over stochastic variational inference (SVI) by comparing the two after a single pass through a known amount of data---a case where SVI may be applied---and in the streaming setting, where SVI does not apply.
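The streaming combination step described in the abstract can be sketched for a conjugate exponential-family model, where each batch's contribution to the posterior is its batch posterior minus the prior, and contributions simply add. The batch primitive below is an exact Beta-Bernoulli conjugate update standing in for the variational primitive used in the work; all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
data = rng.choice([0, 1], size=100)

# Beta(a, b) prior, stored as the pair (a, b).
prior = np.array([2.0, 2.0])

def batch_primitive(prior_params, batch):
    # Conjugate Beta-Bernoulli update, playing the role of the
    # user-specified approximation primitive (exact here, variational
    # in general, e.g. VB as in the paper).
    return prior_params + np.array([batch.sum(), len(batch) - batch.sum()])

# Streaming combination: each batch contributes (batch posterior - prior),
# and for exponential families these contributions add.
post = prior.copy()
for batch in np.array_split(data, 10):
    post = post + (batch_primitive(prior, batch) - prior)

# Matches the single-pass full-data posterior.
full = batch_primitive(prior, data)
print(np.allclose(post, full))  # True
```

Because the per-batch contributions add, the batches can also be processed on different workers and combined asynchronously, which is the distributed/asynchronous aspect of the framework.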
- Clusters and features from combinatorial stochastic processes.
[abstract]
In partitioning---a.k.a. clustering---data, we associate each data point with one and only one of some collection of groups called clusters or partition blocks. Here, we formally establish an analogous problem, called feature allocation, for associating data points with arbitrary non-negative integer numbers of groups, now called features or topics. Just as the exchangeable partition probability function (EPPF) can be used to describe the distribution of cluster membership under an exchangeable clustering model, we examine an analogous "exchangeable feature probability function" (EFPF) for certain types of feature models. Moreover, recalling Kingman's paintbox theorem as a characterization of the class of exchangeable clustering models, we develop a similar "feature paintbox" characterization of the class of exchangeable feature models. We use this feature paintbox construction to provide a further characterization of the subclass of feature allocations that have EFPF representations. We examine models such as the Bayesian nonparametric Indian buffet process as examples within these broader classes.
- [video: 2012 September 20]. Bayesian Nonparametrics, ICERM Semester Program on Computational Challenges in Probability, Brown University, Providence, Rhode Island, USA.
- MAD-Bayes: MAP-based asymptotic derivations from Bayes.
[abstract]
The classical mixture of Gaussians model is related to K-means via small-variance asymptotics: as the covariances of the Gaussians tend to zero, the negative log-likelihood of the mixture of Gaussians model approaches the K-means objective, and the EM algorithm approaches the K-means algorithm. Kulis & Jordan (2012) used this observation to obtain a novel K-means-like algorithm from a Gibbs sampler for the Dirichlet process (DP) mixture. We instead consider applying small-variance asymptotics directly to the posterior in Bayesian nonparametric models. This framework is independent of any specific Bayesian inference algorithm, and it has the major advantage that it generalizes immediately to a range of models beyond the DP mixture. To illustrate, we apply our framework to the feature learning setting, where the beta process and Indian buffet process provide an appropriate Bayesian nonparametric prior. We obtain a novel objective function that goes beyond clustering to learn (and penalize new) groupings for which we relax the mutual exclusivity and exhaustivity assumptions of clustering. We demonstrate several other algorithms, all of which are scalable and simple to implement. Empirical results demonstrate the benefits of the new framework.
- [slides pdf]
- [poster pdf]
- [video: 2013 June 17]. 30th International Conference on Machine Learning (ICML 2013), Atlanta, Georgia, USA.
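The small-variance limit described in the abstract, where the mixture-of-Gaussians negative log-likelihood approaches the K-means objective, can be checked numerically: with fixed centers and equal mixing weights, the suitably scaled negative log-likelihood is a "soft min" over squared distances that hardens to the K-means distortion as the variance shrinks. The centers and data below are arbitrary; this illustrates only the scalar limit, not the Bayesian nonparametric extension developed in the work.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 2))
mus = np.array([[-1.0, 0.0], [1.0, 0.0], [0.0, 1.5]])  # fixed centers

sq = ((X[:, None, :] - mus[None, :, :]) ** 2).sum(axis=2)  # (n, K) sq. dists
kmeans_obj = sq.min(axis=1).sum()  # K-means distortion at these centers

def scaled_neg_loglik(sigma):
    # -2 sigma^2 * log sum_k exp(-d_k / (2 sigma^2)), computed stably as
    # min_k d_k - 2 sigma^2 * log sum_k exp(-(d_k - min) / (2 sigma^2)),
    # dropping sigma-dependent additive constants.
    m = sq.min(axis=1, keepdims=True)
    s = np.exp(-(sq - m) / (2 * sigma**2)).sum(axis=1)
    return float((m[:, 0] - 2 * sigma**2 * np.log(s)).sum())

for sigma in [1.0, 0.3, 0.05]:
    print(sigma, abs(scaled_neg_loglik(sigma) - kmeans_obj))
# gap shrinks toward 0 as sigma -> 0
```

In the same limit the EM responsibilities harden into K-means assignments; the talk applies this asymptotic device directly to Bayesian nonparametric posteriors such as the beta process/Indian buffet process to obtain new K-means-like objectives.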
- User-friendly conjugacy for completely random measures.
- [slides pdf] Joint Statistical Meetings (JSM) 2013, Montreal, Canada.
- Fast and flexible selection with a single switch.
- [video: 2009 December 10]. Mini-Symposia on Assistive Machine Learning for People with Disabilities, Neural Information Processing Systems (NIPS) 2009, Vancouver, British Columbia, Canada.
- [video] Nomon keyboard tutorial.
- [video] Example sentence written using Nomon.