Light PhD Seminar: Dirichlet processes in machine learning. From theory to practice

Date: Tue, Nov 22 2022

Hour: 13:00

Location: Maryam Mirzakhani Seminar Room at BCAM

Speaker: Ioar Casado


Dirichlet processes in machine learning. From theory to practice

Suppose we are given a dataset X ~ f in a measure space (𝒳, 𝒜) and we want to estimate f using X. A common approach is to fix a model class F = {f_θ : θ ∈ ℝ^n} and fit the parameters to the data using Bayesian inference. This parametric approach often suffers from model selection and adaptation problems, since f can be very different from any f_θ. For example, suppose we want to fit a Gaussian mixture model to the data. When cross-validation is too expensive, fixing an appropriate number of mixture components can be a daunting task. Furthermore, as more data is gathered, we may need more mixture components to describe it.
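The model selection problem above can be made concrete. A minimal sketch, assuming scikit-learn's GaussianMixture and synthetic data (not from the talk): each candidate number of components k must be fitted and scored separately, here with BIC.

```python
# Sketch: parametric model selection for a Gaussian mixture.
# Hypothetical two-cluster data; the point is that k must be fixed in advance.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Two well-separated Gaussian clusters in the plane
X = np.vstack([rng.normal(0.0, 1.0, (200, 2)),
               rng.normal(6.0, 1.0, (200, 2))])

# Score several candidate component counts with BIC and keep the best.
# This is the costly step that grows with the number of candidates.
bics = {k: GaussianMixture(n_components=k, random_state=0).fit(X).bic(X)
        for k in range(1, 6)}
best_k = min(bics, key=bics.get)
print(best_k)
```

New data may shift the best k, so the whole sweep has to be rerun as the dataset grows, which is exactly the adaptation problem the non-parametric approach avoids.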

Bayesian non-parametric models tackle these problems by working in an infinite-dimensional parameter space, hugely extending the model class under consideration. For these models to be tractable, only a finite number of parameter dimensions is used to describe finite data, and the precise number of dimensions used (hence the complexity of the models considered) depends on the dataset. The flexibility provided by their data-driven model complexity makes Bayesian non-parametrics appealing for many machine learning problems.
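The "finitely many dimensions for finite data" idea is nicely illustrated by the Chinese restaurant process, the partition distribution underlying Dirichlet process mixtures. A minimal simulation (illustrative parameters, not from the talk): n observations occupy only a slowly growing number of clusters, roughly α·log n on average.

```python
# Chinese restaurant process: customer i joins an existing table with
# probability proportional to its occupancy, or opens a new table with
# probability proportional to the concentration parameter alpha.
import numpy as np

def crp_partition(n, alpha, rng):
    counts = []  # occupancy of each table (cluster)
    for _ in range(n):
        probs = np.array(counts + [alpha], dtype=float)
        probs /= probs.sum()
        table = rng.choice(len(probs), p=probs)
        if table == len(counts):
            counts.append(1)      # open a new table
        else:
            counts[table] += 1    # join an existing one
    return counts

rng = np.random.default_rng(0)
counts = crp_partition(n=500, alpha=2.0, rng=rng)
# All 500 customers are seated, but only a handful of tables are used.
print(len(counts), sum(counts))
```

The number of occupied tables is random and grows with the data, which is precisely the data-driven model complexity described above.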

In this talk, we introduce the Dirichlet process, one of the best-known Bayesian non-parametric models. After describing its basic properties, we show how it can be used in practice to work with infinite mixture models and solve clustering problems where the number of clusters is unknown. If time permits, we will also present our work adapting Dirichlet process-based clustering to streaming scenarios under concept drift.
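One standard way to present the Dirichlet process is Sethuraman's stick-breaking construction: a draw from DP(α, H) is a discrete measure Σ_k π_k δ_{φ_k}, with weights π_k = β_k ∏_{j<k}(1 − β_j) for β_k ~ Beta(1, α) and atoms φ_k drawn i.i.d. from the base measure H. A minimal sketch, truncated at K atoms for computation (base measure and parameters are illustrative assumptions):

```python
# Truncated stick-breaking draw from a Dirichlet process DP(alpha, H).
import numpy as np

def stick_breaking(alpha, K, rng):
    # beta_k ~ Beta(1, alpha); weight k is beta_k times the stick remaining
    # after the first k-1 breaks.
    betas = rng.beta(1.0, alpha, size=K)
    remaining = np.concatenate([[1.0], np.cumprod(1.0 - betas)[:-1]])
    return betas * remaining

rng = np.random.default_rng(0)
weights = stick_breaking(alpha=1.0, K=50, rng=rng)
atoms = rng.normal(0.0, 1.0, size=50)  # base measure H = N(0, 1), an assumption
# Weights decay geometrically, so the truncated draw captures almost all mass.
print(round(float(weights.sum()), 6))
```

In a DP mixture model, each atom would parameterize one mixture component, and the (in principle infinite) weights are why the number of clusters need not be fixed in advance.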



Confirmed speaker:

Ioar Casado