Jose Segovia defended his thesis on Monday, May 26th
The defence took place in the Ada Lovelace Room at the Faculty of Informatics on the Donostia campus of the UPV/EHU
Jose Ignacio Segovia (Valladolid, 1997) developed a curiosity for logic problems from an early age, which led him to pursue studies in science. He graduated in Mathematics from the University of Valladolid in 2019, having spent one academic year abroad in Opole through the Erasmus programme. He then completed a master's degree in mathematical research at the University of Valladolid in 2020. That same year, he began his doctoral studies in the Computer Engineering programme at the University of the Basque Country, based at the Basque Center for Applied Mathematics (BCAM) in the Machine Learning (ML) research line. He is currently also working as a substitute lecturer in the Department of Statistics at the University of Valladolid. He has a passion for music, having played the trumpet in various orchestras, and enjoys practising a range of sports, such as running, basketball, and rugby.
His thesis, titled "Adapting to Marginal Distribution Shifts in Supervised Learning: A Double-Weighting Approach", was carried out under the supervision of Prof. Santiago Mazuelas (BCAM & Ikerbasque).
On behalf of all members of BCAM, we would like to wish Jose all the best for the future, professionally and personally.
Abstract
Supervised classification traditionally assumes that training and testing samples are independent and identically distributed (i.i.d.), drawn from the same underlying distribution. However, practical scenarios are often affected by distribution shifts, such as covariate shift and label shift. In covariate shift, the marginal distribution over the instances (covariates) differs between training and testing, while the conditional distribution of the labels given the instances remains the same. In label shift, the marginal distribution over the labels differs between training and testing, while the conditional distribution of the instances given the labels remains the same. Additionally, in multi-source scenarios, the training data is obtained from multiple sources, each with a different probability distribution. In scenarios affected by distribution shift, conventional supervised classification methods, such as empirical risk minimization, can perform poorly because the empirical risk approximates the training expected risk rather than the testing expected risk.
Most existing techniques for correcting distribution shifts are based on a reweighting approach that weights training samples, assigning lower relevance to the samples that are unlikely at testing. However, these methods may perform poorly under support mismatch or when the weights obtained take large values for certain training samples. In addition, in multi-source cases, existing methods inherit the problems of single-source reweighting methods and do not exploit complementary information among sources, since they combine the sources equally for all instances.
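As an illustration of the conventional reweighting idea described above (not of the double-weighting method proposed in the thesis), the following is a minimal sketch under strong assumptions: one-dimensional Gaussian instances, training and testing densities that are known exactly, and a fixed, deliberately imperfect classifier. All names, the size of the shift, and the threshold values are hypothetical choices made for this example. It compares the unweighted empirical risk, the importance-weighted empirical risk, and the risk measured on testing samples, and it reports the largest weight together with Kish's effective sample size, a common diagnostic that is not necessarily the quantity used in the thesis's bounds.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000
MU_TEST = 1.5  # assumed shift of the testing mean (hypothetical value)


def gaussian_pdf(x, mean):
    """Density of a unit-variance Gaussian with the given mean."""
    return np.exp(-0.5 * (x - mean) ** 2) / np.sqrt(2.0 * np.pi)


def true_label(x):
    # The label depends on x in the same way at training and testing
    # (covariate shift: only the marginal of x changes).
    return (x > 0.0).astype(int)


def classifier(x):
    # A fixed, deliberately imperfect rule whose risk we want to estimate.
    return (x > 0.5).astype(int)


# Training instances follow N(0, 1); testing instances follow N(MU_TEST, 1).
x_train = rng.normal(0.0, 1.0, n)
x_test = rng.normal(MU_TEST, 1.0, n)

# 0-1 loss of the fixed classifier on each sample.
loss_train = (classifier(x_train) != true_label(x_train)).astype(float)
loss_test = (classifier(x_test) != true_label(x_test)).astype(float)

# Importance weights: ratio of testing to training density at training points.
weights = gaussian_pdf(x_train, MU_TEST) / gaussian_pdf(x_train, 0.0)

train_risk = loss_train.mean()                  # approximates the training risk
weighted_risk = (weights * loss_train).mean()   # approximates the testing risk
test_risk = loss_test.mean()                    # reference value from testing samples

# Kish's effective sample size: small when a few weights dominate.
ess = weights.sum() ** 2 / (weights ** 2).sum()

print(f"unweighted empirical risk: {train_risk:.3f}")
print(f"weighted empirical risk:   {weighted_risk:.3f}")
print(f"risk on testing samples:   {test_risk:.3f}")
print(f"largest weight:            {weights.max():.1f}")
print(f"effective sample size:     {ess:.0f} of {n}")
```

In this toy setting the weighted estimate tracks the testing risk while the unweighted one does not, yet a handful of training samples in the tail receive very large weights and the effective sample size falls well below the number of training samples, which is the kind of limitation the paragraph above refers to.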
In this dissertation, we establish learning methodologies for supervised learning under marginal distribution shifts. The proposed methodology is based on minimax risk classification and avoids the limitations of existing methods by weighting both training and testing samples. For the multi-source case, the presented methods assign source-dependent weights to training and testing samples, where the weights are obtained jointly using information from all sources. In addition, we develop effective techniques for obtaining the sets of training and testing weights, generalizing the techniques based on conventional kernel mean matching. We also present generalization bounds for the proposed methods that show a significant increase in the effective sample size. Empirically, the proposed methods achieve enhanced classification performance both in synthetic experiments and in experiments using multiple real datasets.
This dissertation makes theoretical contributions that lead to efficient algorithms for multiple supervised learning scenarios under distribution shifts, with classification rules that provide confidence in the predictions and enhanced performance compared with state-of-the-art techniques.