Light PhD Seminar: Advances in Streaming Novelty Detection

Date: Thu, Sep 15 2022

Hour: 12:00

Location: Seminar Room - BCAM

Speakers: Ander Carreño

Advances in Streaming Novelty Detection

Supervised by: Iñaki Inza & Jose A. Lozano

Due to the massive growth of machine learning applications, motivated by the outstanding results obtained in areas ranging from medicine, biology and economics to engineering and physics, a set of terms has been used interchangeably to refer to different problems. These terms are rare event, anomaly, novelty and outlier detection. As a first contribution of this PhD dissertation, a taxonomy of terms and learning scenarios is described that takes a small step toward standardizing the area. In this work, several key papers from the literature that address the same problem are analyzed. To further support the proposed assignment of terms to problems, experiments retrieving papers from Google Scholar, IEEE Xplore and ACM Digital Library were performed; they support not only the existence of the mix-up between terms and problems, but also the given taxonomy.

As a second contribution, the Streaming Novelty Detection (SND) problem that gives this dissertation its name is addressed. SND consists of learning a model that classifies among a given set of classes. At prediction time, unlabeled instances arrive as a stream and the model must classify them, taking into account that the underlying distribution of the data might change (the so-called concept drift). Moreover, once in a while, some of the incoming instances do not belong to the previously learned set of classes, and the model must recognize them as such. When a sufficient number of such instances is available, the model discovers the newly emerging classes on its own and is updated to consider these new concepts in future predictions. To tackle this problem, a self-evolving algorithm based on a mixture of Gaussian distributions is proposed.
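
The SND loop described above can be illustrated with a minimal sketch. This is not the algorithm proposed in the dissertation: it assumes one axis-aligned Gaussian per class, a fixed distance threshold and a fixed buffer size, and it omits concept-drift adaptation entirely. All names (`DiagonalGaussian`, `StreamingNoveltySketch`, `threshold`, `buffer_size`, the `novel_k` labels) are hypothetical and chosen only for illustration.

```python
import math
import random
from statistics import fmean, pvariance


class DiagonalGaussian:
    """Axis-aligned Gaussian summarising the instances of one class."""

    def __init__(self, points):
        columns = list(zip(*points))
        self.mean = [fmean(col) for col in columns]
        # A small floor avoids division by zero for constant features.
        self.var = [max(pvariance(col), 1e-9) for col in columns]

    def distance(self, x):
        """Mahalanobis distance under a diagonal covariance."""
        return math.sqrt(sum((xi - m) ** 2 / v
                             for xi, m, v in zip(x, self.mean, self.var)))


class StreamingNoveltySketch:
    """Toy SND model: classify, flag unknowns, promote them to a new class."""

    def __init__(self, threshold=3.0, buffer_size=20):
        self.threshold = threshold      # assumed cut-off for "belongs to a known class"
        self.buffer_size = buffer_size  # assumed number of unknowns that forms a new class
        self.components = {}            # label -> DiagonalGaussian
        self.buffer = []                # unknown instances seen so far

    def fit(self, X, y):
        by_label = {}
        for x, label in zip(X, y):
            by_label.setdefault(label, []).append(x)
        for label, points in by_label.items():
            self.components[label] = DiagonalGaussian(points)
        return self

    def predict_one(self, x):
        # Closest known class by standardised distance.
        label, dist = min(((lab, g.distance(x)) for lab, g in self.components.items()),
                          key=lambda pair: pair[1])
        if dist <= self.threshold:
            return label
        # Too far from every known class: store it as a candidate novelty.
        self.buffer.append(x)
        if len(self.buffer) >= self.buffer_size:
            # Enough unknowns: create a new class from the whole buffer.
            new_label = f"novel_{len(self.components)}"
            self.components[new_label] = DiagonalGaussian(self.buffer)
            self.buffer = []
        return "unknown"


def blob(cx, cy, n):
    """Synthetic 2-D cluster used only for this demonstration."""
    return [(random.gauss(cx, 0.5), random.gauss(cy, 0.5)) for _ in range(n)]


random.seed(0)
model = StreamingNoveltySketch().fit(blob(0, 0, 100) + blob(5, 5, 100),
                                     [0] * 100 + [1] * 100)
known = model.predict_one((0.0, 0.0))                         # near class 0
during = [model.predict_one(p) for p in blob(-5.0, 5.0, 20)]  # unseen region
after = model.predict_one((-5.0, 5.0))                        # same region, later
```

In this toy run, instances from the unseen region are flagged as unknown until the buffer fills, after which the model assigns that region its own label. A realistic system would also need to cluster the buffer, since unknown instances may come from several emerging classes at once, and to update the existing components as the data distribution drifts.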

The last contribution of this dissertation also deals with the SND problem but, in this case, the instances are time series. To tackle this problem, deep autoencoders compress the instances into a deep feature space (an embedding), and Support Vector Data Description networks then enclose the embedded instances in hyperspheres of minimum volume. This work also proposes a solution that allows an expert to evaluate the stream in hindsight.
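
For reference, the minimum-volume hypersphere idea can be written as the classical soft-margin Support Vector Data Description problem. Whether the dissertation uses exactly this variant is an assumption; here $\phi$ stands for the learned encoder, $c$ and $R$ for the centre and radius of the hypersphere, $\xi_i$ for slack variables and $C$ for a trade-off constant.

```latex
\begin{aligned}
\min_{R,\; c,\; \xi} \quad & R^2 + C \sum_{i=1}^{n} \xi_i \\
\text{subject to} \quad & \lVert \phi(x_i) - c \rVert^2 \le R^2 + \xi_i, \qquad \xi_i \ge 0, \qquad i = 1, \dots, n .
\end{aligned}
```

Under this formulation, an embedded instance $\phi(x)$ with $\lVert \phi(x) - c \rVert^2 > R^2$ falls outside the hypersphere and is a candidate for an unknown class.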


