Aritz Pérez

Postdoc Fellow

T +34 946 567 842
F +34 946 567 842
E aperez@bcamath.org

Information of interest

Postdoc Fellow at BCAM. His main methodological research lines are probabilistic graphical models, supervised classification, information theory, density estimation and feature subset selection. These methodological contributions have been applied to bioinformatics (genetics and epigenetics) and ecological modelling (fisheries).

  • Efficient Learning of Minimax Risk Classifiers in High Dimensions 

    Bondugula, K.R.; Mazuelas, S.; Pérez, A. (2023-08-01)
    High-dimensional data is common in multiple areas, such as health care and genomics, where the number of features can be tens of thousands. In such scenarios, the large number of features often leads to inefficient ...
  • Fast K-Medoids With the l_1-Norm 

    Capó, M.; Pérez, A.; Lozano, J.A. (2023-07-26)
    K-medoids clustering is one of the most popular techniques in exploratory data analysis. The most commonly used algorithms to deal with this problem are quadratic in the number of instances, n, and usually the quality of ...
  • Implementing the Cumulative Difference Plot in the IOHanalyzer 

    Arza, E.; Ceberio, J.; Irurozki, E.; Pérez, A. (2022-07)
    The IOHanalyzer is a web-based framework that enables an easy visualization and comparison of the quality of stochastic optimization algorithms. IOHanalyzer offers several graphical and statistical tools to analyze the results ...
  • Generalized Maximum Entropy for Supervised Classification 

    Mazuelas, S.; Shen, Y.; Pérez, A. (2022-04)
    The maximum entropy principle advocates evaluating events’ probabilities using a distribution that maximizes entropy among those that satisfy certain expectations’ constraints. Such a principle can be generalized for ...
  • Rank aggregation for non-stationary data streams 

    Irurozki, E.; Pérez, A.; Lobo, J.L.; Del Ser, J. (2022)
    The problem of learning over non-stationary ranking streams arises naturally, particularly in recommender systems. The rankings represent the preferences of a population, and the non-stationarity means that the distribution ...
  • LASSO for streaming data with adaptative filtering 

    Capó, M.; Pérez, A.; Lozano, J.A. (2022)
    Streaming data is ubiquitous in modern machine learning, and so the development of scalable algorithms to analyze this sort of information is a topic of current interest. On the other hand, the problem of l1-penalized ...
  • Machine learning from crowds using candidate set-based labelling 

    Beñaran-Muñoz, I.; Hernandez, J.; Pérez, A. (2022)
    Crowdsourcing is a popular cheap alternative in machine learning for gathering information from a set of annotators. Learning from crowd-labelled data involves dealing with its inherent uncertainty and inconsistencies. In ...
  • Dirichlet process mixture models for non-stationary data streams 

    Casado, I.; Pérez, A. (2022)
    In recent years, we have seen a handful of works on inference algorithms over non-stationary data streams. Given their flexibility, Bayesian non-parametric models are a good candidate for these scenarios. However, reliable ...
  • Non-parametric discretization for probabilistic labeled data 

    Flores, J.L.; Calvo, B.; Pérez, A. (2022)
    Probabilistic label learning is a challenging task that arises from recent real-world problems within the weakly supervised classification framework. In this task algorithms have to deal with datasets where each instance ...
  • A cheap feature selection approach for the K-means algorithm 

    Capó, M.; Pérez, A.; Lozano, J.A. (2021-05)
    The increase in the number of features that need to be analyzed in a wide variety of areas, such as genome sequencing, computer vision or sensor networks, represents a challenge for the K-means algorithm. In this regard, ...
  • K-means for Evolving Data Streams 

    Bidaurrazaga, A.; Pérez, A.; Capó, M. (2021-01-01)
    Nowadays, streaming data analysis has become a relevant area of research in machine learning. Most of the data streams available are unlabeled, and thus it is necessary to develop specific clustering techniques that take ...
  • On the fair comparison of optimization algorithms in different machines 

    Arza, E.; Pérez, A.; Ceberio, J.; Irurozki, E. (2021)
    An experimental comparison of two or more optimization algorithms requires the same computational resources to be assigned to each algorithm. When a maximum runtime is set as the stopping criterion, all algorithms need to ...
  • A Machine Learning Approach to Predict Healthcare Cost of Breast Cancer Patients 

    Rakshit, P.; Zaballa, O.; Pérez, A.; Gomez-Inhiesto, E.; Acaiturri-Ayesta, M.T.; Lozano, J.A. (2021)
    This paper presents a novel machine learning approach to perform an early prediction of the healthcare cost of breast cancer patients. The learning phase of our prediction method considers the following two steps: i) in ...
  • Minimax Classification with 0-1 Loss and Performance Guarantees 

    Mazuelas, S.; Zanoni, A.; Pérez, A. (2020-12-01)
    Supervised classification techniques use training samples to find classification rules with small expected 0-1 loss. Conventional methods achieve efficient learning and out-of-sample generalization by minimizing surrogate ...
  • General supervision via probabilistic transformations 

    Mazuelas, S.; Pérez, A. (2020-08-01)
    Different types of training data have led to numerous schemes for supervised classification. Current learning techniques are tailored to one specific scheme and cannot handle general ensembles of training samples. This ...
  • An efficient K-means clustering algorithm for tall data 

    Capó, M.; Pérez, A.; Lozano, J.A. (2020)
    The analysis of continuously larger datasets is a task of major importance in a wide variety of scientific fields. Therefore, the development of efficient and parallel algorithms to perform such an analysis is a crucial ...
  • An adaptive neuroevolution-based hyperheuristic 

    Arza, E.; Ceberio, J.; Pérez, A.; Irurozki, E. (2020)
    According to the No-Free-Lunch theorem, an algorithm that performs efficiently on any type of problem does not exist. In this sense, algorithms that exploit problem-specific knowledge usually outperform more generic ...
  • On-line Elastic Similarity Measures for time series 

    Oregui, I.; Pérez, A.; Del Ser, J.; Lozano, J.A. (2019-04)
    The way similarity is measured among time series is of paramount importance in many data mining and machine learning tasks. For instance, Elastic Similarity Measures are widely used to determine whether two time series are ...
  • Crowd Learning with Candidate Labeling: an EM-based Solution 

    Beñaran-Muñoz, I.; Hernández-González, J.; Pérez, A. (2018-09-27)
    Crowdsourcing is widely used nowadays in machine learning for data labeling. Although in the traditional case annotators are asked to provide a single label for each instance, novel approaches allow annotators, in case ...
  • On-Line Dynamic Time Warping for Streaming Time Series 

    Oregui, I.; Pérez, A.; Del Ser, J.; Lozano, J.A. (2017-09)
    Dynamic Time Warping is a well-known measure of dissimilarity between time series. Due to its flexibility to deal with non-linear distortions along the time axis, this measure has been widely utilized in machine learning ...

More information

FractalTree

Implementation of the procedures presented in A. Pérez, I. Inza and J.A. Lozano (2016). Efficient approximation of probability distributions with k-order decomposable models. International Journal of Approximate Reasoning 74, 58-87.

Authors: Aritz Pérez

License: free and open source software

MixtureDecModels

Learning mixtures of decomposable models with hidden variables

Authors: Aritz Pérez

License: free and open source software

Placement

Local

BayesianTree

Approximating probability distributions with mixtures of decomposable models

Authors: Aritz Pérez

License: free and open source software

Placement

Local

KmeansLandscape

Study the k-means problem from a local optimization perspective

Authors: Aritz Pérez

License: free and open source software

Placement

Local

PGM

Procedures for learning probabilistic graphical models

Authors: Aritz Pérez

License: free and open source software

Placement

Local

On-line Elastic Similarity Measures

Adaptation of the most frequently used elastic similarity measures, Dynamic Time Warping (DTW), Edit Distance (Edit), Edit Distance for Real Sequences (EDR) and Edit Distance with Real Penalty (ERP), to the on-line setting.

Authors: Izaskun Oregi, Aritz Perez, Javier Del Ser, Jose A. Lozano

License: free and open source software

MRCpy: a library for Minimax Risk Classifiers 

The MRCpy library implements minimax risk classifiers (MRCs), which are based on robust risk minimization and can use the 0-1 loss.

Authors: Kartheek Reddy, Claudia Guerrero, Aritz Perez, Santiago Mazuelas

License: free and open source software

OPTECOT - Optimal Evaluation Cost Tracking

This repository contains supplementary material for the paper Speeding-up Evolutionary Algorithms to solve Black-Box Optimization Problems. In this work, we present OPTECOT (Optimal Evaluation Cost Tracking): a technique that reduces the cost of solving a computationally expensive black-box optimization problem with population-based algorithms while avoiding loss of solution quality. OPTECOT requires a set of approximate objective functions of different costs and accuracies, obtained by modifying a strategic parameter in the definition of the original function. During the algorithm's execution, the proposal selects in real time the lowest-cost approximation that balances cost against accuracy. The repository also contains a library for applying OPTECOT with the CMA-ES (Covariance Matrix Adaptation Evolution Strategy) optimization algorithm to optimization problems other than those addressed in the paper.

Authors: Judith Echevarrieta, Etor Arza, Aritz Pérez

License: free and open source software

TransfHH

A multi-domain methodology to analyze an optimization problem set

Authors: Etor Arza, Ekhiñe Irurozki, Josu Ceberio, Aritz Perez

License: free and open source software