Research area:

CAS

Computational and Applied Statistics

The research line in Computational and Applied Statistics develops scalable methods for Bayesian inference, probabilistic modelling, and uncertainty quantification, with a strong emphasis on efficient Monte Carlo computation and sequential data assimilation.

The research in Computational and Applied Statistics aims to consolidate BCAM as a reference centre in computational statistics, Bayesian inference, time-series forecasting, and related methodological areas of statistical learning, data science, and probabilistic machine learning. Target applications include climate and environmental modelling, biostatistics, biomedical statistics, epidemiology, demography, and business analytics, among others. Our goal is to enable reliable statistical learning and forecasting from complex, high-dimensional, and heterogeneous data.

Overview

The Computational and Applied Statistics research line at BCAM advances the foundations of statistical computation for modern data analysis. We develop methods for Bayesian inference, probabilistic modelling, and uncertainty quantification, with a strong emphasis on Monte Carlo algorithms, sequential Bayesian computation, and particle methods for dynamical systems. Our research covers state-space modelling, adaptive sampling, stochastic simulation, and forecasting under model uncertainty, addressing challenges in high dimensions, nonlinearity, and data sparsity. We aim for methods that are both theoretically sound and computationally scalable, enabling reliable decision-making from incomplete or multimodal information.
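As an illustration of the particle methods for dynamical systems mentioned above, the following is a minimal bootstrap particle filter sketch. It uses only the Python standard library. It assumes a hypothetical linear-Gaussian AR(1) state-space model, x_t = φ x_{t-1} + σ_x ε_t and y_t = x_t + σ_y η_t, chosen purely for illustration. The function name and parameters are likewise illustrative; this is not code from any BCAM package.

```python
import math
import random


def bootstrap_particle_filter(ys, n_particles=1000, phi=0.9, sigma_x=1.0,
                              sigma_y=1.0, seed=0):
    """Bootstrap particle filter for a hypothetical AR(1) state-space model.

    State:       x_t = phi * x_{t-1} + sigma_x * eps_t,  eps_t ~ N(0, 1)
    Observation: y_t = x_t + sigma_y * eta_t,            eta_t ~ N(0, 1)

    Returns the filtered means E[x_t | y_1:t] and a log-likelihood estimate.
    """
    rng = random.Random(seed)
    # Initialise from the stationary distribution of the AR(1) state.
    sd0 = sigma_x / math.sqrt(1.0 - phi ** 2)
    particles = [rng.gauss(0.0, sd0) for _ in range(n_particles)]
    means, loglik = [], 0.0
    for y in ys:
        # 1. Propagate particles through the state transition (bootstrap proposal).
        particles = [phi * x + rng.gauss(0.0, sigma_x) for x in particles]
        # 2. Weight by the Gaussian observation density, on the log scale for stability.
        logw = [-0.5 * ((y - x) / sigma_y) ** 2 for x in particles]
        m = max(logw)
        w = [math.exp(lw - m) for lw in logw]
        total = sum(w)
        # Log-likelihood increment: log of the mean observation density over particles.
        loglik += (m + math.log(total / n_particles)
                   - math.log(sigma_y * math.sqrt(2.0 * math.pi)))
        w = [wi / total for wi in w]
        means.append(sum(wi * x for wi, x in zip(w, particles)))
        # 3. Multinomial resampling to counter weight degeneracy.
        particles = rng.choices(particles, weights=w, k=n_particles)
    return means, loglik
```

The model is linear-Gaussian, so the output can be checked against a Kalman filter. For example, with the default parameters the first filtered mean for y_1 = 0.5 should be close to 0.5 · P/(P + 1) ≈ 0.42, where P = 1/(1 − 0.81) is the stationary state variance.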

The group builds statistical methodology that is broadly applicable rather than tied to a single field. Application areas include climate and environmental systems, biomedical and epidemiological data, and complex engineering processes, where dynamical behaviour and uncertainty play a central role. We combine modelling and computation to integrate heterogeneous data, improve interpretability, and deliver robust predictions with quantified uncertainty.

We foster collaboration with leading international institutions in statistics, applied mathematics, data science, and domain sciences. The line contributes to training in computational statistics through open-source software, reproducible workflows, and advanced courses. Our objective is to consolidate BCAM as a reference in computational statistics and probabilistic machine learning, driving innovation at the interface between methodology, computation, and impactful applications.

 

Statistical Modelling for Recurrent Events in Sports Injury Research with Applications to Football Injury Data

Name: Zumeta, Lore
Thesis advisor(s): Lee, Dae-Jin (BCAM)
University: University of the Basque Country (UPV/EHU)

A general framework for prediction in generalized additive models

Name: Carballo González, Alba
Thesis advisor(s): Lee, Dae-Jin and Durbán, María Luz
University: Universidad Carlos III de Madrid (UC3M)

Hierarchical modelling of patient-reported outcomes data based on the beta-binomial distribution

Name: Najera, Josu
Thesis advisor(s): Lee, Dae-Jin and Arostegui, Inma
University: University of the Basque Country (UPV/EHU)

npROCRegression: Kernel-Based Nonparametric ROC Regression Modelling

Implements several nonparametric regression approaches for including covariate information in the receiver operating characteristic (ROC) framework.

Download from:

https://CRAN.R-project.org/package=npROCRegression
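To make covariate-specific ROC analysis concrete, here is a standard-library Python sketch that estimates the area under the ROC curve (AUC) at a covariate value x0. Each diseased/healthy pair is weighted by how close both subjects' covariates are to x0, using a Gaussian kernel. This simple local Mann-Whitney estimator, the kernel, the bandwidth and all function names are assumptions made for illustration. They are not the regression approaches implemented in npROCRegression.

```python
import math


def gaussian_kernel(u):
    """Standard normal kernel (normalising constant omitted; it cancels)."""
    return math.exp(-0.5 * u * u)


def kernel_weighted_auc(diseased, healthy, x0, bandwidth):
    """Covariate-specific AUC at covariate value x0 (illustrative estimator).

    diseased, healthy: lists of (covariate, marker) pairs.
    Each diseased/healthy pair contributes 1 if the diseased marker is larger,
    0.5 on ties, weighted by the kernel weights of both subjects.
    """
    wd = [gaussian_kernel((x - x0) / bandwidth) for x, _ in diseased]
    wh = [gaussian_kernel((x - x0) / bandwidth) for x, _ in healthy]
    num = 0.0
    for (_, yd), a in zip(diseased, wd):
        for (_, yh), b in zip(healthy, wh):
            if yd > yh:
                num += a * b
            elif yd == yh:
                num += 0.5 * a * b
    # Normalise by the total pair weight so the estimate lies in [0, 1].
    return num / (sum(wd) * sum(wh))
```

When all covariates are equal, the weights cancel and the estimate reduces to the ordinary empirical AUC. When the marker separates the groups only for some covariate values, the estimate changes with x0.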

PROreg: Patient Reported Outcomes Regression Analysis

Offers a variety of tools, such as specific plots and regression modelling approaches, for analyzing different patient-reported questionnaires. In particular, mixed-effects models based on the beta-binomial distribution are implemented to deal with overdispersed binomial data (see Najera-Zuloaga J., Lee D.-J. and Arostegui I. (2017)).

Download from:

https://cran.r-project.org/package=PROreg
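To show what overdispersion means for questionnaire scores bounded between 0 and n, the following standard-library Python sketch computes the beta-binomial probability mass function. It then derives the mean and variance directly from that function. It assumes the usual (α, β) parameterisation and uses hypothetical function names; PROreg's own parameterisation and interface may differ.

```python
import math


def log_beta(a, b):
    """Logarithm of the beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def beta_binomial_pmf(k, n, alpha, beta):
    """P(K = k) for K ~ BetaBinomial(n, alpha, beta)."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return math.exp(log_choose + log_beta(k + alpha, n - k + beta) - log_beta(alpha, beta))


def beta_binomial_moments(n, alpha, beta):
    """Mean and variance computed directly from the pmf."""
    probs = [beta_binomial_pmf(k, n, alpha, beta) for k in range(n + 1)]
    mean = sum(k * p for k, p in enumerate(probs))
    var = sum((k - mean) ** 2 * p for k, p in enumerate(probs))
    return mean, var
```

For n = 10, α = 2 and β = 3, the mean is 4 and the variance is 6. A binomial with the same mean (π = 0.4) has variance 2.4. The ratio 2.5 equals the overdispersion factor 1 + (n − 1)/(α + β + 1).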

PROreg: Análisis de regresión de los resultados comunicados por el paciente

Ofrece varias herramientas, como los gráficos específicos así como enfoques de modelos de regresión, para analizar diferentes cuestionarios comunicados por los pacientes. Se implementan especialmente los modelos de efecto mixto basados en la distribución beta-binomial para tratar datos binomiales con sobredispersión (véase Najera-Zuloaga J., Lee D.-J. y Arostegui I. (2017)).

Se puede descargar desde:

https://cran.r-project.org/package=PROreg

PROreg: pazienteek adierazitako emaitzen erregresioaren analisia

Hainbat tresna eskaintzen ditu, hala nola grafiko espezifiko eta erregresio ereduen planteamenduak, pazienteek erantzundako galdetegi desberdinak aztertzeko. Zehazki, banaketa beta-binomialean oinarritutako efektu mistoko ereduak aplikatzen dira gehiegizko dispertsioa duten datu binomialak lantzeko (ikus Najera-Zuloaga J., Lee D.-J. eta Arostegui I. (2017).

Deskargatu hemen:

https://cran.r-project.org/package=PROreg

SpATS: Spatial Analysis of Field Trials with Splines

Allows the use of two-dimensional (2D) penalised splines (P-splines) in the context of agricultural field trials. Traditionally, the spatial or environmental effect on the expression of phenotypes has been modelled by assuming correlated random noise (Gilmour et al., 1997). We instead propose to model the spatial variation explicitly using 2D P-splines (Rodriguez-Alvarez et al., 2016; arXiv:1607.08255). Besides the availability of fast and stable estimation algorithms (Rodriguez-Alvarez et al., 2015; Lee et al., 2013), the direct and intuitive interpretation of the spatial trend that this approach provides makes it attractive for the analysis of field experiments.

Download from:

https://CRAN.R-project.org/package=SpATS
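The core P-spline idea can be sketched in one dimension: a rich B-spline basis combined with a difference penalty on adjacent coefficients. The following standard-library Python example is an illustrative assumption, not SpATS code. It fits cubic B-splines on equally spaced knots with a second-order difference penalty and a fixed smoothing parameter `lam`, by solving (BᵀB + λDᵀD)a = Bᵀy. SpATS works in two dimensions and estimates the amount of smoothing from the data, which this sketch does not attempt. All function names are hypothetical.

```python
def bspline_basis(x, xl, xr, ndx, deg=3):
    """Values at x of the ndx + deg B-splines of degree deg on equally spaced knots."""
    dx = (xr - xl) / ndx
    knots = [xl + (j - deg) * dx for j in range(ndx + 2 * deg + 1)]
    # Degree 0: indicator functions of the knot intervals.
    b = [1.0 if knots[j] <= x < knots[j + 1] else 0.0 for j in range(len(knots) - 1)]
    # Cox-de Boor recursion; with equal spacing both denominators equal p * dx.
    for p in range(1, deg + 1):
        b = [((x - knots[j]) * b[j] + (knots[j + p + 1] - x) * b[j + 1]) / (p * dx)
             for j in range(len(b) - 1)]
    return b


def solve(A, rhs):
    """Solve A z = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [r] for row, r in zip(A, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):
        z[i] = (M[i][n] - sum(M[i][j] * z[j] for j in range(i + 1, n))) / M[i][i]
    return z


def pspline_fit(x, y, ndx=10, deg=3, lam=1.0, pord=2):
    """Fit a 1D P-spline: minimise |y - B a|^2 + lam * |D a|^2 for a fixed lam."""
    span = max(x) - min(x)
    # Extend the domain slightly so every x lies strictly inside [xl, xr).
    xl, xr = min(x) - 0.01 * span, max(x) + 0.01 * span
    B = [bspline_basis(xi, xl, xr, ndx, deg) for xi in x]
    n = ndx + deg
    # Difference matrix of order pord, built by repeatedly differencing the identity.
    D = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(pord):
        D = [[D[i + 1][j] - D[i][j] for j in range(n)] for i in range(len(D) - 1)]
    # Penalised normal equations (B'B + lam D'D) a = B'y.
    A = [[sum(r[i] * r[j] for r in B) + lam * sum(d[i] * d[j] for d in D)
          for j in range(n)] for i in range(n)]
    rhs = [sum(r[i] * yi for r, yi in zip(B, y)) for i in range(n)]
    coef = solve(A, rhs)
    fitted = [sum(bi * ci for bi, ci in zip(r, coef)) for r in B]
    return coef, fitted
```

A second-order penalty leaves straight lines unpenalised, so the fit reproduces linear data exactly for any `lam`. Increasing `lam` pulls noisy data towards a straight-line fit.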

OpenTraffic

OpenTraffic is an open-source platform for traffic incident data analytics in Euskadi (the Basque Country).

Authors: Gorka Kobeaga, Dae-Jin Lee

License: General Public License

Download from

BCAM Redmine and GitHub

https://github.com/gkobeaga/opentraffic

HRQoL

HRQoL is an R package containing regression models based on the beta-binomial distribution for health-related quality of life (HRQoL) data.

Authors: Josu Nájera, Dae-Jin Lee

License: General Public License

Download from

BCAM Redmine and GitHub

https://github.com/josunajera/HRQoL

SAP

R package for fast estimation of multidimensional models with anisotropic penalties

Authors: María Xosé Rodriguez, Dae-Jin Lee, Thomas Kneib, María Durbán, Paul Eilers

License: General Public License

Download from
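"Anisotropic" means that each covariate direction gets its own smoothing parameter. For a tensor-product P-spline with an n1 × n2 coefficient array, such a penalty is commonly written as P = λ1 (D1ᵀD1 ⊗ I_{n2}) + λ2 (I_{n1} ⊗ D2ᵀD2). The standard-library Python sketch below builds this matrix for fixed λ1 and λ2, assuming row-major vectorisation of the coefficients. It only illustrates the structure of the penalty and does not reproduce SAP's estimation algorithm. The function names are hypothetical.

```python
def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def diff_matrix(n, order=2):
    """(n - order) x n matrix of order-th differences."""
    D = identity(n)
    for _ in range(order):
        D = [[D[i + 1][j] - D[i][j] for j in range(n)] for i in range(len(D) - 1)]
    return D


def crossprod(D):
    """Return D'D."""
    n = len(D[0])
    return [[sum(r[i] * r[j] for r in D) for j in range(n)] for i in range(n)]


def kron(M, N):
    """Kronecker product M (x) N."""
    return [[M[i][k] * N[j][l] for k in range(len(M[0])) for l in range(len(N[0]))]
            for i in range(len(M)) for j in range(len(N))]


def anisotropic_penalty(n1, n2, lam1, lam2, order=2):
    """Penalty lam1 * (D1'D1 (x) I_n2) + lam2 * (I_n1 (x) D2'D2).

    Assumes the n1 x n2 coefficient array A is vectorised row-major,
    i.e. a[i * n2 + j] = A[i][j], so lam1 smooths along the first
    covariate and lam2 along the second.
    """
    P1 = kron(crossprod(diff_matrix(n1, order)), identity(n2))
    P2 = kron(identity(n1), crossprod(diff_matrix(n2, order)))
    return [[lam1 * u + lam2 * v for u, v in zip(r1, r2)] for r1, r2 in zip(P1, P2)]
```

The quadratic form aᵀPa equals λ1 times the sum of squared second differences down the columns of A, plus λ2 times the sum along its rows. A planar surface is therefore unpenalised in both directions.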

SOP

Pre-release version of a more general package that includes the SAP algorithm and implements adaptive smoothing in one or more dimensions.

Authors: María Xosé Rodriguez, Manuel Oviedo, Dae-Jin Lee

License: General Public License

Placement

Personal computer

statgenHTP

High Throughput Phenotyping (HTP) Data Analysis

Authors: Emilie J Millet, Maria Xose Rodriguez Alvarez, Diana Marcela Perez Valencia, Isabelle Sanchez, Nadine Hilgert, Bart-Jan van Rossum, Fred van Eeuwijk, Martin Boer

License: Open source

spHDM

Supporting code for: "A two-stage approach for the spatio-temporal modelling of high throughput phenotyping data" (Scientific Reports)

Authors: Diana Marcela Pérez Valencia, María Xosé Rodríguez Álvarez, Martin Boer, Lukas Kronenberg, Andreas Hund, Llorence Cabrera Bosquet, Emilie Millet, Fred van Eeuwijk

License: Open source

spatio-temporal spHDM

Supporting code for: "A one-stage approach for the spatio-temporal modelling of high throughput phenotyping data" (bioRxiv; under review at JABES)

Authors: Diana Marcela Pérez Valencia, María Xosé Rodríguez Álvarez, Martin Boer, Fred van Eeuwijk

License: Open source

TimeToEvent-InjurySim

The accompanying code repository for the scientific paper: "Zumeta-Olaskoaga, L., Weigert, M., Larruskain, J., Bikandi, E., Setuain, I., Lekue, J., … Lee, D.-J. (2021). Prediction of sports injuries in football: a recurrent time-to-event approach using regularized Cox models. AStA Advances in Statistical Analysis, 1–26. doi: 10.1007/s10182-021-00428-2"

Authors: Lore Zumeta-Olaskoaga (software developer), Maximilian Weigert (software developer); Jon Larruskain, Eder Bikandi, Igor Setuain, Josean Lekue, Helmut Küchenhoff, Dae-Jin Lee (co-authors)

License: MIT
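The following standard-library Python sketch shows the kind of objective a regularized Cox model minimises when recurrent events are written in counting-process (start, stop] form, as in the Andersen-Gill formulation. The ridge penalty, the Breslow handling of ties, the data layout and the function name are assumptions made for illustration. This is not code from the TimeToEvent-InjurySim repository, and the regularization used in the paper may differ.

```python
import math


def penalized_neg_log_partial_likelihood(rows, beta, lam=0.0):
    """Andersen-Gill (counting-process) Cox partial likelihood with a ridge penalty.

    rows: list of (start, stop, event, covariates) tuples, one per at-risk interval;
          a player with recurrent injuries contributes several rows.
    beta: list of regression coefficients.
    lam:  ridge weight; the penalty added is lam / 2 * sum(beta ** 2).
    Ties are handled with the Breslow convention.
    """
    def eta(x):
        return sum(b * xi for b, xi in zip(beta, x))

    nll = 0.0
    for _, stop_i, event_i, x_i in rows:
        if not event_i:
            continue
        # Risk set at event time t: every interval with start < t <= stop.
        risk = [eta(x) for start, stop, _, x in rows if start < stop_i <= stop]
        m = max(risk)
        log_denom = m + math.log(sum(math.exp(r - m) for r in risk))
        nll -= eta(x_i) - log_denom
    return nll + 0.5 * lam * sum(b * b for b in beta)
```

With β = 0, each event contributes the logarithm of its risk-set size. Consider a toy data set: one player injured at t = 2 and t = 5, one censored at t = 4, and one injured at t = 3 and followed to t = 6. The objective is then log 3 + log 3 + log 2 = log 18.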

flex-mod-training-loads-recu-injuries

The accompanying code repository for the research paper: "Zumeta-Olaskoaga, L., Bender, A. and Lee, D.-J. Flexible modelling of time-varying exposures and recurrent events to analyze training loads effects in team sports injuries".

Authors: Lore Zumeta-Olaskoaga (software developer), Andreas Bender and Dae-Jin Lee (co-authors)

License: MIT

injurytools

Injury tools R package: "A Toolkit for Sports Injury Data Analysis"

Authors: Lore Zumeta-Olaskoaga (author, maintainer)

License: MIT