CAS
Computational and Applied Statistics
The research line in Computational and Applied Statistics develops scalable methods for Bayesian inference, probabilistic modelling, and uncertainty quantification, with a strong emphasis on efficient Monte Carlo computation and sequential data assimilation.
The research in Computational and Applied Statistics aims to consolidate BCAM as a reference in computational statistics, Bayesian inference, time-series forecasting, and related methodological areas of statistical learning, data science, and probabilistic machine learning. Applications include climate and environmental modelling, biostatistics, biomedical statistics, epidemiology, demography, and business analytics, among other areas. Our goal is to enable reliable statistical learning and forecasting from complex, high-dimensional, and heterogeneous data.
The Computational and Applied Statistics research line at BCAM advances the foundations of statistical computation for modern data analysis. We develop methods for Bayesian inference, probabilistic modelling, and uncertainty quantification, with a strong emphasis on Monte Carlo algorithms, sequential Bayesian computation, and particle methods for dynamical systems. Our research covers state-space modelling, adaptive sampling, stochastic simulation, and forecasting under model uncertainty, addressing challenges in high dimensions, nonlinearity, and data sparsity. We aim for methods that are both theoretically sound and computationally scalable, enabling reliable decision-making from incomplete or multimodal information.
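The phrase "particle methods for dynamical systems" can be made concrete with a bootstrap particle filter, the simplest sequential Monte Carlo method. The sketch below is a minimal illustration only. The toy linear-Gaussian state-space model, its parameter values and the function name are all illustrative assumptions, not code from the group. For this particular model a Kalman filter gives the exact answer, which makes it a convenient sanity check.

```python
import math
import random

def bootstrap_particle_filter(ys, n_particles=1000, phi=0.9, sigma_v=1.0,
                              sigma_w=0.5, seed=0):
    """Bootstrap particle filter for the (assumed) toy model
        x_t = phi * x_{t-1} + sigma_v * v_t,   y_t = x_t + sigma_w * w_t,
    with v_t, w_t ~ N(0, 1) and x_0 ~ N(0, 1).
    Returns the filtered means E[x_t | y_1:t] and an estimate of log p(y_1:T)."""
    rng = random.Random(seed)
    particles = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]
    means, log_lik = [], 0.0
    log_norm = math.log(sigma_w * math.sqrt(2.0 * math.pi))
    for y in ys:
        # 1. Propagate every particle through the state transition (the "bootstrap" proposal).
        particles = [phi * x + rng.gauss(0.0, sigma_v) for x in particles]
        # 2. Weight by the Gaussian observation density p(y_t | x_t), in log space for stability.
        log_w = [-0.5 * ((y - x) / sigma_w) ** 2 for x in particles]
        m = max(log_w)
        w = [math.exp(lw - m) for lw in log_w]
        total = sum(w)
        # Incremental log-likelihood: log of the average unnormalised weight.
        log_lik += m + math.log(total / n_particles) - log_norm
        w = [wi / total for wi in w]
        means.append(sum(wi * xi for wi, xi in zip(w, particles)))
        # 3. Multinomial resampling to counter weight degeneracy.
        particles = rng.choices(particles, weights=w, k=n_particles)
    return means, log_lik

# Usage on a short made-up series of observations.
filtered_means, log_lik = bootstrap_particle_filter([0.3, -0.1, 0.8, 1.2, 0.5])
```

Multinomial resampling at every step is the simplest choice. In practice, resampling is often triggered only when the effective sample size drops, and lower-variance schemes such as systematic resampling are common.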
The group builds statistical methodology that is broadly applicable rather than tied to a single field. Application areas include climate and environmental systems, biomedical and epidemiological data, and complex engineering processes, where dynamical behaviour and uncertainty play a central role. We combine modelling and computation to integrate heterogeneous data, improve interpretability, and deliver robust predictions with quantified uncertainty.
We foster collaboration with leading international institutions in statistics, applied mathematics, data science, and domain sciences. The line contributes to training in computational statistics through open-source software, reproducible workflows, and advanced courses. Our objective is to consolidate BCAM as a reference in computational statistics and probabilistic machine learning, driving innovation at the interface between methodology, computation, and impactful applications.
Statistical Modelling for Recurrent Events in Sports Injury Research with Applications to Football Injury Data
A general framework for prediction in generalized additive models
Hierarchical modelling of patient-reported outcomes data based on the beta-binomial distribution
npROCRegression: Kernel-Based Nonparametric ROC Regression Modelling
Implements several nonparametric regression approaches for incorporating covariate information into the receiver operating characteristic (ROC) framework.
Download from:
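The covariate-specific ROC curve at covariate value x is ROC_x(p) = 1 - F_D(F_H^{-1}(1 - p | x) | x). Here F_H and F_D are the distributions of the test score in the healthy and diseased populations. The sketch below is a simplified, hypothetical illustration of the kernel idea. It estimates both conditional distributions with Gaussian-kernel-weighted empirical CDFs centred at x0. It is not the estimator implemented in npROCRegression, and the function names and toy data are assumptions.

```python
import math

def gaussian_kernel(u):
    return math.exp(-0.5 * u * u)

def weighted_cdf(values, weights, t):
    """Kernel-weighted empirical CDF: share of the total weight on values <= t."""
    total = sum(weights)
    return sum(w for v, w in zip(values, weights) if v <= t) / total

def conditional_roc(healthy, diseased, x0, bandwidth, fprs):
    """Covariate-specific ROC at covariate value x0 (illustrative sketch).

    healthy, diseased: lists of (covariate, test_score) pairs.
    For each false-positive rate p, the threshold c_p is the smallest observed
    healthy score whose weighted exceedance rate 1 - F_H(c_p | x0) is <= p,
    and the returned true-positive rate is 1 - F_D(c_p | x0)."""
    wh = [gaussian_kernel((x - x0) / bandwidth) for x, _ in healthy]
    wd = [gaussian_kernel((x - x0) / bandwidth) for x, _ in diseased]
    sh = [s for _, s in healthy]
    sd = [s for _, s in diseased]
    candidates = sorted(sh)
    tprs = []
    for p in fprs:
        # The largest healthy score always qualifies (exceedance 0), so next() succeeds.
        c = next(t for t in candidates if 1 - weighted_cdf(sh, wh, t) <= p)
        tprs.append(1 - weighted_cdf(sd, wd, c))
    return tprs

# Hypothetical (age, marker) data in which the marker separates better in older patients.
healthy = [(30, 1.0), (35, 1.4), (40, 1.1), (60, 1.2), (65, 0.9), (70, 1.3)]
diseased = [(30, 1.2), (35, 1.5), (40, 1.0), (60, 2.4), (65, 2.8), (70, 2.6)]
tpr_old = conditional_roc(healthy, diseased, x0=65, bandwidth=5, fprs=[0.1])
tpr_young = conditional_roc(healthy, diseased, x0=30, bandwidth=5, fprs=[0.1])
```

The bandwidth controls how locally the covariate information is used. A very large bandwidth gives equal weights and recovers the pooled empirical ROC curve.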
PROreg: Patient Reported Outcomes Regression Analysis
Offers a variety of tools, such as specific plots and regression modelling approaches, for analyzing different patient-reported questionnaires. In particular, it implements mixed-effects models based on the beta-binomial distribution to handle over-dispersed binomial data (see Najera-Zuloaga J., Lee D.-J. and Arostegui I. (2017)).
Download from:
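The beta-binomial moments show where the over-dispersion comes from. Suppose Y | π ~ Binomial(m, π) and π ~ Beta(α, β). Then E[Y] = m p and Var[Y] = m p (1 - p) [1 + (m - 1) ρ], with p = α/(α + β) and ρ = 1/(α + β + 1). The variance therefore exceeds the binomial variance whenever m > 1. The sketch below checks this numerically. It uses the standard (α, β) parametrisation, which is an assumption here and may differ from the one used inside PROreg.

```python
from math import comb, exp, lgamma

def log_beta(a, b):
    """log B(a, b) computed via log-gamma, which is stable for large arguments."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def beta_binomial_pmf(k, m, alpha, beta):
    """P(Y = k) for Y ~ BetaBinomial(m, alpha, beta)."""
    return comb(m, k) * exp(log_beta(k + alpha, m - k + beta) - log_beta(alpha, beta))

def beta_binomial_moments(m, alpha, beta):
    """Mean and variance, written as binomial moments times an inflation factor."""
    p = alpha / (alpha + beta)
    rho = 1.0 / (alpha + beta + 1.0)  # intra-cluster correlation
    return m * p, m * p * (1 - p) * (1 + (m - 1) * rho)

# Example: a score out of m = 10 items with mean success probability 0.4.
m, alpha, beta = 10, 2.0, 3.0
probs = [beta_binomial_pmf(k, m, alpha, beta) for k in range(m + 1)]
mean, var = beta_binomial_moments(m, alpha, beta)
binomial_var = m * 0.4 * 0.6
```

Here the variance is 6.0, against 2.4 for a binomial with the same mean. That is an inflation factor of 1 + 9/6 = 2.5, which a plain binomial model would ignore.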
SpATS: Spatial Analysis of Field Trials with Splines
Allows the use of two-dimensional (2D) penalised splines (P-splines) in the context of agricultural field trials. Traditionally, the spatial or environmental effect on phenotypic expression has been modelled by assuming correlated random noise (Gilmour et al., 1997). We instead propose to model the spatial variation explicitly using 2D P-splines (Rodriguez-Alvarez et al., 2016; arXiv:1607.08255). Besides the availability of fast and stable estimation algorithms (Rodriguez-Alvarez et al., 2015; Lee et al., 2013), the direct and intuitive interpretation of the spatial trend that this approach provides makes it attractive for the analysis of field experiments.
Download from:
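The core P-spline idea is a rich basis combined with a difference penalty on neighbouring coefficients. It can be illustrated in its simplest one-dimensional form, the Whittaker smoother, in which the B-spline basis is replaced by the identity matrix. The fitted values z then solve (I + λ DᵀD) z = y, where D is the d-th order difference matrix. This pure-Python sketch is an assumption-laden simplification. SpATS itself uses 2D tensor-product B-splines and estimates the smoothing parameters, and neither of these is reproduced here.

```python
import math

def difference_matrix(n, order):
    """Rows of the order-th difference operator D, of shape (n - order) x n."""
    rows = [[1.0 if j == i else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(order):
        rows = [[rows[i + 1][j] - rows[i][j] for j in range(n)]
                for i in range(len(rows) - 1)]
    return rows

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def whittaker_smooth(y, lam, order=2):
    """Minimise sum (y_i - z_i)^2 + lam * sum (Delta^order z)_i^2, i.e. solve (I + lam D'D) z = y."""
    n = len(y)
    d = difference_matrix(n, order)
    a = [[(1.0 if i == j else 0.0) + lam * sum(row[i] * row[j] for row in d)
          for j in range(n)] for i in range(n)]
    return solve(a, list(y))

# Example: a smooth signal plus an alternating +/-0.3 perturbation, smoothed with lam = 10.
y = [math.sin(i / 3) + (0.3 if i % 2 else -0.3) for i in range(30)]
z = whittaker_smooth(y, lam=10.0)
```

With a second-order penalty, any straight line satisfies D y = 0, so it is left unchanged for every λ. This is the one-dimensional analogue of the polynomial trend that a P-spline leaves unpenalised.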
OpenTraffic
OpenTraffic is an open-source platform for traffic incident data analytics in Euskadi.
Authors: Gorka Kobeaga, Dae-Jin Lee
License: General Public License
HRQoL
HRQoL is an R package containing regression models based on the beta-binomial distribution for health-related quality of life (HRQoL) data.
Authors: Josu Nájera, Dae-Jin Lee
License: General Public License
SAP
R package for fast estimation of multidimensional models with anisotropic penalties
Authors: María Xosé Rodriguez, Dae-Jin Lee, Thomas Kneib, María Durbán, Paul Eilers
License: General Public License
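For reference, the textbook anisotropic penalty for a two-dimensional tensor-product P-spline can be written explicitly. Let B_1 (with c_1 columns) and B_2 (with c_2 columns) be the marginal B-spline bases, and let D_1 and D_2 be the difference matrices. Order the coefficient vector θ so that the second index runs fastest. The penalised least-squares problem for Gaussian data is then as shown below. Separate smoothing parameters λ_1 and λ_2 allow a different amount of smoothing along each covariate. This is the standard formulation, given here as a sketch under these assumptions; how SAP estimates λ_1 and λ_2 is not reproduced.

```latex
\begin{aligned}
f(x_1, x_2) &= \sum_{j=1}^{c_1} \sum_{k=1}^{c_2} \theta_{jk}\, B_{1j}(x_1)\, B_{2k}(x_2), \\
\mathbf{B} &= \left( \mathbf{B}_1 \otimes \mathbf{1}_{c_2}^\top \right) \odot \left( \mathbf{1}_{c_1}^\top \otimes \mathbf{B}_2 \right), \\
\mathbf{P} &= \lambda_1 \left( \mathbf{D}_1^\top \mathbf{D}_1 \otimes \mathbf{I}_{c_2} \right)
            + \lambda_2 \left( \mathbf{I}_{c_1} \otimes \mathbf{D}_2^\top \mathbf{D}_2 \right), \\
\hat{\boldsymbol{\theta}} &= \arg\min_{\boldsymbol{\theta}}
   \lVert \mathbf{y} - \mathbf{B}\boldsymbol{\theta} \rVert^2 + \boldsymbol{\theta}^\top \mathbf{P}\, \boldsymbol{\theta}
   = \left( \mathbf{B}^\top \mathbf{B} + \mathbf{P} \right)^{-1} \mathbf{B}^\top \mathbf{y}.
\end{aligned}
```

The first penalty term takes differences along x_1 for each fixed x_2 column, and the second takes differences along x_2 for each fixed x_1 row. Setting λ_1 = λ_2 recovers the isotropic case.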
SOP
Pre-release version of a more general package that includes the SAP algorithm and implements adaptive smoothing in one or more dimensions
Authors: María Xosé Rodriguez, Manuel Oviedo, Dae-Jin Lee
License: General Public License
Placement: Personal computer
statgenHTP
High Throughput Phenotyping (HTP) Data Analysis
Authors: Emilie J Millet, Maria Xose Rodriguez Alvarez, Diana Marcela Perez Valencia, Isabelle Sanchez, Nadine Hilgert, Bart-Jan van Rossum, Fred van Eeuwijk, Martin Boer
License: Open source
spHDM
Supporting code for: "A two-stage approach for the spatio-temporal modelling of high throughput phenotyping data" (Scientific Reports)
Authors: Diana Marcela Pérez Valencia, María Xosé Rodríguez Álvarez, Martin Boer, Lukas Kronenberg, Andreas Hund, Llorence Cabrera Bosquet, Emilie Millet, Fred van Eeuwijk
License: Open source
spatio-temporal spHDM
Supporting code for: "A one-stage approach for the spatio-temporal modelling of high throughput phenotyping data" (bioRxiv preprint; under review at JABES)
Authors: Diana Marcela Pérez Valencia, María Xosé Rodríguez Álvarez, Martin Boer, Fred van Eeuwijk
License: Open source
TimeToEvent-InjurySim
The accompanying code repository for the scientific paper: "Zumeta-Olaskoaga, L., Weigert, M., Larruskain, J., Bikandi, E., Setuain, I., Lekue, J., … Lee, D.-J. (2021). Prediction of sports injuries in football: a recurrent time-to-event approach using regularized Cox models. AStA Advances in Statistical Analysis, 1–26. doi: 10.1007/s10182-021-00428-2"
Authors: Lore Zumeta-Olaskoaga (software developer), Maximilian Weigert (software developer); Jon Larruskain, Eder Bikandi, Igor Setuain, Josean Lekue, Helmut Küchenhoff, Dae-Jin Lee (co-authors)
License: MIT
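Cox models for recurrent events are commonly fitted to data in counting-process form, with one (start, stop] row per at-risk episode. The hypothetical helper below converts one player's follow-up and injury history into that layout. It drops the time between an injury and the return to play, when the player is not at risk of a new injury. The field names, the use of calendar days and the exclusion rule are illustrative assumptions, not the data format of the repository above.

```python
from typing import NamedTuple

class Interval(NamedTuple):
    player: str
    start: float   # start of the at-risk episode (days since start of follow-up)
    stop: float    # end of the episode: injury day or end of follow-up
    event: int     # 1 if the episode ends with an injury, 0 if censored
    enum: int      # episode number: 1 = time to first injury, 2 = to second, ...

def to_counting_process(player, follow_up_end, injuries):
    """injuries: list of (injury_day, return_to_play_day) pairs, sorted by injury_day.

    Days between an injury and the return to play are excluded from the risk set."""
    rows = []
    start = 0.0
    for k, (injury_day, return_day) in enumerate(injuries, start=1):
        rows.append(Interval(player, start, injury_day, 1, k))
        start = return_day
    if start < follow_up_end:  # censored final episode, if the player returned before the end
        rows.append(Interval(player, start, follow_up_end, 0, len(injuries) + 1))
    return rows

# Example: a 300-day season with two injuries.
rows = to_counting_process("A", 300, [(40, 55), (120, 150)])
```

The `enum` column allows the baseline hazard or covariate effects to vary with the number of previous injuries, which is one common way to handle dependence between recurrences.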
flex-mod-training-loads-recu-injuries
The accompanying code repository for the research paper: "Zumeta-Olaskoaga, L., Bender, A. and Lee, D.-J. Flexible modelling of time-varying exposures and recurrent events to analyze training loads effects in team sports injuries".
Authors: Lore Zumeta-Olaskoaga (software developer), Andreas Bender and Dae-Jin Lee (co-authors)
License: MIT
injurytools
Injury tools R package: "A Toolkit for Sports Injury Data Analysis"
Authors: Lore Zumeta-Olaskoaga (author, maintainer)
License: MIT
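Two standard summaries in sports injury epidemiology are injury incidence and injury burden. These are, respectively, the number of injuries and the number of days lost, each expressed per 1000 hours of exposure. The sketch below computes both, with an approximate 95% confidence interval for the incidence that treats the injury count as Poisson (standard error 1/sqrt(n) on the log scale). The function name and record layout are hypothetical; this is not the injurytools API.

```python
import math

def incidence_and_burden(records, per_hours=1000.0, z=1.96):
    """records: iterable of (exposure_minutes, n_injuries, days_lost), e.g. one per player.

    Returns (incidence, ci, burden), with rates per `per_hours` hours of exposure.
    ci is an approximate log-scale Poisson interval, or None when there are no injuries."""
    records = list(records)
    hours = sum(r[0] for r in records) / 60.0
    n = sum(r[1] for r in records)
    days = sum(r[2] for r in records)
    incidence = n * per_hours / hours
    burden = days * per_hours / hours
    ci = None
    if n > 0:
        half_width = z / math.sqrt(n)  # standard error of log(rate) for a Poisson count
        ci = (incidence * math.exp(-half_width), incidence * math.exp(half_width))
    return incidence, ci, burden

# Example squad: 600 hours of exposure in total, 12 injuries, 150 days lost.
squad = [(18000, 5, 60), (12000, 4, 45), (6000, 3, 45)]
incidence, ci, burden = incidence_and_burden(squad)
```

Pooling exposure before dividing gives the squad-level rate. Averaging per-player rates instead would over-weight players with little exposure.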