Applied Statistics

The aim of the research line in Applied Statistics is to create innovative statistical models, inference methods, computational algorithms and visualization tools for analyzing complex data sets from diverse sources.

A further aim is to consolidate BCAM as a reference in areas such as biostatistics, demography, environmental modeling, medical statistics, epidemiology, business analytics, and biomedical research applications involving data-driven mathematical and statistical tools. We seek out opportunities for collaboration with other research areas and groups (other BERC centers, business collaborators, Public Health institutions, government organizations, and Universities) in accessing, managing, integrating, analyzing and modeling datasets of diverse nature and complexity.

AP Overview

The Applied Statistics Research line at BCAM aims to create synergies among researchers from national and international institutions whose fields require statistical techniques for data modeling.

Our research is related to semi-parametric regression, multidimensional smoothing, (Bayesian) hierarchical models, random-effects models, longitudinal data, spatial and spatio-temporal modeling, functional data analysis, computational statistics, and data visualization tools and methods.

In particular, in the biomedical area, "Biostatistics" uses data, statistical models and theory to measure, understand and ultimately solve medical problems. Biostatistics is an exciting and versatile discipline that contributes to all fields of medical research, evidence-based health care and decision-making. The increasing need for biostatistical support in the Basque Public Health Institutions demands researchers in Biostatistics who not only support other researchers in biomedical and related sciences through statistical analyses and scientific advice, but who also contribute to high-impact research, excellence, innovation and training in statistical modeling.

The research line contributes to the Spanish National Network of Biostatistics (BIOSTATNET), a pioneering network led by applied statisticians from different institutions who have their own research projects and teaching experience in Biostatistics and who work closely with biomedical researchers. We also actively collaborate with the Biostatistics group at the University of the Basque Country (UPV/EHU) and with other national and international institutions to address issues of mathematical and statistical theory and methodology that improve the decision-making process. We aim to highlight and increase the role of Statistics, foster collaboration with our partners, and promote professional development and training in the area of Applied Statistics.

The statistical modeling methodology developed by the group deals with those aspects of data analysis that are not specific to a particular field of study. Our research therefore provides concepts and methods that, with suitable modification, are applicable in many fields (e.g. Economics, Business, Engineering, Demography, etc.) that demand a wide variety of data modeling and computational tools for the analysis of complex problems, particularly where large volumes of data are collected.




A general framework for prediction in generalized additive models

Name: Carballo González, Alba
Thesis advisor(s): D.J. Lee and M. Durbán
University: Universidad Carlos III de Madrid (UC3M)

Hierarchical modelling of patient-reported outcomes data based on the beta-binomial distribution

Name: Najera, Josu
Thesis advisor(s): Dae-Jin Lee and Inma Arostegui
University: University of the Basque Country (UPV/EHU)

npROCRegression: Kernel-Based Nonparametric ROC Regression Modelling

Implements several nonparametric regression approaches for including covariate information in the receiver operating characteristic (ROC) framework.
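
As a point of reference for the ROC framework mentioned above, the following is a minimal Python sketch of the covariate-free empirical ROC curve and its area under the curve (AUC). It is not the npROCRegression API and does not implement the kernel-based covariate adjustment; the function names `empirical_roc` and `empirical_auc` and the toy scores are hypothetical and chosen only for illustration.

```python
def empirical_roc(healthy, diseased):
    """Return (FPR, TPR) points, classifying a score as positive when score >= c."""
    # Use every observed score as a candidate threshold, from strictest to loosest.
    thresholds = sorted(set(healthy) | set(diseased), reverse=True)
    points = [(0.0, 0.0)]
    for c in thresholds:
        fpr = sum(h >= c for h in healthy) / len(healthy)
        tpr = sum(d >= c for d in diseased) / len(diseased)
        points.append((fpr, tpr))
    return points


def empirical_auc(healthy, diseased):
    """Empirical AUC: proportion of (diseased, healthy) pairs ranked correctly.

    Ties between a diseased and a healthy score count as one half.
    """
    wins = 0.0
    for d in diseased:
        for h in healthy:
            if d > h:
                wins += 1.0
            elif d == h:
                wins += 0.5
    return wins / (len(healthy) * len(diseased))


# Hypothetical marker values for two small groups.
healthy = [0.1, 0.4, 0.35, 0.8]
diseased = [0.9, 0.6, 0.7, 0.3]

roc_points = empirical_roc(healthy, diseased)
auc = empirical_auc(healthy, diseased)
print(roc_points)
print(auc)
```

A covariate-specific ROC analysis would, roughly speaking, repeat this kind of comparison after modelling how the marker distribution in each group changes with the covariates.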

Download from:


PROreg: Patient Reported Outcomes Regression Analysis

Offers a variety of tools, such as specific plots and regression model approaches, for analyzing different patient-reported questionnaires. In particular, mixed-effects models based on the beta-binomial distribution are implemented to deal with over-dispersed binomial data (see Najera-Zuloaga J., Lee D.-J. and Arostegui I., 2017).
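
To illustrate why the beta-binomial distribution suits over-dispersed binomial data, here is a small Python sketch that compares its variance with the binomial variance for the same mean. It uses the standard shape parameterization (a, b); this is an assumption made for illustration and may differ from the parameterization used inside PROreg. The function names are hypothetical.

```python
from math import exp, lgamma


def log_beta(a, b):
    """Logarithm of the beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def log_choose(n, k):
    """Logarithm of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def beta_binomial_pmf(k, n, a, b):
    """P(Y = k) when Y | p ~ Binomial(n, p) and p ~ Beta(a, b)."""
    return exp(log_choose(n, k) + log_beta(k + a, n - k + b) - log_beta(a, b))


def beta_binomial_moments(n, a, b):
    """Mean and variance; rho = 1 / (a + b + 1) acts as the over-dispersion term."""
    p = a / (a + b)
    rho = 1.0 / (a + b + 1.0)
    mean = n * p
    variance = n * p * (1.0 - p) * (1.0 + (n - 1) * rho)
    return mean, variance


# Hypothetical questionnaire score with n = 10 items and Beta(2, 2) heterogeneity.
n, a, b = 10, 2.0, 2.0
pmf = [beta_binomial_pmf(k, n, a, b) for k in range(n + 1)]
mean, variance = beta_binomial_moments(n, a, b)
binomial_variance = n * 0.5 * (1 - 0.5)  # same mean, no over-dispersion

print(round(sum(pmf), 6), mean, variance, binomial_variance)
```

The factor 1 + (n - 1) * rho inflates the binomial variance n * p * (1 - p), which is why a beta-binomial model can accommodate scores that vary more between patients than a plain binomial model allows.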

Download from:


SpATS: Spatial Analysis of Field Trials with Splines

Allows for the use of two-dimensional (2D) penalised splines (P-splines) in the context of agricultural field trials. Traditionally, the spatial or environmental effect in the expression of phenotypes has been modelled by assuming correlated random noise (Gilmour et al., 1997). We propose instead to model the spatial variation explicitly using 2D P-splines (Rodriguez-Alvarez et al., 2016; arXiv:1607.08255). Fast and stable estimation algorithms exist for this approach (Rodriguez-Alvarez et al., 2015; Lee et al., 2013), and the direct interpretation of the estimated spatial trend makes it attractive for the analysis of field experiments.
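
For orientation, a generic 2D P-spline fit can be written as a penalised least-squares problem. The notation below is an assumption introduced for illustration and is not taken from the SpATS documentation: B_1 and B_2 are marginal B-spline bases evaluated at the row and column positions of the plots, with c_1 and c_2 columns respectively; \Box is the row-wise Kronecker product; D_1 and D_2 are difference matrices; and \lambda_1, \lambda_2 are smoothing parameters. The models actually fitted by the package may use a different, mixed-model formulation.

```latex
\hat{\theta}
  = \arg\min_{\theta}\;
    \lVert y - B\theta \rVert^{2}
    + \lambda_1\, \theta^{\top} \left( D_1^{\top} D_1 \otimes I_{c_2} \right) \theta
    + \lambda_2\, \theta^{\top} \left( I_{c_1} \otimes D_2^{\top} D_2 \right) \theta,
\qquad
B = B_1 \,\Box\, B_2 .
```

Each penalty term shrinks differences between neighbouring coefficients along one direction of the field, so \lambda_1 and \lambda_2 allow different amounts of smoothing along rows and columns.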

Download from: