Workshop on the occasion of the 10th Anniversary of GRBIO
Our research group in Biostatistics and Bioinformatics (GRBIO) is organizing the ‘Workshop on the Occasion of the 10th Anniversary of the GRBIO’, which will take place on January 30 (9:30h-20:00h) and January 31, 2025 (9:00h-14:30h) in the Sala d'Actes of the School of Mathematics and Statistics (FME) of the UPC.
The GRBIO is a consolidated research group funded by the Generalitat de Catalunya and formed by professors and PhD students from the Universitat Politècnica de Catalunya and the Universitat de Barcelona, together with researchers from external research centers. It was created in 2014 to promote research in Biostatistics and Bioinformatics, both in advanced applications and in the theoretical and computational development of new methodologies. During this time, our group has grown significantly in research and dissemination, consolidating our position as a reference at the national and international level.
The opening ceremony, chaired by the Dean of the FME, Prof. Jordi Guàrdia, will be attended by the Vice-Rectors for Research of the UPC, Prof. Jordi Llorca, and of the UB, Prof. Jordi García. The closing ceremony, also chaired by Prof. Jordi Guàrdia, will be attended by the directors of the Department of Statistics (UPC), Prof. Xavier Tort, and the Department of Genetics, Microbiology and Statistics (UB), Prof. Bru Cormand.
We are delighted to welcome three prominent speakers: Mª Luz Calle from the University of Vic, Roderic Guigó from the Centre for Genomic Regulation, and Geert Molenberghs from the Faculty of Medicine at KU Leuven and at Hasselt University. The workshop will include plenary talks by professors from several Spanish universities with whom the GRBIO has collaborated in recent years, oral presentations by some of our PhD students, and a poster session.
The workshop will be open to participation of other researchers, who will have the opportunity to present their research in poster format.
We are pleased to announce that a Certificate of Attendance will be provided to each participant who attends the workshop.
You can expect to receive your certificate in the days following the event.
Here you will find the workshop programme for the two days. Click below each day to see the programme. You can access the biosketch of the speakers by clicking on their name. You can access the abstract of the talk by clicking on the title.
The deadline for abstract submission is December 15, 2024. Send your abstract to ignacio.perez.blasco@upc.edu before the deadline!
We encourage everyone to present their research work in poster format (recommended size: A0, 1189 mm high by 841 mm wide).
The posters will be displayed between 09:00h and 19:15h. We encourage you to hang your poster between 09:00h and 09:30h (or as early as possible). We have allocated a special session for posters from 14:45h to 15:45h.
Book of Abstracts
Coming soon.
Keynote speakers
Mª Luz Calle Rosingana
Trends in Biostatistics Over the Last Decade
Over the past decade, biostatistics has undergone significant advancements, driven by the increasing availability of complex data, the emergence of novel analytical methods, and the growing demand for robust and reproducible results in health sciences. This talk explores key developments in the field and how biostatistics has evolved to meet modern scientific challenges, such as high dimensionality and data heterogeneity, which stem from the explosion of big data and the increasing complexity of biomedical research.
Biosketch
Mª Luz Calle is Full Professor of Biostatistics and Bioinformatics and Dean of the Faculty of Sciences, Technology and Engineering, University of Vic – Central University of Catalonia. With a background in Mathematics (BSc Mathematics, Universitat de Barcelona, 1986 and PhD in Mathematics, Universitat Politècnica de Catalunya, 1997), she teaches biostatistics and bioinformatics in the Biotechnology degree and Statistical and data-mining methods for omics data analysis in the Master of Sciences in Omics Data Analysis. She is the group leader of the Bi-squared group (Bioinformatics and Bioimaging) of the University of Vic (consolidated group 2021SGR-01249). Her main research areas are statistical genetics, omics data analysis, microbiome data analysis and survival analysis. She works on the development of new methods for biomarker discovery, identification of genetic risk profiles and construction of dynamic prediction and prognostic models of disease evolution. She is also interested in statistical methods for integration of multi-omics data and compositional data approaches in metagenomics.
She is a member of several scientific societies: BiostatNet-Spanish National Network in Biostatistics, Catalan Statistical Society, Spanish Society of Statistics and Operational Research, International Biometric Society, and International Genetic Epidemiology Society. She was formerly Head of the Biosciences Department (2018-2022), chair of the Master of Sciences in Omics Data Analysis (2012-2020), and President and Vice-President of the Spanish Region of the International Biometric Society (2012-2013 and 2014, respectively).
Roderic Guigó Serra
Title: TBA
Abstract: TBA
Biosketch
Professor Roderic Guigó has close to 40 years of experience in research in computational biology. Since 2005 he has chaired the Computational Biology of RNA Processing group at the Centre for Genomic Regulation. He is a professor of Bioinformatics at University Pompeu Fabra in Barcelona.
His broad scientific interest is in the understanding of the encoding of functional information in biological sequences. He has participated in many international genomics and functional genomics projects, including the human, the mouse, ENCODE, GTEx, ICGC, FANTOM, Blueprint, and others. He initiated GENCODE, the international effort to establish the reference gene and transcript annotation for the human and mouse genome, and he is still part of the consortium. He is currently participating in several projects funded by the National Institutes of Health (NIH), the European Commission and the Spanish and Catalan governments.
He is part of large-scale genomics leadership for the GA4GH, and co-chair of the ethics committee for the Human Cell Atlas. He promoted and helped to launch the Catalan Initiative for the Earth BioGenome Project, and he is a member of its steering committee.
Geert Molenberghs
The applied statistical (data) scientist in a high-profile and societal environment: Past, present, and future
A perspective will be offered on the profession of the biometrician, the biostatistician, and more generally the applied statistical scientist, in a continually and rapidly changing environment. The specifics of working in a multi-disciplinary environment will be discussed, referring to collaboration with agronomists, biologists, epidemiologists, medical professionals, etc. At the same time, interactions with people from other semi- or fully quantitative fields will be touched upon, such as computational biologists, computer scientists, engineers, etc. The current-day (r)evolution towards data science will be placed against a historical timeline of our field, which saw, over a relatively brief period of just one century, the coming of epidemiology and observational studies, (statistical) genetics, bioinformatics, the omics, big data, data science, data analytics, artificial intelligence, etc. Historical notes related to the international evolution of our field, with particular emphasis on the Hispanic world, will be offered.
Biosketch
Geert Molenberghs is Professor of Biostatistics at UHasselt and KU Leuven. He received a degree in mathematics (1988) and a Ph.D. in biostatistics (1993) from UAntwerpen.
He has published on surrogate markers in clinical trials, and on categorical, longitudinal, and missing data. He was Editor for Applied Statistics, Biometrics, and Biostatistics, and is currently Executive Editor of Biometrics. He was President of the International Biometric Society. He is a Fellow of the American Statistical Association, received the Guy Medal in Bronze from the Royal Statistical Society, and held visiting positions at Harvard. He is founding director of the Center for Statistics at UHasselt and of the Interuniversity Institute for Biostatistics and statistical Bioinformatics (UHasselt and KU Leuven).
He has received research funding from FWO, IWT, the EU (FP7), U.S. NIH, U.S. NSF, UHasselt, KU Leuven, ECDC, and EMA. He is a member of the Belgian Royal Academy of Medicine. He has been active (as advisor, researcher, and communicator) in the SARS-CoV-2 pandemic response. He has taken part in various grant-funded research programs on rare diseases, including IDEAL, EJP RD, ERDERA (future), and RealiseD (future).
Invited talks
Carmen Armero Cervera
Joint Bayesian models for heart failure survival and longitudinal data and how we learned about these models together with GRBIO colleagues
Joint modeling of longitudinal and survival (JM-LS) data allows the inclusion of longitudinal information in survival models, as well as the addition of missing data processes in longitudinal studies. These models are very attractive from a methodological point of view and very valuable in biomedical studies. They were also a point of union between a group of researchers from VABAR (València Bayesian Research Group) and GRBIO, who started a joint task of studying these models: we learned a lot and had even more fun. We present a Bayesian JM-LS which accounts for longitudinal continuous information in the unit interval and ordinal longitudinal covariates to learn about competing risk models, and we discuss its application to a Heart Failure (HF) study where patients underwent cardiac resynchronization therapy.
Biosketch
Carmen Armero is Full Professor of Statistics and Operations Research at the Universitat de València and a Biomathematics and Statistics Scotland (BioSS) Associate. She is also the Director of the Bayesian València Research Group, the past Chair of the València International Bayesian Analysis Summer School, and the vice-president of Statistics of the Spanish Society of Statistics and Operations Research (SEIO). Her research has always been carried out in the framework of Bayesian Inference, first in queueing systems and later in survival analysis and longitudinal models. She is currently the PI of the Universitat de València in a European project on Bayesian networks in semi-autonomous driving of vehicles. She has published methodological and applied papers in high impact scientific journals, such as Statistics in Medicine, Statistical Methods in Medical Research, The American Statistician, Stochastic Environmental Research and Risk Assessment, Queueing Systems, Journal of the Royal Statistical Society Series D, or Journal of Computational and Graphical Statistics (See CV_publications for a list of her publications). She has made research visits to the University of Cambridge (UK), Lancaster University (UK), Universitat Autònoma de Barcelona, Biomathematics and Statistics Scotland (BioSS) and Stockholm Resilience Centre (Sweden).
Martí Casals Toquero
The Rise of Sport Analytics: New Opportunities in Research
Sports analytics has grown rapidly, offering new avenues for research and applications in performance improvement, injury prevention, and game strategy. This talk explores the evolution and current impact of sports analytics while emphasizing the untapped potential of interdisciplinary networks among statisticians and researchers. Such collaborations offer unique opportunities to tackle complex challenges, innovate methodologies, and uncover new insights. Through practical examples, we will illustrate how data-driven approaches are transforming sports science and highlight the future possibilities this growing field holds for research and application.
Biosketch
Martí Casals holds a PhD in Statistics from the University of Barcelona (UB). Currently, he is an associate professor of statistics at the Faculty of Medicine of the UVic-UCC and of Sport Analytics at the National Institute of Physical Education of Catalonia (INEFC-UB). Martí’s research lies in the fields of sports biostatistics, sports analytics, and statistical thinking. He has collaborated as a sports statistician at FC Barcelona and as an external biostatistician and Basketball Analyst at the Memphis Grizzlies.
Pol Castellano Escuder
Interpretable multi-omics integration with UMAP embeddings and density-based clustering
Integrating high-dimensional multi-omics data is essential for understanding the different layers of biological control. Single-omics methods offer useful insights but often miss the complex relationships between genes, proteins, and metabolites. In this talk, I will present GAUDI (Group Aggregation via UMAP Data Integration), a non-linear, unsupervised method that uses independent UMAP embeddings to analyze multiple data types together. GAUDI reveals relationships across omics layers better than several current methods. It not only clusters samples by their multi-omics profiles but also identifies key features contributing to each cluster, providing clear and interpretable visualizations. I will discuss how GAUDI enables researchers to identify meaningful patterns and potential biomarkers across diverse omics types.
Biosketch
Pol is a Bioinformatician at Duke University, working at the intersection of computational biology and artificial intelligence to accelerate scientific discovery. His expertise includes developing autonomous AI agents and machine learning pipelines that expand the scope of traditional bioinformatics. By integrating multi-omics data with AI techniques like transformer models, neural networks, and other statistical frameworks, Pol’s work seeks to uncover complex biological patterns for a deeper understanding of molecular biology. Pol holds a Ph.D. in Biomedicine with a focus on Bioinformatics from the University of Barcelona and has participated in research projects at institutions in Spain, the United Kingdom, and the United States. His contributions have led to numerous publications in high-impact journals and fostered international collaborations, including joint projects with Duke-NUS Medical School in Singapore.
David Conesa Guillén
A computationally efficient procedure for combining ecological datasets by means of sequential consensus inference
In ecology and environmental sciences, combining diverse datasets has become an essential tool for managing the increasing complexity and volume of ecological data. However, as data complexity and volume grow, the computational demands of previously proposed models for data integration escalate, creating significant challenges for practical implementation. This study introduces a sequential consensus Bayesian inference procedure designed to offer the flexibility of integrated models while significantly reducing computational costs. The method is based on sequentially updating some model parameters and hyperparameters, and combining information about random effects after the sequential procedure is complete. The implementation of the approach is provided through two different algorithms. The strengths, limitations, and practical use of the method are explained and discussed throughout the methodology and examples. Finally, we demonstrate the method's performance using three different examples—one simulated and two with real ecological data—highlighting its strengths and limitations in practical ecological and environmental applications.
Biosketch
David Conesa holds a PhD in Mathematics from the University of Valencia, where he has been working since October 1993. He carries out research in the area of statistical modelling of situations in which uncertainty is present, mainly from a Bayesian perspective. Thus, for example, he has worked on problems of waiting time models, hierarchical models, efficiency analysis, animal survival models, and most recently on models of spatial distribution of species and diseases. He is co-author of more than 90 publications, most of them indexed in journals of international impact, and more than 250 communications in national and international congresses. He has co-directed 10 theses and is currently co-directing three more. He has done research and teaching stays at Duke University (USA), Lancaster University (UK), Statistical and Applied Mathematical Sciences Institute (USA), Université de Bordeaux (France), Università di Bergamo (Italy), Universidade do Minho (Portugal), Università Cattolica del Sacro Cuore (Italy) and several Spanish universities. He was the President of the Spanish Society of Biostatistics during 2014 and 2015, and is currently the Editor-in-Chief of the journal Statistics and Operations Research Transactions (SORT). He has taught Mathematics, Biostatistics, Experiment Design, Spatial and Temporal Statistics, Mathematical Statistics, Statistical Modelling, Probability and Simulation, Bayesian Statistics, Computation and Programming in R, Generalized Linear Models, and Smoothing, Additive and Mixed Models.
Xavier de la Cruz Montserrat
Breaking the Bottleneck in Genetic Variant Interpretation for Precision Medicine
Personalized medicine, a promising branch of modern healthcare, has been made possible by the rapid development of next-generation sequencing (NGS), which has revolutionized genetic diagnostics and provided unprecedented opportunities for tailored treatments. However, the clinical utility of NGS remains constrained by the challenge of interpreting the impact of the genetic variants it uncovers. A significant portion of these variants remains classified as Variants of Uncertain Significance (VUS), undermining their clinical utility and creating anxiety for patients and their families. This situation has driven the development of computational pathogenicity predictors, machine learning tools trained to produce binary classifications—benign or pathogenic—of variants. While these methods have been integrated into clinical workflows, their accuracy and interpretability still fall short of meeting the stringent requirements of medical applications. In this context, recent years have witnessed a paradigm shift toward continuous prediction models, which aim to provide more precise quantitative assessments of variant impacts on protein function. These approaches leverage a combination of technologies that include data from deep mutational scanning experiments and machine learning techniques. By moving beyond binary labels, continuous predictors hold the promise of elucidating critical aspects of variant effects, such as disease severity and therapeutic response, thereby enhancing their relevance for clinical decision-making in precision medicine. This talk will explore the current state of methodologies to estimate the impact of protein variants, focusing on an original approach developed in our group to address the problem of using a small amount of protein-specific datasets to generate predictions for any protein, combining regression models and an ensemble-based approach. 
I will discuss, among other things, the results obtained both in rigorous validation experiments and in our participation in the CAGI5 and CAGI6 challenges, comparing our performance with that of other methods in the field.
Biosketch
Xavier de la Cruz's career revolves around the application of in silico tools to address biological questions. His Ph.D. focused on studying the protein structure principles underlying function, a topic he pursued during his stays at the NIH (1993-1997) and UCL (1997-2000). After joining ICREA, this topic became the main focus of his work (PCB, 2001-2009; IBMB-CSIC, 2009-2012). However, his interests have gradually shifted towards the study of translational problems in biomedicine. In this direction, in 2012, he joined the Vall d'Hebron Institute of Research (VHIR) to enhance the applicability of his work on the pathogenicity of genetic variants, bringing it closer to healthcare stakeholders. His efforts recently gained international recognition after his participation in the prestigious CAGI5/ENIGMA and CAGI6/ARSA contests, where his group ranked second in both competitions.
Ramón Díaz Uriarte
An overview of cancer progression and evolutionary accumulation models
Cancer progression and evolutionary accumulation models have been developed to discover dependencies in the irreversible acquisition of binary traits (e.g., mutations) from cross-sectional data. They have been used in computational oncology and virology but also in widely different problems such as malaria progression. Some of these methods have been applied to data with phylogenetic and longitudinal dependencies in questions including tool acquisition in animals and antimicrobial resistance in tuberculosis. Because of their interest, new methods continue to be developed. These tools have been used to make predictions about future and unobserved states of the system, identify different routes of, and dependencies in, feature acquisition in subsets of the data, and improve patient stratification and survival prediction based on the evolutionary trajectories and denoising of the data. The rich variety of available models increases their utility as markedly different dependency structures can be compared on the same data. These methods also hold promise to help identify therapeutic targets and improve evolutionary-based treatment approaches. I will first give an overview of the available methods. Then, using fitness landscapes, and discussing the conflation of lines of descent, path of the maximum, and mutational profiles, I will focus on how and why inferences might not be about the processes we intend, in particular under bulk sequencing. I will comment on major research opportunities, including translational uses, identifying dependencies that derive from frequency-dependent selection, and the relationship of these methods with phylogenetic comparative methods.
Biosketch
Ramon is currently Professor ("Catedrático") at the Department of Biochemistry, Universidad Autónoma de Madrid (UAM). Before that, and for nine years, he was a researcher at the Spanish National Cancer Center (CNIO). His research (at UAM and CNIO) has been in bioinformatics, computational biology, and statistical computing (applied to bioinformatics problems).
His background is a mix of biology (BSc in Biology from UAM, PhD in Zoology from the University of Wisconsin-Madison) and statistics (MSc in Biometry and MSc in Statistics, both from UW-Madison). His PhD was a mixture of theoretical and field work in behavioral ecology (chasing lizards in Arizona and Brazil) and some statistics. During that time, he also worked on the comparative method in evolutionary biology. After finishing his PhD and before arriving at CNIO, he worked as a statistician in a company that developed artificial intelligence software and in a marketing research company.
During the last 20+ years he has worked mainly on the use of statistics and statistical computing in bioinformatics problems. For instance, he has worked on classification problems, on the use of parallel computing for web-based stats applications for bioinformatics problems, and on the identification of DNA copy number alterations from aCGH data.
During the last 13 years or so, he has focused on trying to understand the sequence of driver genetic events and predict tumor evolution using cross-sectional data with so-called "cancer progression models". These areas are currently his main focus: evolutionary accumulation models and evolutionary models of cancer, and he is trying to add to the mix phylogenetically-based comparative methods and approaches based on causal inference.
María Durban Reguera
Coherent cause-specific mortality forecasting via constrained penalized regression models
Overall mortality trends are the summation of cause-specific mortality experiences. Consequently, modelling and forecasting changes in cause-of-death patterns allows us to recognize the drivers of all-cause mortality and identify emerging health challenges. When dealing with cause-specific mortality, we need to ensure that cause-specific deaths sum to the total number of deaths. We propose a simple and fast method to obtain coherent cause-specific mortality trajectories based on Lagrange multipliers and penalized splines. We apply the proposed method to fit and forecast mortality of males in the USA for the five leading causes of death.
Biosketch
María Durbán holds a PhD in Mathematics from Heriot-Watt University, United Kingdom, and has been a Professor in the Department of Statistics at the Universidad Carlos III of Madrid since 2017. She has more than 20 years of experience in complex data modelling and its application in areas such as medicine, the environment and insurance. Transfer of knowledge has been very present throughout her career, both at the teaching level, giving courses in the field of modelling for different public institutions, and through participation in numerous University-Business-Institutions projects focused on topics such as data analysis for decision-making or the impact evaluation of policies in both the public and private sectors.
Itziar Irigoien Garbizu
Functional data analysis and fuzzy classification. Independent concepts or a successful combination?
Nowadays we are increasingly able to collect more complex data, and many challenges in data analysis stem from that complexity. The progression from a single numerical value as the unit of study, to a multivariate vector, then to a functional curve, or even to a skeletal shape representation, illustrates this evolution. In other words, there has been a shift from using large sample sizes in low-dimensional spaces to using relatively small ones in high-dimensional spaces. The perspective offered by functional data analysis (FDA) often provides a framework that allows the analysis of curves, images, or functions in high dimensions, overcoming the problem of high dimensionality. For this reason, FDA has started to appear in the computational and bioinformatics literature in recent years. On the other hand, fuzzy classification assigns a degree of membership to each unit; it is often used in disease diagnosis to classify patients based on medical data, and with artificial intelligence techniques to address uncertainty in diagnosis. Using the COVID-19 Raman spectroscopy data set, we show the usefulness of combining functional data analysis and the distance-based fuzzy classifier FC-DF, highlighting their strengths and limitations.
Biosketch
I hold a degree in Mathematics (1996) and a PhD in Computer Science (2008) from the University of the Basque Country (UPV/EHU). Since 2011 I have been an associate lecturer in the Department of Computer Science and Artificial Intelligence in the Faculty of Computer Science at the UPV/EHU. I am a member of the RSAIT research group, which works in the field of social robotics and integrates statistical and machine learning techniques to provide robots with greater autonomy. In addition, my research interests focus on the definition and development of statistical techniques to address biomedical and bioinformatics problems, particularly in distance-based data analysis techniques.
Rosa Lamarca Casado
Rare diseases challenge: no or insufficient patients in a control arm
In rare diseases, single-arm, non-randomised, open-label trials are frequently conducted, mainly for ethical reasons or because a controlled study is unfeasible as patients decline to participate. However, there are some inherent limitations in this type of design; for example, time-to-event endpoints and patient-reported outcomes are not interpretable without a control arm in the study. There are other circumstances where a randomised controlled trial is doable but the number of subjects in the control arm is insufficient. The use of external data (clinical trial data or real-world data) appears as a way to overcome these limitations and improve the efficiency of clinical trials. A critical step in bringing in external data is to ensure that the external data is comparable to the study population in terms of study entry criteria, in particular with respect to measured baseline prognostic/confounding variables. Ideally, both external data and study population should be exchangeable with each other. There are several frequentist methodologies to adjust for differences in baseline prognostic/confounding factors, such as propensity score methods (Rosenbaum and Rubin, 1983) based on matching, stratification, inverse probability of treatment weights, or covariate adjustment on the propensity score. These methods balance the prognostic factors, so that the comparison of outcomes between the treatment groups yields an unbiased treatment effect estimate, as long as all the confounding variables are included in the propensity score model. Also, Bayesian methods have been developed to borrow information from external data by creating an informative prior distribution. The prior can be derived based on different approaches such as the meta-analytic predictive method. It is important to note that the type I error may be inflated by incorporating external data, as a nonrandomised comparison may introduce bias due to unmeasured confounding covariates. 
Therefore, simulations should be carried out to evaluate the operating characteristics when including external data. Regulatory agencies have not ignored this situation and have taken some initiatives and released corresponding guidance with recommendations when designing externally controlled clinical trials. However, the use of external controls is not mature enough yet and interactions with regulatory agencies are advisable at the time of the study design.
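As a rough illustration of one of the propensity score approaches mentioned above (inverse probability of treatment weights), the following minimal sketch compares weighted mean outcomes between a trial arm and an external control. It assumes the propensity scores have already been estimated elsewhere; the function name and all numbers are hypothetical and are not taken from the talk.

```python
def iptw_mean_difference(treated, outcome, propensity):
    """Weighted difference in mean outcome (treated minus control).

    treated:    1 for a subject in the trial arm, 0 for an external control
    outcome:    observed outcome for each subject
    propensity: estimated probability of being in the trial arm (assumed given)
    """
    # Treated subjects are weighted by 1/e, controls by 1/(1 - e).
    w_treated = [t / e for t, e in zip(treated, propensity)]
    w_control = [(1 - t) / (1 - e) for t, e in zip(treated, propensity)]
    # Normalised weighted means in each group.
    mean_treated = sum(w * y for w, y in zip(w_treated, outcome)) / sum(w_treated)
    mean_control = sum(w * y for w, y in zip(w_control, outcome)) / sum(w_control)
    return mean_treated - mean_control


# Hypothetical data: two treated subjects and two external controls.
treated = [1, 1, 0, 0]
outcome = [3.0, 5.0, 1.0, 2.0]
propensity = [0.8, 0.4, 0.8, 0.4]

effect = iptw_mean_difference(treated, outcome, propensity)
print(round(effect, 4))
```

With equal propensity scores for every subject the weights cancel and the estimate reduces to the plain difference in group means, which is a convenient sanity check. The sketch does not address unmeasured confounding, which is the caveat the abstract raises.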
Biosketch
Rosa Lamarca is a highly experienced biostatistician leading the statistics team of the Bone and Rare Oncology therapeutic area at Alexion. She brings a holistic perspective to clinical development beyond pharmaceutical statistics. She enjoys developing diverse teams to face drug development challenges. Rosa has supported early and late-stage clinical development in various diseases with successful approvals by regulatory agencies and positive pricing and reimbursement evaluations by Health Technology Agencies. She was President of the Catalan Statistical Society, and the Spanish representative in the European Federation of Statisticians in the Pharmaceutical Industry. She is the co-author of more than 40 publications in peer reviewed biostatistics and medical journals.
Josu Najera-Zuloaga
Modelling Patient-Reported Outcomes: A case-study of COPD patients
The World Health Organization defines health as a state of complete physical, mental, and social well-being and not merely the absence of disease or infirmity. In this sense, patient-reported outcomes (PRO) are becoming primary outcome measurements in observational and experimental studies, as they capture evidence of patients’ status that is difficult to evaluate physically, such as pain, quality of life, or satisfaction with care. PRO are usually obtained using item-based questionnaires, assigning scores to each item response and summing the scores across a group of items to create overall scores, usually called dimensions, which decompose the health aspect they are evaluating. The binomial distribution is the most common candidate when modeling discrete and bounded outcomes, such as PRO dimensions. However, the fact that questionnaire items are answered by the same individuals sets up a correlation structure in the ordinal responses that constitute the final score, which increases the variability beyond the mean-variance structure of the binomial distribution, a property called overdispersion. In fact, PRO scores tend to have skewed distributions, often showing U, J or J-inverse shapes. In this talk, we are going to present the main contributions of our research group in the field of PRO modeling, from the proposal of an optimal probability distribution to a joint model for the analysis of longitudinal PRO and survival data. Additionally, we will present the most clinically significant results obtained from applying the developed models to a health-related quality of life study in patients with Chronic Obstructive Pulmonary Disease (COPD).
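To make the notion of overdispersion concrete, here is a minimal sketch that uses the beta-binomial distribution as one standard example of an overdispersed alternative to the binomial. This choice is an assumption made for illustration; the abstract does not name the distribution the group proposes. The parameter values are hypothetical and chosen to give a U-shaped score distribution.

```python
from math import comb, exp, lgamma


def log_beta(a, b):
    """Logarithm of the Beta function, computed via log-gamma."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def beta_binomial_pmf(k, n, a, b):
    """P(Y = k) for a beta-binomial with n trials and shape parameters a, b."""
    return comb(n, k) * exp(log_beta(k + a, n - k + b) - log_beta(a, b))


# Hypothetical bounded score from 0 to 10; a = b = 0.5 gives a U shape.
n, a, b = 10, 0.5, 0.5
pmf = [beta_binomial_pmf(k, n, a, b) for k in range(n + 1)]

# Mean and variance computed directly from the probability mass function.
mean = sum(k * p for k, p in enumerate(pmf))
variance = sum((k - mean) ** 2 * p for k, p in enumerate(pmf))

# Binomial variance with the same mean, and the inflation factor
# 1 + (n - 1) * rho, where rho = 1 / (a + b + 1).
p_success = a / (a + b)
rho = 1.0 / (a + b + 1.0)
binomial_variance = n * p_success * (1 - p_success)
inflated_variance = binomial_variance * (1 + (n - 1) * rho)

print(mean, variance, binomial_variance, inflated_variance)
```

The variance computed from the probability mass function matches the closed-form inflated variance and exceeds the binomial variance with the same mean, which is the extra variability the abstract refers to.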
Biosketch
Josu Najera-Zuloaga is an Assistant Professor in the Department of Mathematics at the University of the Basque Country (UPV/EHU). In January 2015, he began his Ph.D. studies at the Basque Center for Applied Mathematics - BCAM, supported by a Severo Ochoa predoctoral fellowship, and successfully defended his Ph.D. thesis in Mathematics and Statistics at UPV/EHU in December 2017. Throughout his academic journey, he has conducted research visits at prestigious institutions including the University of Manchester, Karolinska Institutet, and the Polytechnic University of Catalonia. His primary research interest lies in developing statistical methodologies to address complex issues primarily encountered in clinical practice, with a particular focus on patient-centered healthcare through the use of patient-reported outcomes (PRO). The development of regression models for analyzing PRO is his main research focus. He also collaborates with clinicians from Galdakao-Usansolo Hospital on experimental research and with researchers from the Polytechnic University of Catalonia on multistate model development. Additionally, he participates in various funded research projects, groups, and networks, including the National Biostatistics Network (BIOSTATNET), the MATHMODE research group, and the RICAPPS Network (Red de Investigación en Cronicidad, Atención Primaria y Prevención y Promoción de la Salud). To date, he has authored 8 articles in JCR journals (5 in Q1), with an h-index of 4 (WoS) and over 80 citations (WoS), as well as more than 10 conference presentations.
Pere Puig Casado
Estimating the population size in capture-recapture experiments with right censored data
Capture-recapture methods are commonly used in ecology to estimate animal population sizes and species richness. These methods have become popular, not only in ecology but also in social and medical sciences, to estimate the size of elusive populations such as illegal immigrants, illicit drug users, or people having a drinking problem. The talk will address a new non-parametric approach for estimating the population size when we only know how many animals or individuals were observed once, twice, ... , as well as how many animals or individuals were observed r or more times (right censoring pattern). Similar to the Chao estimator, the method provides a lower bound on population size as well as bootstrap confidence intervals. The particular case of censoring at r=2 will be studied in detail, along with several applications in ecological and social sciences.
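As a point of reference for the Chao estimator mentioned in this abstract, the sketch below computes the classical Chao lower bound from uncensored frequency counts. It is not the new right-censored method of the talk, which addresses the case where the exact count of individuals seen twice may be unavailable; the function name and the example counts are hypothetical.

```python
def chao_lower_bound(freqs: dict) -> float:
    """Classical Chao lower bound for the population size.

    freqs maps k -> number of individuals observed exactly k times (k >= 1).
    Uses n + f1^2 / (2 * f2), and the bias-corrected form
    n + f1 * (f1 - 1) / 2 when nobody was observed exactly twice.
    """
    n_observed = sum(freqs.values())
    f1 = freqs.get(1, 0)
    f2 = freqs.get(2, 0)
    if f2 == 0:
        return n_observed + f1 * (f1 - 1) / 2.0
    return n_observed + f1 ** 2 / (2.0 * f2)


if __name__ == "__main__":
    # Made-up counts: 50 seen once, 25 seen twice, 10 seen three times.
    print(chao_lower_bound({1: 50, 2: 25, 3: 10}))  # 135.0
```

The estimate adds to the 85 observed individuals an estimated 50 that were never seen, using only the counts of those seen once and twice.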
Biosketch
Pere Puig is professor of Statistics and Operations Research at the Department of Mathematics of the Universitat Autònoma de Barcelona (UAB) and an affiliate researcher at the Centre de Recerca Matemàtica (CRM). He leads the research group in advanced statistical modelling at this university and has extensive experience developing mathematical and statistical methods in collaboration with diverse groups, primarily in biology and health sciences. He has numerous papers in renowned scientific journals and is a frequent collaborator with the UK Health Security Agency.
María Xosé Rodríguez Álvarez
Evaluating the Accuracy of Prognostic Biomarkers in the Presence of External Information
The receiver operating characteristic (ROC) curve is widely used to assess the accuracy of continuous biomarkers for binary outcomes (e.g., healthy and diseased). However, evaluating the impact of additional patient or environmental information on diagnostic accuracy is also important. Furthermore, studies often focus on prognosis rather than diagnosis, especially in survival analysis, where outcomes evolve over time (e.g., alive and dead). To assess the accuracy of continuous prognostic biomarkers for time-varying outcomes, time-dependent extensions of the ROC curve have been proposed. This work introduces a novel penalisation-based estimator of the cumulative-dynamic time-dependent ROC curve, which accounts for the potential modifying effects of covariates on biomarker accuracy. Building on previous approaches, we adopt a modelling framework that considers flexible models for the conditional hazard function and the biomarker, allowing for the accommodation of non-proportional hazards and nonlinear effects through penalised splines, thus addressing the limitations of earlier methods. We apply our method to evaluate the ability of the Global Registry of Acute Coronary Events (GRACE) risk score to predict mortality after discharge in patients who have experienced acute coronary syndrome, and how this ability may vary with left ventricular ejection fraction.
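For orientation, the cumulative/dynamic quantities referred to in this abstract are usually defined as follows. This is the standard formulation, written in notation assumed here (biomarker \( Y \), event time \( T \), covariates \( X \), threshold \( c \)); it is not the estimator proposed in the talk.

\[ Se^{C}(c, t \mid x) = P(Y > c \mid T \le t, X = x), \qquad Sp^{D}(c, t \mid x) = P(Y \le c \mid T > t, X = x) \]

\[ ROC_t(p \mid x) = Se^{C}(c_p, t \mid x) \ \text{ with } \ Sp^{D}(c_p, t \mid x) = 1 - p, \qquad AUC(t \mid x) = \int_0^1 ROC_t(p \mid x) \, dp \]

Cases are individuals who have experienced the event by time \( t \) (cumulative) and controls are those still event-free at \( t \) (dynamic), so both groups change as \( t \) varies.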
Biosketch
María Xosé Rodríguez Álvarez earned her PhD in Mathematics from the Universidade de Santiago de Compostela in 2011 and has a diverse professional background spanning the private sector and academia. Since 2021, she has been a Ramón y Cajal fellow at the Universidade de Vigo. Her research focuses on (1) developing efficient estimation methods for flexible regression models, (2) statistically evaluating the diagnostic and prognostic value of clinical biomarkers, and (3) proposing new statistical methods for analysing spatial and spatio-temporal processes in the context of agricultural field experiments. Her work emphasises practical applications and interdisciplinary collaboration, with a strong commitment to disseminating advancements through free software.
Sonia Tarazona Campos
Decoding multi-omic regulatory networks: a regression-based approach
Multi-omic experiments offer an unprecedented opportunity to explore gene expression regulation, providing deep insights into the intricate regulatory mechanisms of biological systems. However, the high dimensionality, heterogeneity, and multicollinearity of multi-omic datasets present significant challenges for statistical modeling and variable selection when inferring regulatory networks. Additionally, most existing tools for multi-omic regulatory network inference either fail to accommodate diverse omic modalities or lack the ability to generate and compare phenotype-specific networks. To address these limitations, we developed MORE (Multi-Omics Regulation), a novel methodology that leverages regression-based frameworks and advanced variable selection strategies to construct phenotype-specific regulatory networks across any number or type of omic data. MORE integrates prior regulatory knowledge and offers functionalities for systematic comparison of the resulting networks. We benchmarked MORE against other state-of-the-art tools using simulated datasets and applied it to an ovarian cancer case study. Our results demonstrate the robustness and versatility of MORE in unraveling regulatory mechanisms in complex biological systems, underscoring its potential as a valuable resource for multi-omic data analysis.
Biosketch
Sonia Tarazona is an Associate Professor in the Department of Applied Statistics, Operations Research, and Quality at the Universitat Politècnica de València (UPV), where she leads the BiostatOmics group, part of the Multivariate Statistical Engineering Group. Dr. Tarazona is a statistician who began her research career in Bioinformatics in 2008 and earned her PhD in Statistics and Optimization in 2014. Her research primarily focuses on developing statistical methods and software for omics data analysis and multi-omics data integration, resulting in several publicly available tools and R packages, including NOISeq, PaintOmics, MultiPower, MOSim, MORE, and COXMOS. Currently, Dr. Tarazona’s research focuses on adapting these methodologies to emerging single-cell and spatial transcriptomics technologies.
Jacobo de Uña Álvarez
On goodness-of-fit testing with survival data
In this talk I will present a new general strategy for goodness-of-fit testing with survival data. The setting is that of testing for a parametric family of distribution functions when the data are deteriorated due to random censoring and/or random truncation. A key step is the characterization of the null hypothesis through a moment equation which involves the estimation of the observable distribution under both the null and the alternative. A new omnibus test will be proposed, and its theoretical properties will be presented. Particular applications include, but are not limited to, right-censored data, left-, right- or doubly-truncated data, or interval censored data. Advantages with respect to existing methods will be discussed. The finite sample performance of the test will be investigated through simulations. Illustrative real data analyses will be given. This is joint work with Juan Carlos Escanciano.
Biosketch
Jacobo de Uña Álvarez is Professor in Statistics at the Universidade de Vigo, Galicia, Spain. His educational background includes a BSc in Mathematics (1995) and a PhD in Mathematical Statistics (1998), both at the University of Santiago de Compostela. Jacobo's main research area is nonparametric statistics and goodness-of-fit tests, and their application to Survival Analysis and high-dimensional data, including censored and truncated data, multi-state models and multiple testing. Over the years he has collaborated with leading researchers worldwide, publishing more than 100 papers in renowned journals like Biometrics, Bernoulli or Biometrika, among many others. He has recently co-authored the monograph The Statistical Analysis of Doubly Truncated Data: With Applications in R (Wiley, 2022). Jacobo coordinates the SiDOR research group, founded in 1998, at the Universidade de Vigo.
Natalia Vilor Tejedor
Precision Genetic Neuroepidemiology: from risk factors to statistical prediction, prevention and clinical translation
This talk will delve into biostatistical strategies advancing the field of genetic neuroepidemiology. Recent advancements have enabled more precise identification of genetic and environmental factors, significantly enhancing brain health, risk stratification, disease prediction, and prevention strategies. Key highlights include applying multivariate models to extensive genomic, environmental, and brain imaging datasets, and the assessment and implementation of statistical tools designed for data integration. These methods further emphasize incorporating diversity and sex-specific mechanisms into study populations, bolstering the applicability and accuracy of our findings. The implications of these advances extend beyond improved diagnostic accuracy, paving the way for potential biological pathways that support personalized medicine, prevention, and targeted therapeutic interventions.
Biosketch
Dr. Natalia Vilor Tejedor holds a PhD in Biomedicine with a specialized MSc in omics data analysis and a foundational background in mathematics and statistics. With a focus on understanding the aetiology and prevention of neurological and neurodevelopmental disorders, Dr. Vilor-Tejedor explores the roles of both genetic and environmental risk factors. Currently leading the Genetic Neuroepidemiology and Biostatistics team at the BarcelonaBeta Brain Research Centre, she holds dual appointments with the Centre for Genomic Regulation and Radboud University Medical Center. Dr. Vilor-Tejedor integrates multiomics, environmental and neuroimaging data to uncover insights into neurodegenerative diseases, and is deeply committed to inclusive science, fostering mentorship, interdisciplinary collaboration, and public engagement.
PhD Students
Nora Amama Ben Hassun
Development and Evaluation of Metrics for Assessing Synthetic Tabular Data Quality
The growing reluctance to share original datasets and the increasing demand to comply with privacy regulations have motivated the adoption of synthetic data. Synthetic data replicates the statistical properties of the original datasets while ensuring that individual-level information or sensitive variables are not disclosed. However, to effectively evaluate the quality of synthetic data, the development and refinement of validation metrics is required. This assessment ensures the usability and reliability of synthetic datasets. This research aims to introduce some existing validation metrics implemented in tools such as the synthpop package. The focus is on synthetic tabular data, with an emphasis on showcasing a comprehensive list of validation metrics that hold statistical significance and serve as a foundation for the development of new metrics. To address the challenges of validating synthetic data, the research highlights tailored methodologies for specific domains, such as energy, where there are unique challenges. Synthetic data offers opportunities to accelerate model training while ensuring compliance with privacy regulations. By developing robust metrics, the goal is to provide a practical framework for validating high-quality synthetic datasets that meet the needs of sensitive fields. All these metrics will be illustrated through a case study to highlight their applicability and relevance, ultimately filling a considerable gap in the literature concerning synthetic data validation in the energy sector. Validation metrics are examined on three key dimensions: resemblance, utility, and privacy. Resemblance metrics evaluate the similarity in the statistical distributions between the synthetic and original datasets. Utility metrics assess the suitability of synthetic data for specific analytical tasks, such as machine learning or statistical modeling. Privacy metrics, meanwhile, ensure that sensitive information from the original data cannot be reconstructed or identified.
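To give a flavour of what a resemblance metric can look like, the sketch below computes the two-sample Kolmogorov-Smirnov statistic for one numeric column. This is only one common choice, picked here for illustration; the abstract does not state which metrics the research uses, and the function name and sample values are hypothetical.

```python
from bisect import bisect_right


def ks_statistic(original, synthetic) -> float:
    """Two-sample Kolmogorov-Smirnov statistic for one numeric column.

    Returns the largest absolute gap between the two empirical distribution
    functions: 0.0 for identical samples, 1.0 for fully separated samples.
    """
    if not original or not synthetic:
        raise ValueError("both samples must be non-empty")
    sorted_orig = sorted(original)
    sorted_synth = sorted(synthetic)
    n, m = len(sorted_orig), len(sorted_synth)
    # The gap can only change at observed values, so checking those suffices.
    points = sorted(set(sorted_orig) | set(sorted_synth))
    return max(
        abs(bisect_right(sorted_orig, x) / n - bisect_right(sorted_synth, x) / m)
        for x in points
    )


if __name__ == "__main__":
    real = [1.0, 2.0, 3.0, 4.0]
    fake = [1.5, 2.5, 3.5, 9.0]  # made-up synthetic values
    print(ks_statistic(real, fake))
```

A value near zero suggests that the synthetic column reproduces the marginal distribution of the original; it says nothing about utility or privacy, which need their own metrics.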
Biosketch
Nora Amama Ben Hassun is a PhD candidate in Statistics and Operations Research at the Universitat Politècnica de Catalunya - BarcelonaTECH, supervised by Dr. Daniel Fernández Martínez and Dr. Jordi Cortés Martínez. She holds a Bachelor's degree in Statistics from the Universitat de Barcelona and Universitat Politècnica de Catalunya - BarcelonaTECH, and a Master's in Statistics and Operations Research from the Universitat Politècnica de Catalunya - BarcelonaTECH and the Universitat de Barcelona. Her doctoral research focuses on developing a new methodology for the validation of synthetic data, with applications primarily aimed at the energy sector.
Leire Garmendia Bergés
Study of the global AUC(t) for a multi-state model
The motivation for my PhD arises from clinical data from the DIVINE project where patients hospitalized due to COVID-19 are followed through several states. One of the aims of this project was to analyze the evolution of those patients and for that a complex multi-state model (MSM) was designed. This MSM allows us to analyze the risk factors for the different events of interest (e.g. non-invasive mechanical ventilation (NIMV), invasive mechanical ventilation (IMV), or death) as well as to predict the course of the disease for new patients, but we realized that we didn't know how to analyze its predictive capacity.
Therefore, the main objective of my PhD is to evaluate the discriminative ability for MSM, and for that, the area under the time-dependent ROC curve (\( AUC(t) \)) can be used. In this work, we focus initially on those patients with severe pneumonia who can transition to two competing events: the need for NIMV or IMV; and we propose an estimator for the global \( AUC(t) \) for a competing risk model.
Under competing risk models, different estimators can be used to estimate the (partial) \( AUC(t) \) of each transition (\( AUC_k(t), k=1,2 \)). In this work, we propose an estimator \( \widehat{AUC}_{CR}(t) \) for the global \( AUC(t) \) (\( AUC_{CR}(t) \)) for a competing risk model as a weighted sum of \( \widehat{AUC}_k(t), k=1,2 \) with each \( AUC_k(t) \) being weighted by the probability of experiencing that event \( k \) before time \( t \). We have proved that \( \widehat{AUC}_{CR}(t) \) is consistent and asymptotically normal.
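The weighted-sum construction can be sketched numerically as follows. The abstract states only that each \( AUC_k(t) \) is weighted by the probability of experiencing event \( k \) before time \( t \); normalising the weights so that they sum to one is an assumption made here, and the function name and input values are hypothetical.

```python
def global_auc(partial_aucs, event_probs) -> float:
    """Weighted combination of partial AUC_k(t) values at a fixed time t.

    partial_aucs[k] is an estimate of AUC_k(t); event_probs[k] is an estimate
    of the probability of experiencing event k before t.
    ASSUMPTION: weights are normalised to sum to one.
    """
    if len(partial_aucs) != len(event_probs):
        raise ValueError("one probability is needed per partial AUC")
    total = sum(event_probs)
    if total <= 0:
        raise ValueError("at least one event probability must be positive")
    weighted = sum(auc * prob for auc, prob in zip(partial_aucs, event_probs))
    return weighted / total


if __name__ == "__main__":
    # Made-up values for two competing events (e.g. NIMV and IMV) at one time t.
    print(global_auc([0.80, 0.60], [0.30, 0.10]))
```

Under this normalisation the global value always lies between the smallest and largest partial AUC, and it is pulled towards the transition that is more likely to have occurred by time \( t \).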
Biosketch
I graduated from the Universidad del País Vasco (UPV/EHU) with a Bachelor’s degree in Mathematics (2020) and from the Universitat Politècnica de Catalunya-BarcelonaTECH (UPC) with a Master’s degree in Statistics and Operations Research (2022). Since July 2023, I’ve been working at the Basque Center for Applied Mathematics (BCAM). In November 2023, I started my PhD studies in Mathematics and Statistics at the UPV/EHU under the supervision of Irantzu Barrio and Guadalupe Gómez Melis. My PhD project focuses on evaluating the predictive capacity of multistate models.
Laia Egea Cortés
Partial Ordered Stereotype Model, a New Model for Ordinal Data
Ordinal response variables are prevalent in many fields and require specific methods that properly respect the natural ordering of their categories. However, many researchers and practitioners still apply techniques designed for nominal or continuous variables to analyse ordinal data, often treating the response categories as equally spaced when they may not be. This approach can lead to misleading results. My talk presents the Partial Ordered Stereotype Model (POSM), an extension of the Ordered Stereotype Model (OSM) for ordinal response variables. The OSM does not assume equally spaced response categories; instead, it incorporates score parameters, which specify the potentially unequal distances between adjacent response categories. These parameters reflect the discriminant capability of the covariates, indicating how effectively they can distinguish between response categories. However, different covariates may exhibit distinct discriminant capabilities. The POSM addresses this by allowing different sets of score parameters within the same model, thus capturing the characteristics of each covariate in a single framework. An application of the model using a real-world dataset in aquaculture is included to show the utility and interpretation of the method. Our objective is to identify variables impacting salmon health and assess how these variables differentiate between health levels.
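The role of the score parameters can be seen in a minimal sketch of the standard OSM, in which the log-odds of category \( k \) against the first category are \( \mu_k + \phi_k \beta^\top x \) with \( 0 = \phi_1 \le \dots \le \phi_q = 1 \). The sketch covers only the OSM with one shared set of scores, not the POSM extension presented in the talk, and all parameter values are invented.

```python
import math


def osm_probabilities(x, beta, mu, phi):
    """Category probabilities under the ordered stereotype model.

    x, beta: covariate values and coefficients (same length).
    mu: category intercepts, with mu[0] = 0 for the reference category.
    phi: score parameters, non-decreasing with phi[0] = 0 and phi[-1] = 1.
    """
    if len(mu) != len(phi):
        raise ValueError("mu and phi need one entry per response category")
    if phi[0] != 0.0 or phi[-1] != 1.0 or list(phi) != sorted(phi):
        raise ValueError("scores must satisfy 0 = phi_1 <= ... <= phi_q = 1")
    linear_predictor = sum(b * xi for b, xi in zip(beta, x))
    logits = [m + p * linear_predictor for m, p in zip(mu, phi)]
    largest = max(logits)  # subtract the maximum for numerical stability
    weights = [math.exp(value - largest) for value in logits]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    # Three ordered categories; the middle score of 0.3 places category 2
    # closer to category 1 than to category 3 (unequal spacing).
    probs = osm_probabilities(x=[1.0], beta=[2.0], mu=[0.0, 0.0, 0.0], phi=[0.0, 0.3, 1.0])
    print([round(p, 3) for p in probs])
```

Two adjacent scores that are close together indicate that the covariates barely distinguish the corresponding categories, which is the sense in which the scores measure discriminant capability.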
Biosketch
Laia Egea Cortés is a PhD student at Victoria University of Wellington (New Zealand), supervised by Professors Daniel Fernández, Ivy Liu, and Richard Arnold.
Her research focuses on developing methodology for ordinal categorical data. She holds a degree in Mathematics and Statistics from the Universitat Autònoma de Barcelona (2017) and a Master’s in Statistics and Operations Research from the Universitat Politècnica de Catalunya and Universitat de Barcelona (2019).
Laia worked as a statistician at the Catalan Institute of Oncology, within the Centre for Epidemiological Studies on Sexually Transmitted Infections and AIDS in Catalonia (CEEISCAT), from 2019 to 2022. She also worked at Sant Joan de Déu Foundation in the Epidemiology and Ageing Group from 2016 to 2019. She has also been an associate professor at the Universitat Autònoma de Barcelona from 2020 to 2022, and a tutor at Victoria University of Wellington since 2023.
Pablo Flores Muñoz
An equivalence test to detect functional similarity between feature lists based on the joint enrichment of gene ontology terms
In the current era, omics technologies such as high-throughput experiments have significantly transformed the fields of biology and medicine. These advances enable the generation of large volumes of biological data, such as gene lists, proteins, and other biological features, under different experimental conditions. Although getting this large amount of information represents a breakthrough, it is crucial to develop appropriate statistical methods to analyze and extract knowledge from these data. In this context, the present study proposes a statistical method based on an equivalence hypothesis test to evaluate biological similarity between feature lists. The central idea is that two or more feature lists can be considered biologically similar if they share a significant proportion of enriched GO terms. First, the choice of the Sorensen index is justified as an appropriate metric for assessing the dissimilarity of joint enrichment between the lists under comparison. Next, the sampling distribution of this measure is studied both theoretically and through approximation using the Bootstrap method, which proves to be particularly effective when the enrichment level is low. Based on these distributions, an equivalence hypothesis test is developed, along with its corresponding irrelevance threshold, which is less arbitrary than the thresholds commonly used in equivalence approaches. Furthermore, the R package goSorensen has been developed, published, and is available on the Bioconductor platform. This informatics tool allows for the efficient application of the proposed methodology. Additionally, a dissimilarity matrix is constructed based on the irrelevance threshold, which defines when two lists are significantly equivalent. This matrix provides an inferential measure of how close or distant the compared lists are from each other. 
The graphical representation and interpretation of this matrix, such as in an MDS-Biplot, is useful for identifying the GO terms associated with the formation of equivalence between lists. Finally, it is important to note that the proposed methodology has been rigorously evaluated and applied to real gene lists, with an exhaustive comparison of the results obtained against other similar comparison methods.
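The Sorensen dissimilarity underlying the test can be computed directly from the sets of GO terms enriched in each list. The sketch below is plain Python and does not use or reproduce the goSorensen package interface; the function name and the term labels are placeholders.

```python
def sorensen_dissimilarity(enriched_a: set, enriched_b: set) -> float:
    """Sorensen dissimilarity between two sets of enriched GO terms.

    With n11 terms enriched in both lists and n10, n01 enriched in only one,
    the dissimilarity is (n10 + n01) / (2 * n11 + n10 + n01):
    0.0 when the enriched sets coincide, 1.0 when they share no term.
    """
    n11 = len(enriched_a & enriched_b)
    n10 = len(enriched_a - enriched_b)
    n01 = len(enriched_b - enriched_a)
    denominator = 2 * n11 + n10 + n01
    if denominator == 0:
        raise ValueError("at least one list must have an enriched term")
    return (n10 + n01) / denominator


if __name__ == "__main__":
    # Placeholder labels standing in for GO term identifiers.
    list_a = {"term_1", "term_2", "term_3", "term_4"}
    list_b = {"term_3", "term_4", "term_5"}
    print(sorensen_dissimilarity(list_a, list_b))
```

In the equivalence setting described above, two lists would be declared equivalent when the data give enough evidence that this dissimilarity lies below the irrelevance threshold.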
Biosketch
Pablo Flores is an engineer in Computer Statistics from the Escuela Superior Politécnica de Chimborazo (ESPOCH) and holds a master's degree in Statistics and Operations Research from the Universitat Politècnica de Catalunya-BarcelonaTECH (UPC), where he is currently a PhD candidate in Bioinformatics. He is a statistics teacher at ESPOCH and a researcher in the Data Science Research Group (CIDED) at ESPOCH and in the Biostatistics and Bioinformatics Research Group (GRBIO) at UPC. He also served as the director of the statistics programme at ESPOCH. Pablo has published in indexed journals and participated in national and international conferences. His work focuses on statistical methods for determining functional biological similarities between feature lists, leading to the development of the goSorensen package, available on Bioconductor since August 2022, with over three thousand downloads. Additionally, he has served as a reviewer for regional scientific journals and collaborated on relevant research projects.
Pavla Krotka
Bias-corrected treatment effect estimators for group-sequential platform trials with non-concurrent controls
Platform trials enhance drug development by offering increased flexibility and efficiency. They evaluate the efficacy of multiple treatment arms, with the added benefit of permitting treatment arms to enter the trial over time and to stop early based on interim data. Efficacy is usually assessed using a shared control arm. For arms entering later, the control data is divided into concurrent and non-concurrent controls (NCC), referring to control patients recruited while the given treatment arm is in the platform and before it enters, respectively. Including NCC can reduce the sample size and increase power, but also lead to bias in the effect estimates, if there are time trends. For platform trials with continuous endpoints without interim analyses, a regression model has been proposed that utilizes NCC and adjusts for time trends by including the factor “period” as a fixed effect. Here, periods are defined as time intervals bounded by any treatment arm entering or leaving the platform. It was shown that this model leads to unbiased effect estimates and asymptotically controls the type I error rate regardless of the time trend pattern, if the time trend affects all arms in the trial equally and is additive on the model scale. However, if interim analyses are included, the definition of the factor periods becomes data dependent and the number of periods to adjust for depends on previous results. Furthermore, due to early stopping the sample sizes in different arms become outcome dependent, and therefore the effect estimates are no longer unbiased. This can affect the adjustment for time trends in the linear model, and the type I error rate might no longer be controlled. In this work, we examine the performance of the currently available model in group-sequential platform trials and show that it leads to a loss of the type I error rate control and bias in the effect estimators. 
In addition, we describe how the weight of the non-concurrent controls in the treatment effect estimator is stochastically dependent on the outcome in the non-concurrent controls. Moreover, we will investigate adjusted treatment effect estimators that aim to eliminate or reduce the potential bias and resulting type I error rate inflation. Focusing on a simple platform trial with two experimental treatment arms and a continuous endpoint, we will present results from a simulation study, where we evaluate the performance of the considered approaches and compare them to current methods.
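The bias that time trends induce when non-concurrent controls are pooled naively can be shown with noise-free arithmetic. The sketch below works with expected values only, assumes an additive step in the control mean between two periods, and is neither the regression model with period effects nor any of the adjusted estimators studied in this work; all names and numbers are hypothetical.

```python
def expected_estimates(theta, mu, delta, n_ncc, n_cc):
    """Expected treatment effect estimates for an arm entering in period 2.

    theta: true treatment effect; mu: control mean in period 1;
    delta: additive time trend (shift of all means in period 2);
    n_ncc, n_cc: numbers of non-concurrent and concurrent controls.
    Returns (naive pooled estimate, concurrent-only estimate).
    """
    control_period_1 = mu
    control_period_2 = mu + delta
    arm_period_2 = mu + delta + theta
    pooled_control = (n_ncc * control_period_1 + n_cc * control_period_2) / (n_ncc + n_cc)
    return arm_period_2 - pooled_control, arm_period_2 - control_period_2


if __name__ == "__main__":
    naive, concurrent = expected_estimates(theta=0.0, mu=10.0, delta=2.0, n_ncc=100, n_cc=100)
    print(naive, concurrent)  # 1.0 0.0
```

With no true effect, pooling yields an expected estimate of `delta * n_ncc / (n_ncc + n_cc)`, so the bias grows with both the size of the trend and the share of non-concurrent controls.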
Biosketch
I graduated from the University of Vienna with a Bachelor’s degree in Statistics (2020) and a Master’s degree in Data Science (2023). Between 2020 and 2024, I worked at the Center for Medical Data Science at the Medical University of Vienna. Here I was engaged in statistical consulting for medical doctors from the Vienna General Hospital, as well as methodological research on clinical trial designs. In July 2024, I started my PhD studies in Statistics and Operations Research at the Universitat Politècnica de Catalunya-BarcelonaTECH (UPC) supported by the Joan Oró Fellowship, under the supervision of Marta Bofill Roig. My PhD project focuses on enhancing the analysis of adaptive platform trials by incorporating non-concurrent controls.
Natalia Pallarés Fontanet
Wave and ceiling of care impact on COVID-19 in-hospital mortality: An inverse probability weighting analysis
Background and objective: From March 2020 to July 2022, 6 waves of the COVID-19 pandemic were registered in Spain. There are several studies comparing different COVID-19 waves but, as far as we know, none of them uses a matching procedure to make patients comparable or accounts for ceiling of care. Our aim is to compare in-hospital mortality across waves in patients with and without ceiling of care at hospital admission.
Methods: Data come from an observational study conducted during four waves of COVID-19 (March 2020-August 2021) in 5 hospitals in Catalonia. Three models were constructed to compare in-hospital mortality by wave: 1) a raw logistic model with only wave as a covariate; 2) a fully clinical adjusted logistic regression model with wave and patient baseline information as covariates and 3) a logistic model with weights obtained from an inverse probability weighting procedure to account for differences in baseline profile between waves. Models were presented stratified by ceiling of care. All analyses were conducted using R software version 4.3.0.
Results: A total of 3982 patients without ceiling of care and 1831 patients with ceiling of care were included. Patients with ceiling of care were, in median, 20 years older than patients without ceiling of care and in-hospital mortality ranged from 5% to 45%. The adjusted odds ratios (OR) of in-hospital mortality compared with the first wave in subjects without ceiling of care were 0.57 (95% CI 0.40 to 0.80) in the second wave, 0.56 (95% CI 0.37 to 0.84) in the third and 0.34 (95% CI 0.21 to 0.56) in the fourth. In subjects with ceiling of care, the adjusted OR was significantly lower in the fourth wave (0.38, 95% CI 0.25 to 0.58) compared with the first wave.
Discussion: The likely impact of the wave on in-hospital mortality differs between patients with and without ceiling of care. In patients without ceiling of care, mortality decreased over time, which may be explained by better disease knowledge and management. In patients with ceiling of care, only fourth-wave patients were less likely to die than first-wave patients. In a future infectious disease pandemic, it will be a challenge to improve the management of patients with ceiling of care.
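The idea behind inverse probability weighting can be sketched with a deliberately simplified setting: two waves and a single categorical baseline stratum, with the probability of belonging to each wave estimated by the observed proportion within the stratum. The study itself used R and a richer baseline profile; the Python code, the function names and all counts below are invented for illustration.

```python
def ipw_risks(counts):
    """Inverse-probability-weighted mortality risk per wave.

    counts[stratum][wave] = (n_patients, n_deaths), with n_patients > 0 for
    every wave in every stratum (positivity). Each patient is weighted by
    1 / P(own wave | stratum), estimated by the observed proportion.
    """
    weighted_n = {}
    weighted_deaths = {}
    for by_wave in counts.values():
        stratum_total = sum(n for n, _ in by_wave.values())
        for wave, (n, deaths) in by_wave.items():
            weight = stratum_total / n  # equals 1 / (n / stratum_total)
            weighted_n[wave] = weighted_n.get(wave, 0.0) + weight * n
            weighted_deaths[wave] = weighted_deaths.get(wave, 0.0) + weight * deaths
    return {wave: weighted_deaths[wave] / weighted_n[wave] for wave in weighted_n}


def odds_ratio(risk, reference_risk):
    """Odds ratio of `risk` relative to `reference_risk`."""
    return (risk / (1.0 - risk)) / (reference_risk / (1.0 - reference_risk))


if __name__ == "__main__":
    # Made-up counts: the later wave has far fewer patients in the older stratum.
    counts = {
        "younger": {"first": (100, 10), "later": (300, 15)},
        "older": {"first": (300, 90), "later": (100, 20)},
    }
    risks = ipw_risks(counts)
    print(risks, odds_ratio(risks["later"], risks["first"]))
```

In this invented example the crude odds ratio is about 0.29, while the weighted one is about 0.57: part of the apparent improvement in the later wave is due to its younger case mix, which the weighting removes.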
Biosketch
Natàlia Pallarès has a degree in Mathematics (Universitat Politècnica de Catalunya-BarcelonaTECH (UPC), 2012), a Master’s degree in Statistics and Operations Research (UPC-Universitat de Barcelona (UB), 2014) and is currently enrolled in the PhD programme in Medicine and Translational Research (UB). She has worked as a statistician at the Hospital del Mar Research Institute (IMIM), at the Bellvitge Biomedical Research Institute (IDIBELL) and, since 2023, as a senior statistician at the Biostatistics Unit of the Germans Trias i Pujol Research Institute and Hospital (IGTP). Since 2018, she has also worked as an associate lecturer at UB.
Andrea Toloba López-Egea
Likelihood-based approach for handling interval-censored covariates in generalized linear models
The development of methods to address censored covariates has gained significant attention in recent years. Although the problem itself is not new, its presence in real-world data has often been overlooked. Interval-censored covariate data, in particular, is frequently replaced by a single imputed value, which is known to introduce bias and underestimate variance. While recent methods have emerged for handling discrete time-to-event covariates, these approaches are often limited to survival analysis contexts, leaving other applications unaddressed. In this talk, we shift our focus to analytical chemistry, specifically to data related to the quantification of compounds in mixtures. Compounds are often defined by multiple analytes, each measured via liquid chromatography and subject to analyte-specific detection and quantification limits. This chemical technique results in interval-censored data for the overall quantity of a compound. Our motivating example originates from metabolomics, exploring the association between circulating carotenoids (molecules present in the bloodstream) and cardiometabolic health. Advancing research in this area requires fitting generalized linear models for cardiometabolic biomarkers while incorporating interval-censored circulating carotenoid levels as a covariate. Building on this example, we present an extension of the GEL algorithm, which was originally developed for time-to-event interval-censored covariates in linear models. The GEL algorithm is an EM-type method that alternates between estimating the distribution of the censored covariate and maximizing the model’s likelihood function. However, like other recent approaches in the literature, it relies heavily on the assumption that the censored covariate has a discrete support, which limits its applicability. 
Our extension overcomes this limitation by handling interval-censored covariates nonparametrically and regardless of the distribution’s support, broadening its usability to a wide range of applications.
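How detection and quantification limits lead to an interval-censored compound total can be sketched as follows. The conventions used (zero as the lower bound for an undetected analyte, and the interval between the two limits for a detected but unquantified one) are assumptions made for illustration, as are the function names and the numbers; the sketch does not implement the GEL algorithm or its extension.

```python
from typing import List, Optional, Tuple

Interval = Tuple[float, float]


def analyte_interval(reading: Optional[float], detected: bool, lod: float, loq: float) -> Interval:
    """Interval known to contain the true quantity of one analyte.

    Not detected: the quantity lies between 0 and the detection limit (lod).
    Detected but not quantified (reading is None): between lod and the
    quantification limit (loq). Quantified: the reading itself.
    """
    if not detected:
        return (0.0, lod)
    if reading is None:
        return (lod, loq)
    return (reading, reading)


def compound_interval(intervals: List[Interval]) -> Interval:
    """Interval for the compound total, obtained by summing the bounds."""
    lower = sum(low for low, _ in intervals)
    upper = sum(high for _, high in intervals)
    return (lower, upper)


if __name__ == "__main__":
    # Made-up compound defined by three analytes with their own limits.
    parts = [
        analyte_interval(None, detected=False, lod=0.1, loq=0.3),
        analyte_interval(None, detected=True, lod=0.2, loq=0.5),
        analyte_interval(1.5, detected=True, lod=0.1, loq=0.4),
    ]
    print(compound_interval(parts))
```

The total is observed exactly only when every analyte is quantified; otherwise the covariate entering the regression model is known only up to an interval, whose width varies from sample to sample.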
Biosketch
Andrea Toloba graduated in Mathematics from the University of Barcelona (2019) and went on to complete the Master's degree in Statistics and Operations Research (MESIO) of the Universitat Politècnica de Catalunya-BarcelonaTECH (UPC) and the University of Barcelona (UB). During her MSc studies, she worked as a statistician in the Epidemiology and Public Health Programme for Cardiovascular Diseases at the Hospital del Mar Medical Research Institute (IMIM). Since September 2022, she has been a PhD student in the Statistics and Operations Research Programme at the UPC, under the supervision of Prof. Guadalupe Gómez Melis and Prof. Klaus Langohr. Her research interests are survival analysis and interval censoring; in particular, her PhD project focuses on regression models with interval-censored covariates.
The workshop will take place at the FME (Facultat de Matemàtiques i Estadística) Conference Room of the UPC (Sala d'actes FME de la Universitat Politècnica de Catalunya).
The organising committee recommends some accommodation options close to the conference venue. Please click below.
Resa Lesseps Residence Hall is located in the popular district of Gracia, a charming spot!
In addition, you will be close to the faculties of the Universitat Politècnica de Catalunya (UPC) and the Universitat Ramón Llull (URL), ideal to save transit time!
Our residence is strategically located within Campus Nord, so you will be very close to most UB and UPC faculties. We are also a fantastic option for students from other universities and business schools such as IESE, ESADE, IQS, and many more.
A comfortable and convenient way to make the most of your time at university!
In addition, you will be in the exclusive Pedralbes district, which will add a special flavour to your student experience.
The Arenas Atiram Hotel is in the district of Les Corts, near Camp Nou. It is located 10 minutes from the city centre and 5 minutes from the shopping and leisure area of Diagonal, an area known for its various academic and medical institutions.
It is close to the university area (UB, IESE, ESADE, UPC) and the main clinics and hospitals in Barcelona (Hospital de Barcelona, CIMA Clinic, Ophthalmological Institutes, Corachan Clinic, DEXEUS, Chiari Institute, IVI Clinic, Institut Marqués…).
There are many buildings near Hotel Upper Diagonal that may be of interest to you: Palau de Congressos de Catalunya, Hospital Cima, Hospital de Barcelona and Clínica Dexeus. You are also within walking distance of the University of Barcelona and Campus Nord of the Polytechnic University of Catalonia (UPC).
We are pleased to announce that a Certificate of Attendance will be provided to every participant who attended the workshop. You can expect to receive your certificate in the days following the event.