Upcoming GRBIO seminars

Organizers: Jordi Cortés and Ferran Reverter

Venue: online. To attend, contact .

Title. To be announced.

Abstract. To be announced.

 

Venue: online. To attend, contact .

Dealing with sampling weights on the development of prediction models for complex survey data

Survey data are becoming increasingly well known among researchers from different fields, including but not limited to the social and health sciences. One characteristic that distinguishes this kind of data from simple random samples is the sampling weights, which indicate the number of population units that each sampled observation represents. Complex survey data are often used, among other purposes, to develop prediction models. However, the effect that sampling weights may have on the modelling process should be carefully checked. The goals of this talk are twofold: first, to present the impact that sampling weights have on the estimation of a prediction model; and second, to present a proposal for empirical AUC estimation that takes the sampling weights into account. Results based on a simulation will be shown.
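As a rough illustration of how sampling weights can enter an empirical AUC, the sketch below weights each (case, control) pair by the product of the two sampling weights. This is only one plausible construction under that assumption and is not necessarily the estimator proposed in the talk; the function name `weighted_auc` and the toy data are hypothetical.

```python
def weighted_auc(scores, labels, weights):
    """Pairwise AUC in which each (case, control) pair counts with weight w_case * w_control.

    Assumption for illustration only: pair weights are products of sampling weights.
    Ties in the score count as half a correctly ranked pair.
    """
    cases = [(s, w) for s, y, w in zip(scores, labels, weights) if y == 1]
    controls = [(s, w) for s, y, w in zip(scores, labels, weights) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")

    num = 0.0  # weighted count of correctly ranked pairs
    den = 0.0  # total weight of all pairs
    for s_case, w_case in cases:
        for s_ctrl, w_ctrl in controls:
            pair_weight = w_case * w_ctrl
            den += pair_weight
            if s_case > s_ctrl:
                num += pair_weight
            elif s_case == s_ctrl:
                num += 0.5 * pair_weight
    return num / den


# Toy data (made up): two cases, two controls, very unequal sampling weights.
scores = [0.9, 0.4, 0.6, 0.2]
labels = [1, 1, 0, 0]
weights = [10.0, 1.0, 1.0, 10.0]

unweighted = weighted_auc(scores, labels, [1.0] * len(scores))
weighted = weighted_auc(scores, labels, weights)
print(unweighted, weighted)
```

In this toy example the only misranked pair involves two observations with small weights, so the weighted estimate is higher than the unweighted one. With all weights equal, the function reduces to the usual empirical AUC.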

 

Venue: online. To attend, contact .

READING CLUB: Empirical Bayes

The constraints of slow mechanical computation molded classical statistics into a mathematically ingenious theory of sharply delimited scope. Emerging after the Second World War, electronic computation loosened the computational stranglehold, allowing a more expansive and useful statistical methodology. Some revolutions start slowly. The journals of the 1950s continued to emphasize classical themes: pure mathematical development typically centered around the normal distribution. Change came gradually, but by the 1990s a new statistical technology, computer enabled, was firmly in place. Key developments from this period are described in the next several chapters. The ideas, for the most part, would not startle a pre-war statistician, but their computational demands, factors of 100 or 1000 times those of classical methods, would. More factors of a thousand lay ahead, as will be told in Part III, the story of statistics in the twenty-first century. Empirical Bayes methodology, this chapter’s topic, has been a particularly slow developer despite an early start in the 1940s. The roadblock here was not so much the computational demands of the theory as a lack of appropriate data sets. Modern scientific equipment now provides ample grist for the empirical Bayes mill.


The Reading Club is based on the book Computer Age Statistical Inference by Bradley Efron and Trevor Hastie. To find out how to access the book, write to

Venue: online. To attend, contact .

Statistical education in data visualization? Some applied examples of Sports Analytics

Sports Analytics has grown exponentially thanks to the IT sciences. It often includes data visualization as well as statistics, with a focus that is more tactical and related to sports performance. A statistical graph can offer a compelling alternative to statistical thinking that focuses on procedural formulas. In sports analysis, the use of visualization techniques for exploratory and descriptive data analysis has increased in recent years, for example to describe possible patterns and uncertainty in player performance. The abuse of graphics, and their frequent misinterpretation in the world of sports, has led us to create more informative and accurate visualizations. In this talk, we will explain new, more educational visualizations and illustrate their role with several practical examples in soccer and basketball.

 

Venue: online. To attend, contact .

READING CLUB: Support-Vector Machines and Kernel Methods

While linear logistic regression has been the mainstay in biostatistics and epidemiology, it has had a mixed reception in the machine-learning community. There the goal is often classification accuracy, rather than statistical inference. Logistic regression builds a classifier in two steps: fit a conditional probability model for Pr(Y=1|X=x), and then classify as a one if the predicted Pr(Y=1|X=x)>0.5. SVMs bypass the first step, and build a classifier directly. Another rather awkward issue with logistic regression is that it fails if the training data are linearly separable! What this means is that, in the feature space, one can separate the two classes by a linear boundary. In cases such as this, maximum likelihood fails and some parameters march off to infinity. While this might have seemed an unlikely scenario to the early users of logistic regression, it becomes almost a certainty with modern wide genomics data. When p>>n (more features than observations), we can typically always find a separating hyperplane. Finding an optimal separating hyperplane was in fact the launching point for SVMs. As we will see, they have more than this to offer, and in fact live comfortably alongside logistic regression. SVMs pursued an age-old approach in statistics, of enriching the feature space through nonlinear transformations and basis expansions; a classical example being augmenting a linear regression with interaction terms. A linear model in the enlarged space leads to a nonlinear model in the ambient space. This is typically achieved via the “kernel trick,” which allows the computations to be performed in the n-dimensional space for an arbitrary number of predictors p. As the field matured, it became clear that in fact this kernel trick amounted to estimation in a reproducing-kernel Hilbert space.
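The kernel trick mentioned above can be illustrated with a small sketch. It assumes a homogeneous degree-2 polynomial kernel, k(x, z) = (x·z)^2, chosen here only for illustration, and checks numerically that the n × n Gram matrix computed from the kernel matches the one computed from an explicit p²-dimensional feature expansion. The helper names `phi` and `poly2_kernel` are hypothetical, and the data are random.

```python
import random


def dot(u, v):
    """Plain inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def phi(x):
    """Explicit degree-2 feature map: all p * p pairwise products x_i * x_j."""
    return [xi * xj for xi in x for xj in x]


def poly2_kernel(x, z):
    """Homogeneous degree-2 polynomial kernel; equals dot(phi(x), phi(z))."""
    return dot(x, z) ** 2


random.seed(0)
n, p = 5, 50  # few observations, many predictors (p >> n)
X = [[random.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]

# Kernel route: n * n kernel evaluations, each costing O(p).
gram_kernel = [[poly2_kernel(xi, xj) for xj in X] for xi in X]

# Explicit route: build p^2 = 2500 features per observation first.
gram_explicit = [[dot(phi(xi), phi(xj)) for xj in X] for xi in X]

max_diff = max(
    abs(a - b)
    for row_k, row_e in zip(gram_kernel, gram_explicit)
    for a, b in zip(row_k, row_e)
)
print(len(phi(X[0])), max_diff)
```

Both routes give the same 5 × 5 Gram matrix up to floating-point rounding, but the kernel route never forms the 2500 expanded features, which is the point of working in the n-dimensional space.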


The Reading Club is based on the book Computer Age Statistical Inference by Bradley Efron and Trevor Hastie. To find out how to access the book, write to

 

Venue: online. To attend, contact .

Generalized Linear Models and Regression Trees

Indirect evidence is not the sole property of Bayesians. Regression models are the frequentist method of choice for incorporating the experience of others. Increasingly aggressive use of regression techniques is a hallmark of modern statistical practice, “aggressive” applying to the number and type of predictor variables, the coinage of new methodology, and the sheer size of the target data sets. Generalized linear models, this chapter’s main topic, have been the most pervasively influential of the new methods.