Generalized linear models: course from Open education, 3600 rub., 3 weeks of training, about 6 hours per week, date: November 29, 2023.
Miscellaneous / December 01, 2023
One of the conditions for applying conventional linear models is that the observations used to fit the model are independent of each other. In practice, however, the sampling design often makes violating this condition inevitable. Imagine that you decided to build a model describing the relationship between physical education performance and IQ test scores among students. To do this, you collected samples at several universities. Can such data be combined into one analysis built according to the traditional scheme? Of course not. Students at each university may be similar to each other in some ways. Even the nature of the relationship between the quantities being studied may differ somewhat between universities.
Data of this type, in which there are intragroup correlations, should be analyzed using linear mixed models. We will show that some predictors should be included in the model as so-called "random factors". You will learn that random factors can be hierarchically nested. We will discuss how such mixed models can be built for dependent variables that follow different types of distributions. In addition, we will show that the random part of the model can be even more complex: it can include a component that models how the variance changes with a covariate.
At the end of the course, you will find a project in which you can practice building mixed models on one of several datasets of your choice. Based on the analysis of this data, you can write a report in the tradition of reproducible research.
Associate Professor, Department of Invertebrate Zoology, Faculty of Biology, St. Petersburg State University, Ph.D.
Scientific interests: structure and dynamics of marine benthos communities, spatial scales, succession, interspecific and intraspecific biotic interactions, growth and reproduction of marine invertebrates, demographic structure of populations, microevolution, biostatistics.
The course consists of 4 modules:
1) Introduction to generalized linear models
Generalized linear models (GLMs) allow you to model the behavior of quantities that do not follow a normal distribution. To make your first steps in the world of GLMs easier, we will analyze their structure using the example of a GLM for normally distributed quantities, so that you can draw parallels with simple linear models. You will learn what a link function is, how maximum likelihood estimation works, and how to test hypotheses in GLMs using Wald tests and likelihood ratio tests.
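As a rough illustration of the likelihood ratio test mentioned above, here is a minimal sketch for a GLM with a normal response and an identity link, where the maximum likelihood estimates of the coefficients coincide with least squares. The data and the helper names `fit_line` and `gaussian_loglik` are made up for this example and are not part of the course materials.

```python
import math

def fit_line(x, y):
    """Least-squares (= maximum likelihood for a normal response) fit of y = b0 + b1*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return my - b1 * mx, b1

def gaussian_loglik(y, mu):
    """Log-likelihood of a normal model evaluated at the ML variance estimate RSS / n."""
    n = len(y)
    rss = sum((yi - mi) ** 2 for yi, mi in zip(y, mu))
    sigma2 = rss / n
    return -0.5 * n * (math.log(2 * math.pi * sigma2) + 1)

# Illustrative data (invented for this sketch)
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 2.9, 4.2, 4.8, 6.1, 7.2, 7.8, 9.1]

b0, b1 = fit_line(x, y)
ll_full = gaussian_loglik(y, [b0 + b1 * xi for xi in x])   # model with a slope
ll_null = gaussian_loglik(y, [sum(y) / len(y)] * len(y))   # intercept-only model

# Likelihood ratio statistic; compared with a chi-square distribution with 1 df
lr_stat = 2 * (ll_full - ll_null)
# Upper tail of chi-square(1): P(X > t) = erfc(sqrt(t / 2))
p_value = math.erfc(math.sqrt(lr_stat / 2))

print(f"slope = {b1:.3f}, LR = {lr_stat:.2f}, p = {p_value:.3g}")
```

The chi-square reference distribution is an asymptotic approximation; with only eight observations the p-value should be read as a rough guide.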
2) Model selection problem
In this module we will talk about methodological issues associated with building models. A model is a simplified representation of reality, and choosing between competing ways of simplifying it is a frequent task for the analyst. In this module, you will learn to compare models using information criteria. We will discuss the main analysis strategies for model selection and talk about the difficulties that arise from the hidden multiplicity of models. Finally, we will teach you to recognize the main types of model selection abuse (data fishing, p-hacking).
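To make the idea of comparing models with information criteria concrete, the sketch below computes the Akaike information criterion (AIC = 2k - 2 log L) for two Poisson models that have closed-form maximum likelihood estimates: one common rate for all observations versus a separate rate for each of two groups. The counts and function names are invented for illustration.

```python
import math

def poisson_loglik(counts, rate):
    """Poisson log-likelihood of the counts under a single rate."""
    return sum(c * math.log(rate) - rate - math.lgamma(c + 1) for c in counts)

def aic(loglik, n_params):
    """Akaike information criterion: lower values indicate a better trade-off."""
    return 2 * n_params - 2 * loglik

# Illustrative counts for two groups (invented for this sketch)
group_a = [2, 3, 1, 4, 2]
group_b = [6, 8, 5, 9, 7]
pooled = group_a + group_b

# Model 1: one common rate; the ML estimate is the overall mean (1 parameter)
ll_common = poisson_loglik(pooled, sum(pooled) / len(pooled))

# Model 2: one rate per group; ML estimates are the group means (2 parameters)
ll_groups = (poisson_loglik(group_a, sum(group_a) / len(group_a))
             + poisson_loglik(group_b, sum(group_b) / len(group_b)))

aic_common = aic(ll_common, 1)
aic_groups = aic(ll_groups, 2)

print(f"AIC common rate: {aic_common:.2f}")
print(f"AIC group rates: {aic_groups:.2f}")
```

The extra parameter of the second model is penalized by 2 units, so it is preferred only if it raises the log-likelihood by more than 1.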
3) Generalized linear models for count data
In this module we will discuss basic methods for modeling count variables. First, we will discuss why conventional linear models are not suitable for count data. The properties of count distributions will help you understand the differences between the types of GLMs for count data and the specifics of their diagnostics. You will see the link function at work when you visualize GLM predictions on the scale of the link function and on the scale of the response variable.
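The difference between the link scale and the response scale can be sketched with a small Poisson GLM with a log link. The example below fits the model by iteratively reweighted least squares written out by hand for one predictor; the data and the function name `fit_poisson_glm` are assumptions made for illustration, and in practice a statistics package would do the fitting.

```python
import math

def fit_poisson_glm(x, y, n_iter=25):
    """Fit log(mu) = b0 + b1*x by iteratively reweighted least squares (IRLS)."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0   # start from the intercept-only fit
    for _ in range(n_iter):
        eta = [b0 + b1 * xi for xi in x]       # linear predictor (link scale)
        mu = [math.exp(e) for e in eta]        # fitted means (response scale)
        # Working response and weights for the log link
        z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]
        w = mu
        # Solve the 2x2 weighted least-squares normal equations
        sw = sum(w)
        swx = sum(wi * xi for wi, xi in zip(w, x))
        swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
        swz = sum(wi * zi for wi, zi in zip(w, z))
        swxz = sum(wi * xi * zi for wi, xi, zi in zip(w, x, z))
        det = sw * swxx - swx ** 2
        b0 = (swxx * swz - swx * swxz) / det
        b1 = (sw * swxz - swx * swz) / det
    return b0, b1

# Illustrative counts (invented for this sketch)
x = [0, 1, 2, 3, 4, 5]
y = [1, 2, 4, 6, 11, 18]

b0, b1 = fit_poisson_glm(x, y)
eta = [b0 + b1 * xi for xi in x]   # predictions on the link scale: a straight line
mu = [math.exp(e) for e in eta]    # predictions on the response scale: an exponential curve

for xi, e, m in zip(x, eta, mu):
    print(f"x={xi}  link scale={e:.3f}  response scale={m:.3f}")
```

On the link scale the predictions change by the same amount `b1` for each unit of `x`; on the response scale each unit of `x` multiplies the expected count by `exp(b1)`.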
4) Generalized linear models with binary response
Sometimes there is a need to model whether some event has occurred or not: whether the football team won or lost, whether the patient recovered after treatment or not, whether the client made a purchase or not. Conventional linear models are not suitable for modeling such binary data (events with two outcomes), but this can be easily done using generalized linear models. In this module, you will learn to model the probabilities of events by representing them as odds. We will look at how the logit link function works and how GLM coefficients are interpreted when it is used. Finally, you will be able to practice analyzing generalized linear models with different distributions by completing a data analysis project. The results of this analysis will need to be presented as a report in HTML format, written using rmarkdown/knitr.
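The relationship between probabilities, odds, and the logit link can be sketched in a few lines. The coefficients `b0` and `b1` below are hypothetical values chosen for illustration, not estimates from any real dataset.

```python
import math

def logit(p):
    """Link function: maps a probability to log-odds."""
    return math.log(p / (1 - p))

def inv_logit(eta):
    """Inverse link: maps log-odds back to a probability."""
    return 1 / (1 + math.exp(-eta))

# Hypothetical coefficients of a logistic model: logit(p) = b0 + b1 * x
b0, b1 = -2.0, 0.8

def odds(x):
    """Odds of the event at predictor value x."""
    p = inv_logit(b0 + b1 * x)
    return p / (1 - p)

# Increasing x by one unit multiplies the odds by exp(b1),
# regardless of the starting value of x
odds_ratio = odds(3) / odds(2)

print(f"P(event | x=2) = {inv_logit(b0 + b1 * 2):.3f}")
print(f"odds ratio per unit of x = {odds_ratio:.3f}, exp(b1) = {math.exp(b1):.3f}")
```

This is why coefficients of a logit model are usually reported as odds ratios: the effect is constant on the odds scale, while the change in probability depends on where you start.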