Linear regression: a course from Open education, 4,900 rub., 5 weeks of training at about 2 hours per week. Date: November 29, 2023.
Correlation analysis quantifies the strength and direction of the relationship between two quantities; regression models offer more. With regression analysis, you can quantitatively describe how the studied quantities behave depending on predictor variables and obtain predictions on new data. You will learn how to build simple and multiple linear models using the R language. Every method has its limitations, so we will help you understand in which situations linear regression can and cannot be used, and we will teach you methods for diagnosing fitted models. A special place in the course is given to the inner workings of regression analysis: you will master the matrix operations that underlie linear regression so that you can later understand more complex varieties of linear models.
If you need to find and describe relationships between phenomena that can be measured quantitatively, this course is a good opportunity to understand how simple and multiple linear regression work and to learn about the capabilities and limitations of these methods.
The course is designed for those who are already familiar with the basic techniques of data analysis using the R language and with the creation of simple .html documents using rmarkdown and knitr.
Scientific interests: structure and dynamics of marine benthos communities, spatial scales, succession, interspecific and intraspecific biotic interactions, growth and reproduction of marine invertebrates, demographic structure of populations, microevolution, biostatistics.
The course consists of 5 modules:
1. Correlation analysis. Simple linear regression
We will begin our conversation about methods for numerically describing relationships between quantitative variables with covariance and correlation coefficients, which let us estimate the strength and direction of a relationship. Then you will see what additional information about relationships can be obtained by constructing a linear model. You will learn to interpret regression coefficients and find out when and how linear models can be used to make predictions on new data. By the end of this module, you will be able to fit the equation of a linear model and plot it with a confidence region.
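The workflow of this first module can be sketched in a few lines of base R. This is only an illustration, not course material: it assumes the built-in `cars` dataset (stopping distance versus speed), which the course description does not mention, and all object names are hypothetical.

```r
# Built-in dataset: stopping distance (dist) versus speed for 50 cars
data(cars)

# Covariance and Pearson correlation: strength and direction of the relationship
cov_sd <- cov(cars$speed, cars$dist)
cor_sd <- cor(cars$speed, cars$dist)

# Simple linear regression: dist = b0 + b1 * speed + error
fit <- lm(dist ~ speed, data = cars)
coef(fit)  # intercept b0 and slope b1

# Predictions on new data with a 95% confidence interval for the mean response
new_data <- data.frame(
  speed = seq(min(cars$speed), max(cars$speed), length.out = 100)
)
pred <- predict(fit, newdata = new_data, interval = "confidence", level = 0.95)

# Plot the data, the fitted line, and the confidence region
plot(dist ~ speed, data = cars, xlab = "Speed", ylab = "Stopping distance")
lines(new_data$speed, pred[, "fit"], lwd = 2)
lines(new_data$speed, pred[, "lwr"], lty = 2)
lines(new_data$speed, pred[, "upr"], lty = 2)
```

In simple regression the slope equals the covariance divided by the variance of the predictor, which is one way to see how the correlation coefficient and the linear model are connected.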
2. Testing the significance and validity of linear models
Building a linear model and writing down its equation is only the very beginning of the analysis. In this module, you will learn how to describe the results of regression analysis: how to test the statistical significance of the overall model or of its coefficients and how to assess the quality of the fit. Linear models (or rather, the statistical tests used with them), like any method, have their limitations. You will learn what these limitations are and where they come from. The graphical diagnostic methods that we will use are universal across different linear models, and more practice will help you make decisions more confidently. Once you understand all this, you can write a complete script in R to fit, diagnose, and present the results of a simple linear regression.
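As a rough illustration of the steps named in this module, the sketch below tests a fitted model and draws the standard diagnostic plots with base R. The `cars` dataset and the object names are assumptions made for the example; the course may use different data and plotting tools.

```r
# Fit a simple linear model (the built-in cars dataset is an assumption)
fit <- lm(dist ~ speed, data = cars)
s <- summary(fit)

# t-tests for individual coefficients: estimate, standard error, t value, p-value
s$coefficients

# Quality of fit: share of the response variance explained by the model
s$r.squared

# F-test for the model as a whole: statistic plus its degrees of freedom
f <- s$fstatistic
p_model <- pf(f[["value"]], f[["numdf"]], f[["dendf"]], lower.tail = FALSE)

# Graphical diagnostics: residuals vs fitted, normal Q-Q,
# scale-location, and residuals vs leverage
op <- par(mfrow = c(2, 2))
plot(fit)
par(op)
```

With a single predictor, the F-test for the whole model and the t-test for the slope are equivalent (F is the square of t), so both give the same p-value. The two tests diverge once the model has several predictors.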
3. A brief introduction to the world of linear algebra
In this module, we will dive into the heart of linear models. To do this, you will need to learn or review the basics of linear algebra. We will discuss the different types of matrices, how to create them in R, and the basic operations on them. All of this is needed to understand how linear regression works from the inside. You will learn what a model matrix is, how to write a linear regression equation in matrix form, and how to find its coefficients. You will see with your own eyes the hat matrix, which produces the predicted values, and you will even be able to calculate it by hand. Finally, you will learn to calculate the residual variance and the variance-covariance matrix and to use them to build a confidence band for the regression. Later, this knowledge will help you understand the structure of more complex models: those with discrete predictors, with different distributions of residuals, or with a different structure of the variance-covariance matrix.
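The matrix view described in this module can be sketched as follows. This is a minimal illustration under the usual ordinary least squares assumptions, again using the built-in `cars` dataset as a stand-in; the variable names are hypothetical, and inverting X'X explicitly is done here for clarity rather than numerical robustness.

```r
# Response vector and model matrix (a column of ones plus the predictor)
y <- cars$dist
X <- model.matrix(~ speed, data = cars)

# Coefficients from the normal equations: b = (X'X)^{-1} X'y
XtX_inv <- solve(t(X) %*% X)
b <- XtX_inv %*% t(X) %*% y

# Hat matrix H = X (X'X)^{-1} X' turns observed values into predicted values
H <- X %*% XtX_inv %*% t(X)
y_hat <- as.vector(H %*% y)

# Residual variance: s^2 = sum(e^2) / (n - p)
e <- y - y_hat
n <- nrow(X)
p <- ncol(X)
s2 <- sum(e^2) / (n - p)

# Variance-covariance matrix of the coefficients: V = s^2 (X'X)^{-1}
V <- s2 * XtX_inv

# Pointwise 95% confidence band for the regression line on a grid of new x values
x_new <- seq(min(cars$speed), max(cars$speed), length.out = 100)
X_new <- cbind(1, x_new)
fit_new <- as.vector(X_new %*% b)
se_new <- sqrt(diag(X_new %*% V %*% t(X_new)))
t_crit <- qt(0.975, df = n - p)
band <- cbind(fit = fit_new,
              lwr = fit_new - t_crit * se_new,
              upr = fit_new + t_crit * se_new)

# Cross-check the manual results against lm()
fit <- lm(dist ~ speed, data = cars)
all.equal(as.vector(b), unname(coef(fit)))
all.equal(V, vcov(fit), check.attributes = FALSE)
```

The trace of the hat matrix equals the number of estimated coefficients, which is a quick sanity check when you compute it by hand.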
4. Multiple linear regression
Most often, the relationships between quantities are more complex than simple linear regression can describe. Multiple linear regression describes how a response variable depends on several predictors. Once a model contains several predictors, linear regression gains a new condition of applicability: the absence of multicollinearity. In this module, you will learn how to detect and avoid multicollinearity. Finally, multiple models often contain more variables than can be shown on a plane, so we will teach you simple techniques for creating informative plots even in this case.
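One common way to detect multicollinearity is the variance inflation factor (VIF); whether the course uses exactly this measure is an assumption. The sketch below computes it by hand in base R on the built-in `mtcars` dataset (also an assumption) and then draws the model's prediction for one predictor while the others are held at their means, which is one simple way to show a multiple model on a plane.

```r
# Multiple linear regression on the built-in mtcars dataset
predictors <- c("wt", "hp", "disp")
fit <- lm(mpg ~ wt + hp + disp, data = mtcars)

# Pairwise correlations between predictors give a first hint
cor(mtcars[, predictors])

# VIF for each predictor: regress it on the other predictors,
# then compute 1 / (1 - R^2) of that auxiliary model
vif_manual <- sapply(predictors, function(v) {
  others <- setdiff(predictors, v)
  aux <- lm(reformulate(others, response = v), data = mtcars)
  1 / (1 - summary(aux)$r.squared)
})
vif_manual

# Predicted mpg along wt, with hp and disp fixed at their mean values
grid <- data.frame(
  wt = seq(min(mtcars$wt), max(mtcars$wt), length.out = 50),
  hp = mean(mtcars$hp),
  disp = mean(mtcars$disp)
)
pred <- predict(fit, newdata = grid, interval = "confidence")

plot(mpg ~ wt, data = mtcars)
lines(grid$wt, pred[, "fit"], lwd = 2)
lines(grid$wt, pred[, "lwr"], lty = 2)
lines(grid$wt, pred[, "upr"], lty = 2)
```

A VIF of 1 means a predictor is uncorrelated with the others; values well above 1 mean that much of its variation is already explained by the remaining predictors, which inflates the standard error of its coefficient.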
5. Comparison of linear models
Multiple linear models are like a construction set: more complex models can be taken apart and simplified. You will learn how comparisons of nested models with the partial F test are used to test the significance of individual predictors or groups of predictors. More complex models describe the original data better, but excessive complexity is dangerous because such models start to make poor predictions on new data. Using partial F tests, you can simplify a model by eliminating non-significant predictors one at a time. Simplified models are easier to interpret and present. You can apply everything you have learned about linear regression in a final data analysis project, in which you need to build an optimal multiple linear model correctly and present its results in a report written with rmarkdown and knitr.
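The nested model comparison described in this module might look like the following in base R. The dataset (`mtcars`) and the particular set of predictors are assumptions chosen only for illustration.

```r
# Full model and a nested model without one predictor
full <- lm(mpg ~ wt + hp + disp, data = mtcars)
reduced <- lm(mpg ~ wt + hp, data = mtcars)

# Partial F test: does adding disp significantly reduce the residual sum of squares?
cmp <- anova(reduced, full)
cmp

# The same idea works for a group of predictors (here hp and disp together)
only_wt <- lm(mpg ~ wt, data = mtcars)
cmp_group <- anova(only_wt, full)
cmp_group

# drop1() runs the partial F test for each predictor of the full model in turn
d1 <- drop1(full, test = "F")
d1

# One backward elimination step: remove a predictor and refit
simpler <- update(full, . ~ . - disp)
formula(simpler)
```

When exactly one predictor is dropped, the partial F test gives the same p-value as the t-test for that coefficient in the full model. Comparing groups of predictors at once is where the nested comparison adds something new.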