Statistics 2B: Linear statistical models
Original by Simon Wood,
Modifications by Julian Faraway
January 2013
Contents
1 Preliminaries
  1.1 How to use the notes
  1.2 Stuff you are assumed to know already
2 Introduction: A simple linear model
  2.1 A simple linear model
    2.1.1 Simple least squares estimation
  2.2 Sampling properties of β̂
  2.3 So how old is the universe?
  2.4 Adding a distributional assumption
    2.4.1 Testing hypotheses about β
    2.4.2 Confidence intervals
3 Linear models in general
  3.1 Notation summary
  3.2 R and linear models in general
    3.2.1 A quadratic model
    3.2.2 A model with factors
4 Matrix algebra: revision
  4.1 Matrices and vectors
  4.2 Partitioning matrices
  4.3 Products
  4.4 Transposition
  4.5 Euclidean norm of a vector
  4.6 Matrix rank
  4.7 Matrix inversion
  4.8 Orthogonal matrices
  4.9 QR decomposition of a matrix
5 Expectation, covariance, the multivariate normal and relatives
  5.1 Expectation of linear transformations of random vectors
  5.2 Covariance matrices and linear transformations
  5.3 The multivariate normal distribution
    5.3.1 Linear transformations of normal random vectors
    5.3.2 Covariance and independence
    5.3.3 Some examples
  5.4 Distributions related to the normal
6 The theory of linear models
  6.1 The geometry of linear modelling
  6.2 Least squares estimation of β
  6.3 The distribution of β̂
  6.4 Testing a single parameter
  6.5 F-ratio results
  6.6 The influence matrix
  6.7 The residuals, ε̂, and fitted values, μ̂
  6.8 Results in terms of X
7 Linear models in R
  7.1 Galapagos species diversity data
    7.1.1 Estimating the parameters
    7.1.2 Confidence intervals and hypothesis tests for single parameters
    7.1.3 Testing several parameters using F-tests
    7.1.4 Checking the model assumptions: residuals
  7.2 Model selection
    7.2.1 AIC
    7.2.2 step
  7.3 R²: How close is the fit?
  7.4 Prediction
    7.4.1 What can go wrong with predictions?
  7.5 Orthogonality and collinearity
  7.6 Causality
  7.7 How to approach an analysis
8 Modelling with factor variables
  8.1 Identifiability
  8.2 Multiple factors
  8.3 ‘Interactions’ of factors
  8.4 Factor by continuous interactions
  8.5 Using factor variables in R
    8.5.1 Factor interactions in model formulae
    8.5.2 A simple example
  8.6 The warpbreaks data
    8.6.1 Follow up
  8.7 ANOVA tables
    8.7.1 Example of erroneous dropping of a main effect
    8.7.2 drop1
  8.8 An agricultural field trial
1 Preliminaries
This course is different to the Maths and Statistics courses that you have taken so far. As well as being about statistical
theory, it is about how you can use that theory to extract information from data, in practice. The linear statistical
models covered here, and their extensions, are routinely used in industry, commerce and science, whether it’s for
analysing the results of medical drug trials, predicting the movements of asset prices, seeking associations between
genetic markers and disease, analysing internet traffic data, predicting supermarket customer preferences or a host
of other applications. Of all the subjects we teach you here, this is the one that you are most likely to use directly in a
future job.
As you would expect from such a widely used set of statistical tools, you can’t possibly learn all there is to know
about linear models in one lecture course, but you already know enough statistics and linear algebra to give a proper derivation of the core theory on which all linear modelling is based. Since all real-world application of statistical modelling is done
by computer, this course will also cover practical modelling using the computer language R. Apart from anything else,
the ability to compute with these models is valued by employers.
1.1 How to use the notes
These notes are intended to help you produce a set of notes for the course. However, they should not be viewed as
a complete set of notes until you have annotated them yourself in lectures. The idea is that the notes free you from
having to copy down reams of text, and make it less likely that your notes will contain confusing mathematical errors.
Instead they free you to focus on the substance of what is presented in the lectures.
The lectures will concentrate on explaining the mathematical, statistical or data analytic ideas that the notes cover,
sometimes with chalk on a blackboard, sometimes with computer demonstrations of how to do the statistical modelling tasks discussed. The ideal approach, as with all university lecture courses, is to try to get a good overview of
the topic discussed in the lecture itself, and then as soon as possible thereafter to read through the detailed notes
to consolidate and ‘fill in the details’. The aim of the lectures is not to reproduce the notes exactly, but to take you
through the material in a way that maximizes your eventual understanding of the subject.
So, try to follow the ideas presented in the lecture, adding comments to the notes to help your later reading of
them. The course problem sheets all relate to the notes: so if you are stuck, look in the notes first.
When reading the notes, bear in mind that different sections should be read at very different speeds. For example,
the section on the general theory of linear models is compact, but mathematically quite dense. As such it is a very
slow read: to understand it you have to go through it methodically and carefully, and it’s important to understand it!
Other sections on practical use of linear models are wordy with lots of output to interpret, and a different, quicker,
reading approach is appropriate.
1.2 Stuff you are assumed to know already
This course builds on previous statistics units that you have taken. In particular it assumes that you are familiar with
the key ideas behind parametric statistical inference.
• Statistics is about using data to extract meaningful information about the system generating the data, when the
data generating process is such that the data will vary randomly from one replication of the data generating
process to the next, even if the underlying system stays the same.
• A statistical model is a simplified mathematical representation of the data generating process (a sort of mathematical cartoon of the system that we want to learn about). It will usually depend on some known things, such
as other variables we have measured alongside the data, and some unknown parameters: β, say.
• A key point about a statistical model is that if we knew the values of β, then the model should be able to simulate
data that ‘looked like’ the real data. In principle, given values for the parameters, a statistical model also allows
us to assign relative probabilities to observing one data set, as opposed to another.
• Once we have a model, the major goal of statistics is then to make inferences about β from the data. There are four basic questions:
1. What value of β is most consistent with the data?
2. What range of values of β is consistent with the data?
3. Is some pre-specified value of, or restriction on, β consistent with the data?
4. Are there any values of β at all for which the model is consistent with the data?
• The answers to these questions are provided by the following (a short R sketch illustrating all four appears after this list):
1. Parameter estimation, especially maximum likelihood estimation, which seeks to find the βˆ making the
observed data as probable as possible, according to the model.
2. Confidence interval estimation, which uses the data to compute intervals with a specified probability of
including the true parameter values, over replication of the data generating process.
3. Hypothesis testing, which seeks to assess the plausibility of some hypothesis, by computing a measure of
how improbable the data are under that hypothesis: the p-value (see later). A low p-value suggests that the data would be improbable if the hypothesis were true, casting doubt on the hypothesis.
4. Model checking: could the model have produced the data at all? If it could not then none of the theory in
1-3 is of any use at all. Useful checking is often done graphically.
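To make these four tasks concrete, here is a minimal R sketch using simulated data with a known slope; everything in it (the seed, the sample size, the true slope of 2) is purely illustrative and not part of the course material.

set.seed(1)
x <- runif(30)                      ## 30 artificial covariate values
y <- 2 * x + rnorm(30, sd = 0.3)    ## simulate data: true slope is 2
m <- lm(y ~ x)                      ## fit a simple linear model
coef(m)     ## 1. point estimates of the parameters
confint(m)  ## 2. 95% confidence intervals for them
summary(m)  ## 3. t-tests of whether each parameter could be zero
plot(m)     ## 4. residual plots for graphical model checking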
In terms of technicalities, the course particularly assumes that you are familiar with random variables, p.d.f.s, c.d.f.s
and inverse c.d.f.s (quantile functions). It also assumes that if you are told β̂ ∼ N(β, σ²_β̂) or (β̂ − β)/σ̂_β̂ ∼ t_d (plus any values needed), then you could test hypotheses about β and find confidence intervals for β (in your sleep, after a heavy night out, with one hand tied behind your back etc). A solid grounding in matrix algebra is also assumed,
although the key results are briefly revised in section 4.
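For example, suppose you were told β̂ = 2.1, σ̂_β̂ = 0.4 and (β̂ − β)/σ̂_β̂ ∼ t with 20 degrees of freedom. The assumed background knowledge amounts to being able to do the following in R (all the numbers here are hypothetical):

beta.hat <- 2.1; se <- 0.4; d <- 20           ## hypothetical values
beta.hat + qt(c(0.025, 0.975), df = d) * se   ## 95% CI for beta
2 * pt(-abs(beta.hat / se), df = d)           ## p-value for H0: beta = 0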
2 Introduction: A simple linear model
How old is the universe? The standard big-bang model of the origin of the universe says that it expands uniformly,
and locally, according to Hubble’s law,
y = βx,
where y is the relative velocity of any two galaxies separated by distance x, and β is “Hubble’s constant” (in standard
astrophysical notation y ≡ v, x ≡ d and β ≡ H₀). β⁻¹ gives the approximate age of the universe, but β is unknown
and must somehow be estimated from observations of y and x, made for a variety of galaxies at different distances
from us.
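As a preview of where this section is heading, the least squares fit of this model takes one line of R. The sketch below assumes the hubble data frame from the gamair package, with column y holding velocity in km s⁻¹ and column x holding distance in Mpc; the conversion 1 Mpc ≈ 3.09 × 10¹⁹ km is used to express β⁻¹ in years.

library(gamair)                          ## assumed source of 'hubble' data
data(hubble)
hub.mod <- lm(y ~ x - 1, data = hubble)  ## no intercept: y = beta * x
beta.hat <- coef(hub.mod)                ## estimated Hubble constant
age.sec <- 3.09e19 / beta.hat            ## beta^-1 in seconds (Mpc -> km)
age.sec / (60^2 * 24 * 365.25)           ## approximate age in years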
Figure 1 plots velocity against distance for 24 galaxies, according to measurements made using the Hubble Space
Telescope. Velocities are assessed by measuring the Doppler effect red shift in the spectrum of light observed from
the galaxies concerned, although some correction for ‘local’ velocity components is required. Distance measurement