Statistics 2B: Linear statistical models
Original by Simon Wood,
Modifications by Julian Faraway
January 2013
Contents
1 Preliminaries
  1.1 How to use the notes
  1.2 Stuff you are assumed to know already
2 Introduction: A simple linear model
  2.1 A simple linear model
    2.1.1 Simple least squares estimation
  2.2 Sampling properties of β̂
  2.3 So how old is the universe?
  2.4 Adding a distributional assumption
    2.4.1 Testing hypotheses about β
    2.4.2 Confidence intervals
3 Linear models in general
  3.1 Notation summary
  3.2 R and linear models in general
    3.2.1 A quadratic model
    3.2.2 A model with factors
4 Matrix algebra: revision
  4.1 Matrices and vectors
  4.2 Partitioning matrices
  4.3 Products
  4.4 Transposition
  4.5 Euclidean norm of a vector
  4.6 Matrix rank
  4.7 Matrix inversion
  4.8 Orthogonal matrices
  4.9 QR decomposition of a matrix
5 Expectation, covariance, the multivariate normal and relatives
  5.1 Expectation of linear transformations of random vectors
  5.2 Covariance matrices and linear transformations
  5.3 The multivariate normal distribution
    5.3.1 Linear transformations of normal random vectors
    5.3.2 Covariance and independence
    5.3.3 Some examples
  5.4 Distributions related to the normal
6 The theory of linear models
  6.1 The geometry of linear modelling
  6.2 Least squares estimation of β
  6.3 The distribution of β̂
  6.4 Testing a single parameter
  6.5 F-ratio results
  6.6 The influence matrix
  6.7 The residuals, ε̂, and fitted values, μ̂
  6.8 Results in terms of X
7 Linear models in R
  7.1 Galapagos species diversity data
    7.1.1 Estimating the parameters
    7.1.2 Confidence intervals and hypothesis tests for single parameters
    7.1.3 Testing several parameters using F-tests
    7.1.4 Checking the model assumptions: residuals
  7.2 Model selection
    7.2.1 AIC
    7.2.2 step
  7.3 R²: How close is the fit?
  7.4 Prediction
    7.4.1 What can go wrong with predictions?
  7.5 Orthogonality and collinearity
  7.6 Causality
  7.7 How to approach an analysis
8 Modelling with factor variables
  8.1 Identifiability
  8.2 Multiple factors
  8.3 ‘Interactions’ of factors
  8.4 Factor by continuous interactions
  8.5 Using factor variables in R
    8.5.1 Factor interactions in model formulae
    8.5.2 A simple example
  8.6 The warpbreaks data
    8.6.1 Follow up
  8.7 ANOVA tables
    8.7.1 Example of erroneous dropping of a main effect
    8.7.2 drop1
  8.8 An agricultural field trial
1 Preliminaries
This course is different to the Maths and Statistics courses that you have taken so far. As well as being about statistical
theory, it is about how you can use that theory to extract information from data, in practice. The linear statistical
models covered here, and their extensions, are routinely used in industry, commerce and science, whether it’s for
analysing the results of medical drug trials, predicting the movements of asset prices, seeking associations between
genetic markers and disease, analysing internet traffic data, predicting supermarket customer preferences or a host
of other applications. Of all the subjects we teach you here, this is the one that you are most likely to use directly in a
future job.
As you would expect from such a widely used set of statistical tools, you can’t possibly learn all there is to know
about linear models in one lecture course, but you already know enough statistics and linear algebra to give a proper derivation of the core theory on which all linear modelling is based. Since all real-world application of statistical modelling is done
by computer, this course will also cover practical modelling using the computer language R. Apart from anything else,
the ability to compute with these models is valued by employers.
1.1 How to use the notes
These notes are intended to help you produce a set of notes for the course. However, they should not be viewed as
a complete set of notes until you have annotated them yourself in lectures. The idea is that the notes free you from
having to copy down reams of text, and make it less likely that your notes will contain confusing mathematical errors.
Instead they free you to focus on the substance of what is presented in the lectures.
The lectures will concentrate on explaining the mathematical, statistical or data analytic ideas that the notes cover,
sometimes with chalk on a blackboard, sometimes with computer demonstrations of how to do the statistical modelling tasks discussed. The ideal approach, as with all university lecture courses, is to try to get a good overview of
the topic discussed in the lecture itself, and then as soon as possible thereafter to read through the detailed notes
to consolidate and ‘fill in the details’. The aim of the lectures is not to reproduce the notes exactly, but to take you
through the material in a way that maximizes your eventual understanding of the subject.
So, try to follow the ideas presented in the lecture, adding comments to the notes to help your later reading of
them. The course problem sheets all relate to the notes: so if you are stuck, look in the notes first.
When reading the notes, bear in mind that different sections should be read at very different speeds. For example,
the section on the general theory of linear models is compact, but mathematically quite dense. As such it is a very
slow read: to understand it you have to go through it methodically and carefully, and it’s important to understand it!
Other sections on practical use of linear models are wordy with lots of output to interpret, and a different, quicker,
reading approach is appropriate.
1.2 Stuff you are assumed to know already
This course builds on previous statistics units that you have taken. In particular it assumes that you are familiar with
the key ideas behind parametric statistical inference.
• Statistics is about using data to extract meaningful information about the system generating the data, when the
data generating process is such that the data will vary randomly from one replication of the data generating
process to the next, even if the underlying system stays the same.
• A statistical model is a simplified mathematical representation of the data generating process (a sort of mathematical cartoon of the system that we want to learn about). It will usually depend on some known things, such
as other variables we have measured alongside the data, and some unknown parameters: β, say.
• A key point about a statistical model is that if we knew the values of β, then the model should be able to simulate
data that ‘looked like’ the real data. In principle, given values for the parameters, a statistical model also allows
us to assign relative probabilities to observing one data set, as opposed to another.
• Once we have a model, the major goal of statistics is then to make inferences about β from the data. There are four basic questions:
1. What value of β is most consistent with the data?
2. What range of values of β is consistent with the data?
3. Is some pre-specified value of, or restriction on, β consistent with the data?
4. Are there any values of β at all for which the model is consistent with the data?
• The answers to these questions are provided by the following (a short R sketch illustrating all four appears after this list):
1. Parameter estimation, especially maximum likelihood estimation, which seeks to find the βˆ making the
observed data as probable as possible, according to the model.
2. Confidence interval estimation, which uses the data to compute intervals with a specified probability of
including the true parameter values, over replication of the data generating process.
3. Hypothesis testing, which seeks to assess the plausibility of some hypothesis, by computing a measure of
how improbable the data are under that hypothesis: the p-value (see later). A low p-value suggests that the data would be improbable if the hypothesis were true, casting doubt on the hypothesis.
4. Model checking: could the model have produced the data at all? If it could not then none of the theory in
1-3 is of any use at all. Useful checking is often done graphically.
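To make these four tasks concrete, here is a minimal R sketch using simulated data with a known slope; everything in it (the seed, the sample size, the true slope of 2) is purely illustrative and not part of the course material.

set.seed(1)
x <- runif(30)                      ## 30 artificial covariate values
y <- 2 * x + rnorm(30, sd = 0.3)    ## simulate data: true slope is 2
m <- lm(y ~ x)                      ## fit a simple linear model
coef(m)     ## 1. point estimates of the parameters
confint(m)  ## 2. 95% confidence intervals for them
summary(m)  ## 3. t-tests of whether each parameter could be zero
plot(m)     ## 4. residual plots for graphical model checking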
In terms of technicalities, the course particularly assumes that you are familiar with random variables, p.d.f.s, c.d.f.s
and inverse c.d.f.s (quantile functions). It also assumes that if you are told β̂ ∼ N(β, σ²_β̂) or (β̂ − β)/σ̂_β̂ ∼ t_d (plus any values needed), then you could test hypotheses about β and find confidence intervals for β (in your sleep, after a heavy night out, with one hand tied behind your back etc). A solid grounding in matrix algebra is also assumed,
although the key results are briefly revised in section 4.
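For example, suppose you were told β̂ = 2.1, σ̂_β̂ = 0.4 and (β̂ − β)/σ̂_β̂ ∼ t with 20 degrees of freedom. The assumed background knowledge amounts to being able to do the following in R (all the numbers here are hypothetical):

beta.hat <- 2.1; se <- 0.4; d <- 20           ## hypothetical values
beta.hat + qt(c(0.025, 0.975), df = d) * se   ## 95% CI for beta
2 * pt(-abs(beta.hat / se), df = d)           ## p-value for H0: beta = 0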
2 Introduction: A simple linear model
How old is the universe? The standard big-bang model of the origin of the universe says that it expands uniformly,
and locally, according to Hubble’s law,
y = βx,
where y is the relative velocity of any two galaxies separated by distance x, and β is “Hubble’s constant” (in standard
astrophysical notation y ≡ v, x ≡ d and β ≡ H₀). β⁻¹ gives the approximate age of the universe, but β is unknown
and must somehow be estimated from observations of y and x, made for a variety of galaxies at different distances
from us.
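As a preview of where this section is heading, the least squares fit of this model takes one line of R. The sketch below assumes the hubble data frame from the gamair package, with column y holding velocity in km s⁻¹ and column x holding distance in Mpc; the conversion 1 Mpc ≈ 3.09 × 10¹⁹ km is used to express β⁻¹ in years.

library(gamair)                          ## assumed source of 'hubble' data
data(hubble)
hub.mod <- lm(y ~ x - 1, data = hubble)  ## no intercept: y = beta * x
beta.hat <- coef(hub.mod)                ## estimated Hubble constant
age.sec <- 3.09e19 / beta.hat            ## beta^-1 in seconds (Mpc -> km)
age.sec / (60^2 * 24 * 365.25)           ## approximate age in years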
Figure 1 plots velocity against distance for 24 galaxies, according to measurements made using the Hubble Space
Telescope. Velocities are assessed by measuring the Doppler effect red shift in the spectrum of light observed from
the galaxies concerned, although some correction for ‘local’ velocity components is required. Distance measurement