lm() (as before)
Simulate data for the independent variables:
x1 = runif(10, 0, 100)
x2 = runif(10, 10, 200)
x3 = runif(10, 100, 400)
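# Aside: runif() draws fresh random numbers on every run, so the exact values
# shown below will differ from run to run. Calling e.g. set.seed(42) before
# these three lines would make the simulation reproducible.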
cor.test(x1,x2)$p.value; cor.test(x1,x3)$p.value; cor.test(x2,x3)$p.value
## [1] 0.2151276
## [1] 0.7329013
## [1] 0.3232681
The \(x_i\) must not be (strongly) correlated!
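As an aside, the pairwise correlations can also be inspected in a single matrix; a minimal sketch using base R's cor() (values near 0 suggest no strong linear dependence among the predictors):
cor(cbind(x1, x2, x3))   # full correlation matrix of the simulated predictors
For an already fitted model, variance inflation factors (e.g. car::vif()) are another common multicollinearity check.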
Let’s simulate some response:
y = 3 + 2*x1 + 3*x2 + 1*x3 + rnorm(10, 0, 2) # simulated response
Let’s see if we can “rediscover” the true coefficients above:
mdata = data.frame(y = y, x1 = x1, x2 = x2, x3 = x3)
head(mdata)
##          y         x1        x2       x3
## 1 557.7894 75.9815207  51.58790 246.8139
## 2 720.2404 97.3230971  68.20277 319.2263
## 3 597.0158 66.6150526  55.27691 293.9997
## 4 600.0651  0.8985721 114.35953 251.0833
## 5 562.8531 26.4637249 113.62655 167.9717
## 6 331.0584 30.2783098  52.04472 109.3102
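The model is specified with an R formula (Wilkinson-Rogers notation). For reference, a few common variants (only the purely additive form is fitted below):
# y ~ x1 + x2 + x3   additive effects only (used below)
# y ~ x1 * x2        main effects of x1 and x2 plus their interaction x1:x2
# y ~ .              all remaining columns of the data frame as predictors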
lm.res = lm(y ~ x1 + x2 + x3, data = mdata) # Additive model, Wilkinson-Rogers notation
summary(lm.res)
##
## Call:
## lm(formula = y ~ x1 + x2 + x3, data = mdata)
##
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.5867 -0.9367  0.2019  0.7856  2.2606
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 5.356328   2.434827     2.2   0.0701 .  
## x1          1.983107   0.019371   102.4 5.85e-11 ***
## x2          2.989270   0.020393   146.6 6.80e-12 ***
## x3          0.999827   0.008325   120.1 2.25e-11 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.737 on 6 degrees of freedom
## Multiple R-squared: 0.9999, Adjusted R-squared: 0.9999
## F-statistic: 2.272e+04 on 3 and 6 DF, p-value: 1.491e-12
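To compare the fit with the true coefficients 3, 2, 3 and 1 used in the simulation, the estimates and their confidence intervals can be extracted directly; a short sketch (the exact values depend on the random data above):
coef(lm.res)      # point estimates for (Intercept), x1, x2, x3
confint(lm.res)   # 95% confidence intervals; these should cover 3, 2, 3, 1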
R squared (the coefficient of determination) describes how well the model fits the data; a value close to 1 indicates a good fit.
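As a sketch of what this number means, R squared can be reproduced by hand from the fitted model (using lm.res and y from above):
ss_res = sum(residuals(lm.res)^2)  # residual (error) sum of squares
ss_tot = sum((y - mean(y))^2)      # total sum of squares around the mean
1 - ss_res/ss_tot                  # R squared, cf. summary(lm.res)$r.squared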
The p-value for the F statistic should be small. A small value means the fit is meaningful, i.e. the variation explained by the regression is much larger than the variation due to (random) error.
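The same idea can be written out by hand as a small sketch (reusing ss_res and ss_tot from above): the F statistic is the ratio of the mean variation explained by the regression to the mean residual variation.
n = nrow(mdata); p = 3                  # number of observations and predictors
ms_reg = (ss_tot - ss_res) / p          # mean square due to the regression
ms_err = ss_res / (n - p - 1)           # mean square due to (random) error
ms_reg / ms_err                         # F statistic, cf. summary(lm.res)
pf(ms_reg / ms_err, p, n - p - 1, lower.tail = FALSE)  # its p-value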