# R for Researchers: Regression (OLS) solutions

#### April 2015

This article contains solutions to exercises for an article in the series R for Researchers. For a list of topics covered by this series, see the Introduction article. If you're new to R we highly recommend reading the articles in order.

There is often more than one approach to the exercises. Do not be concerned if your approach differs from the solution provided.

These solutions require that the solutions from the prior lesson have already been run in your R session.

#### Exercise solutions

These exercises use the alfalfa dataset and the work you started on the alfAnalysis script. Open the script and run all the commands in the script to prepare your session for these problems.

Note that we will treat the shade and irrig variables as continuous for these exercises. They could also be treated as factor variables. Since both represent increasing levels, we first try them as numeric (scale) variables.

1. Set the reference level of the inoc variable to cntrl.
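The difference between the two codings can be sketched on simulated data. The data frame `d` and its columns `level` and `y` below are made up for illustration and are not the alfalfa data. A numeric predictor costs one slope coefficient, while a factor costs one coefficient per level.

```r
# Simulated stand-in data (an assumption; not the alfalfa dataset).
set.seed(1)
d <- data.frame(level = rep(1:5, each = 5))
d$y <- 30 + 0.8 * d$level + rnorm(25)

fit_num <- lm(y ~ level, data = d)          # intercept + one slope
fit_fac <- lm(y ~ factor(level), data = d)  # intercept + four level contrasts

length(coef(fit_num))
length(coef(fit_fac))

# The straight-line model is nested in the factor model, so an F test
# can check whether the linear coding is adequate.
anova(fit_num, fit_fac)
```

If the F test shows no meaningful gain from the factor coding, the cheaper numeric coding is a reasonable starting point.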

#######################################################
#######################################################
##
##   Regression
##
#######################################################
#######################################################

str(alfalfa$inoc)
Factor w/ 5 levels "A","B","C","cntrl",..: 1 2 5 3 4 5 4 2 1 3 ...
alfalfa$inoc <- factor(alfalfa$inoc,levels=c("cntrl","A","B","C","D") )

2. Create a quadratic poly term for the shade variable.

shade2 <- poly(alfalfa$shade, degree=2)
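As a small sketch of what these two steps do, the example below uses made-up vectors `f` and `x` (assumptions, not the alfalfa columns). It shows `relevel()` as an alternative way to move one level to the front, and checks that the columns returned by `poly()` are orthonormal.

```r
# Hypothetical factor with the same level names as inoc.
f <- factor(c("A", "B", "cntrl", "C", "D", "cntrl"))
f2 <- relevel(f, ref = "cntrl")
levels(f2)   # "cntrl" is now the first (reference) level

# Hypothetical numeric variable with five increasing levels.
x <- rep(1:5, times = 5)
p <- poly(x, degree = 2)
colnames(p)                      # "1" and "2"
round(crossprod(unclass(p)), 10) # 2 x 2 identity: orthonormal columns
```

The column names "1" and "2" are why the coefficients for a term called shade2 are printed as shade21 (linear part) and shade22 (quadratic part).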

3. Regress yield on irrig, inoc, the quadratic shade term, and all their two-way interactions.

out <- lm(yield~(irrig+inoc+shade2)^2, data=alfalfa)
summary(out)

Call:
lm(formula = yield ~ (irrig + inoc + shade2)^2, data = alfalfa)

Residuals:
1          2          3          4          5          6          7
-1.403e-02  2.053e-02 -2.149e-01 -1.712e-01  1.621e-01  6.807e-01 -3.241e-01
8          9         10         11         12         13         14
2.053e-02  5.610e-02  3.044e-01  1.141e-01 -7.523e-01 -8.415e-02  5.321e-16
15         16         17         18         19         20         21
-1.847e-01  3.241e-01  5.610e-02 -4.565e-01  2.258e-01  3.224e-01 -8.210e-02
22         23         24         25
2.092e-01 -1.621e-01 -3.582e-02 -1.403e-02

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept)    32.4643     1.3937  23.294 0.000173 ***
irrig          -1.1195     0.4020  -2.785 0.068729 .
inocA           4.3853     1.9397   2.261 0.108848
inocB          -1.1339     2.5535  -0.444 0.687085
inocC           1.5821     1.5433   1.025 0.380738
inocD           4.6810     1.6295   2.873 0.063909 .
shade21         4.0046     5.0646   0.791 0.486852
shade22        -8.5243     7.6699  -1.111 0.347454
irrig:inocA     0.6117     0.5686   1.076 0.360816
irrig:inocB     2.6848     0.8416   3.190 0.049701 *
irrig:inocC     1.7532     0.5001   3.505 0.039332 *
irrig:inocD     0.1157     0.4993   0.232 0.831676
irrig:shade21   2.5552     1.2161   2.101 0.126428
irrig:shade22   3.4764     1.8453   1.884 0.156083
inocA:shade21  -9.3599     5.2525  -1.782 0.172771
inocB:shade21  -1.4753     3.4398  -0.429 0.696927
inocC:shade21   4.1493     3.2650   1.271 0.293373
inocD:shade21  -0.5848     4.5746  -0.128 0.906373
inocA:shade22  -8.8399     4.1364  -2.137 0.122187
inocB:shade22   7.3414     7.2192   1.017 0.384063
inocC:shade22   0.8405     3.5126   0.239 0.826294
inocD:shade22  -3.5093     3.1239  -1.123 0.343060
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.8035 on 3 degrees of freedom
Multiple R-squared:  0.9935,    Adjusted R-squared:  0.9478
F-statistic: 21.74 on 21 and 3 DF,  p-value: 0.01347
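To see what the `^2` in the formula expands to, the sketch below uses placeholder variable names `a`, `b`, and `c` (assumptions chosen for illustration). It also recounts the coefficients of the fitted model, assuming 25 observations, a five-level inoc factor, and a two-column shade2 term.

```r
# Expand a formula of the same shape with placeholder names.
f <- y ~ (a + b + c)^2
labs <- attr(terms(f), "term.labels")
labs
# "a" "b" "c" "a:b" "a:c" "b:c"

# Coefficient count for the model in the text:
# intercept + irrig + inoc (4) + shade2 (2)
# + irrig:inoc (4) + irrig:shade2 (2) + inoc:shade2 (8)
n_coef <- 1 + 1 + 4 + 2 + 4 + 2 + 8
resid_df <- 25 - n_coef
c(n_coef = n_coef, resid_df = resid_df)
```

The count of 22 coefficients leaves 3 residual degrees of freedom, which matches the summary output. Note that `^2` stops at two-way interactions; the three-way term would need `^3`.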
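
4. Use the backward selection method to reduce the model. Use the significance of each term as the criterion, as was done in the lesson.

Two methods are shown in this solution. The first is an automated search with step(), which selects on AIC. The second is manual backward elimination with drop1(), removing the least significant term at each step.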

step(out, test="F")
Start:  AIC=-19.94
yield ~ (irrig + inoc + shade2)^2

Df Sum of Sq     RSS      AIC F value  Pr(>F)
<none>                       1.9370 -19.9437
- irrig:shade2  2    4.9536  6.8906   7.7819  3.8361 0.14904
- inoc:shade2   8   17.6767 19.6137  21.9338  3.4222 0.16995
- irrig:inoc    4   15.2447 17.1817  26.6242  5.9028 0.08823 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Call:
lm(formula = yield ~ (irrig + inoc + shade2)^2, data = alfalfa)

Coefficients:
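(Intercept)          irrig          inocA          inocB          inocC
32.4643        -1.1195         4.3853        -1.1339         1.5821
inocD        shade21        shade22    irrig:inocA    irrig:inocB
4.6810         4.0046        -8.5243         0.6117         2.6848
irrig:inocC    irrig:inocD  irrig:shade21  irrig:shade22  inocA:shade21
1.7532         0.1157         2.5552         3.4764        -9.3599
inocB:shade21  inocC:shade21  inocD:shade21  inocA:shade22  inocB:shade22
-1.4753         4.1493        -0.5848        -8.8399         7.3414
inocC:shade22  inocD:shade22
0.8405        -3.5093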
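The AIC column in the step() output can be reproduced by hand, which also shows why step() kept the full model. The sketch below assumes the formula that `extractAIC()` uses for linear models, n * log(RSS / n) + 2 * edf, with n = 25 observations; the RSS values are copied from the output above, and the variable names are illustrative.

```r
n <- 25

# Full model: 22 coefficients, RSS = 1.9370
aic_full <- n * log(1.9370 / n) + 2 * 22

# After dropping irrig:shade2: 20 coefficients, RSS = 6.8906
aic_drop <- n * log(6.8906 / n) + 2 * 20

round(c(full = aic_full, drop_irrig_shade2 = aic_drop), 2)
```

Every candidate deletion has a higher AIC than the full model, so step() stops without removing anything, even though none of the interaction terms is significant at the 5% level.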
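
For the second method, terms are removed one at a time. The first term dropped is inoc:shade2, which has the largest p-value in the table above.

out2 <- lm(yield~irrig+inoc+shade2+irrig:inoc+irrig:shade2,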
data=alfalfa)
drop1(out2, test="F")
Single term deletions

Model:
yield ~ irrig + inoc + shade2 + irrig:inoc + irrig:shade2
Df Sum of Sq    RSS    AIC F value Pr(>F)
<none>                    19.614 21.934
irrig:inoc    4   16.5458 36.159 29.227  2.3199 0.1216
irrig:shade2  2    1.6273 21.241 19.926  0.4563 0.6451

out3 <- lm(yield~irrig+inoc+shade2+irrig:inoc, data=alfalfa)
drop1(out3, test="F")
Single term deletions

Model:
yield ~ irrig + inoc + shade2 + irrig:inoc
Df Sum of Sq    RSS    AIC F value    Pr(>F)
<none>                  21.241 19.926
shade2      2    63.341 84.582 50.471 19.3832 0.0001257 ***
irrig:inoc  4    20.399 41.639 28.754  3.1211 0.0526577 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

out4 <- lm(yield~irrig+inoc+shade2, data=alfalfa)
drop1(out4, test="F")
Single term deletions

Model:
yield ~ irrig + inoc + shade2
Df Sum of Sq     RSS    AIC F value    Pr(>F)
<none>               41.639 28.754
irrig   1    14.797  56.436 34.356   6.041   0.02501 *
inoc    4   155.894 197.534 59.676  15.912 1.380e-05 ***
shade2  2    84.328 125.967 52.429  17.214 8.196e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
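The manual sequence of drop1() calls above can be sketched as a loop. This is a minimal sketch on simulated data, not the method used in the lesson: the helper `backward_p`, its `alpha` argument, and the data frame `d` are all hypothetical names introduced here.

```r
# Simulated stand-in data (an assumption; not the alfalfa dataset).
set.seed(42)
n <- 60
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), g = gl(3, 20))
d$y <- 2 + 1.5 * d$x1 + rnorm(n)

# Hypothetical helper: repeatedly drop the least significant droppable term
# until every remaining droppable term has p <= alpha.
backward_p <- function(fit, alpha = 0.05) {
  repeat {
    # Stop if only the intercept is left.
    if (length(attr(terms(fit), "term.labels")) == 0) break
    tab <- drop1(fit, test = "F")
    p <- tab[["Pr(>F)"]][-1]        # first row is <none>
    names(p) <- rownames(tab)[-1]
    if (max(p) <= alpha) break
    worst <- names(p)[which.max(p)]
    fit <- update(fit, as.formula(paste(". ~ . -", worst)))
  }
  fit
}

full <- lm(y ~ (x1 + x2 + g)^2, data = d)
reduced <- backward_p(full)
formula(reduced)
```

Because drop1() only lists terms that can be removed without violating marginality, a main effect is never dropped while an interaction containing it is still in the model.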
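
In out4 every remaining term is significant at the 5% level, so backward selection stops here. As a further check, the model is refit with shade entered linearly in place of the quadratic term.

out5 <- lm(yield~irrig+inoc+shade, data=alfalfa)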
drop1(out5, test="F")
Single term deletions

Model:
yield ~ irrig + inoc + shade
Df Sum of Sq     RSS    AIC F value    Pr(>F)
<none>               45.576 29.013
irrig   1    14.797  60.373 34.042  5.8439   0.02646 *
inoc    4   155.894 201.470 58.169 15.3924 1.236e-05 ***
shade   1    80.391 125.967 52.429 31.7501 2.402e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
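The two drop1() tables for out4 and out5 contain enough information to test whether the quadratic part of shade is needed. The sketch below does the nested-model F test by hand from the printed RSS values. It assumes all 25 rows were used, so the quadratic model has 25 - 8 = 17 residual degrees of freedom; the variable names are illustrative.

```r
rss_quad <- 41.639   # RSS of yield ~ irrig + inoc + shade2
rss_lin  <- 45.576   # RSS of yield ~ irrig + inoc + shade
df_quad  <- 17       # residual df of the quadratic model (assumed: 25 - 8)

# Extra sum of squares for the one additional (quadratic) coefficient.
F_stat <- ((rss_lin - rss_quad) / 1) / (rss_quad / df_quad)
p_val  <- pf(F_stat, df1 = 1, df2 = df_quad, lower.tail = FALSE)

round(c(F = F_stat, p = p_val), 3)
```

The F statistic is about 1.61, well short of significance at the 5% level, which supports the simpler linear shade term. In a live session, `anova(out5, out4)` should carry out the same comparison directly.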

5. Commit your changes to AlfAnalysis.

There is no code associated with the solution to this problem.