---
title: 'R for Researchers: Diagnostic solutions'
date: "April 2015"
---
This article contains solutions to exercises for an article in the
series R for Researchers.
For a list of topics covered by this series, see the
[Introduction](RFR_Introduction.html) article.
If you're new to R, we highly recommend reading the articles in order.
There is often more than one approach to the exercises.
Do not be concerned if your approach differs from
the solution provided.
These solutions require that the solutions from the prior lesson
have been run in your R session.
```{r, echo=FALSE, results="hide", message=FALSE, warning=FALSE, fig.show='hide'}
source("Scripts/RFR_alfalfa_DP.R")
source("Scripts/RFR_alfalfa_DE.R")
source("Scripts/RFR_alfalfa_Reg.R")
```
#### Exercise solutions
These exercises use the alfalfa dataset and the work
you started on the alfAnalysis script.
Open the script and run all the commands in the script
to prepare your session for these problems.
Note that we will treat the shade and irrig variables as continuous
variables for these exercises.
They could also be treated as factor variables.
Since both represent increasing levels, we first try
using them as numeric scales.
Use the model you selected as the best model from the
prior exercises.
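
The choice between a continuous and a factor coding can be checked directly. The following is a minimal sketch on the built-in `mtcars` data, not the alfalfa data; the names `fitNum` and `fitFac` are hypothetical and chosen only for this illustration.

```r
# Illustration only: mtcars stands in for the alfalfa data.
fitNum <- lm(mpg ~ cyl, data = mtcars)          # cyl as a continuous predictor
fitFac <- lm(mpg ~ factor(cyl), data = mtcars)  # cyl as a factor (levels 4, 6, 8)

length(coef(fitNum))  # intercept plus one slope
length(coef(fitFac))  # intercept plus one coefficient per non-reference level

# The continuous model is nested in the factor model, so an F test
# indicates whether the straight-line assumption loses much fit.
anova(fitNum, fitFac)
```

If the F test shows little difference, the simpler continuous coding is usually a reasonable starting point.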
1. Use plot to generate the prepared diagnostic plots.
```{r, comment=NA }
plot(out5)
```
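
By default `plot()` on an `lm` object shows its plots one at a time. As a minimal sketch, assuming the built-in `mtcars` data and a hypothetical model named `fit`, the four default plots can be arranged on one page, and a single plot can be requested with the `which` argument.

```r
# Illustration only: a small model on mtcars, not the alfalfa model.
fit <- lm(mpg ~ wt + hp, data = mtcars)

oldPar <- par(mfrow = c(2, 2))  # 2 x 2 grid; par() returns the previous settings
plot(fit)                       # the four default diagnostic plots
par(oldPar)                     # restore the previous layout

plot(fit, which = 4)            # only the Cook's distance plot
```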
2. Create a data.frame which includes the model variables
as well as the fitted values, residuals, Cook's distance, and
leverage.
```{r, comment=NA }
out5Diag <- alfalfa[,c("irrig","inoc","shade","yield")]
out5Diag$fit <- fitted(out5)
out5Diag$res <- rstudent(out5)
out5Diag$cooks <- cooks.distance(out5)
out5Diag$lev <- hatvalues(out5)
str(out5Diag)
```
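
Note that `rstudent()` returns studentized residuals, which are scaled differently from the raw residuals given by `resid()`. The sketch below, on the built-in `mtcars` data with a hypothetical model `fit`, compares the two and shows one common way to read leverage values. The 2p/n cutoff is a rule of thumb assumed here for illustration, not something the exercise prescribes.

```r
# Illustration only: a small model on mtcars, not the alfalfa model.
fit <- lm(mpg ~ wt + hp, data = mtcars)

lev <- hatvalues(fit)
p <- length(coef(fit))  # number of estimated coefficients
n <- nrow(mtcars)       # number of observations

# Leverages sum to the number of coefficients, so their mean is p / n.
all.equal(sum(lev), p)

# Rule-of-thumb cutoff (an assumption): flag leverage above twice the mean.
which(lev > 2 * p / n)

# Raw and studentized residuals side by side.
head(cbind(raw = resid(fit), student = rstudent(fit)))
```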
3. Reshape the data.frame from problem 2 to tall form.
```{r, comment=NA }
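# Convert every column to numeric so the model variables can share one value column
out5DiagNum <- out5Diag
for(i in colnames(out5DiagNum)) {
  out5DiagNum[,i] <- as.numeric(out5DiagNum[,i])
}
# Stack irrig, inoc, and shade into varVal, labeled by variable
out5DiagT <- reshape(out5DiagNum, varying=c("irrig","inoc","shade"),
                     v.names="varVal",
                     timevar="variable",
                     times=c("irrig","inoc","shade"),
                     drop=c("yield","fit"),
                     direction="long"
                     )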
str(out5DiagT)
```
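
To see what the `reshape()` arguments do, it can help to try them on a tiny example. This is a minimal sketch with a made-up two-row data.frame; `wide` and `tall` are hypothetical names, and the values are invented for illustration only.

```r
# Made-up data: one diagnostic column and two model variables.
wide <- data.frame(cooks = c(0.1, 0.4),
                   irrig = c(1, 2),
                   shade = c(3, 5))

# Stack irrig and shade into one column (varVal) and record which
# variable each value came from in a second column (variable).
tall <- reshape(wide,
                varying   = c("irrig", "shade"),
                v.names   = "varVal",
                timevar   = "variable",
                times     = c("irrig", "shade"),
                direction = "long")
tall
```

Each original row appears once per stacked variable, so two rows and two variables give four rows. The `cooks` value is repeated alongside each stacked value, and `reshape()` adds an `id` column that identifies the original row.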
4. Plot Cook's distance versus the model variables, faceted by
the model variables.
```{r, comment=NA }
ggplot(out5DiagT, aes(x=varVal, y=cooks) ) +
  geom_point() +
  facet_wrap(~variable, scales="free_x") +
  theme_bw() +
  theme(strip.background = element_rect(fill = "White"))
```
5. Rerun the model with the observation with the highest
Cook's distance removed.
```{r, comment=NA }
# Identify the observations with a Cook's distance of at least 0.5
out5DiagCkId <- which(out5Diag$cooks >= .5)
out5DiagCkId
out5Ck <- lm(yield~irrig+inoc+shade, data=alfalfa[-c(out5DiagCkId),])
summary(out5Ck)
```
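
A fixed cutoff can select zero, one, or several rows, while `which.max()` always selects exactly one. The sketch below contrasts the two on the built-in `mtcars` data; all object names are hypothetical. It also guards against an easy pitfall: negative indexing with an empty index returns no rows at all.

```r
# Illustration only: a small model on mtcars, not the alfalfa model.
fit <- lm(mpg ~ wt + hp, data = mtcars)
ck <- cooks.distance(fit)

idMax <- which.max(ck)        # exactly one row: the largest Cook's distance
idThresh <- which(ck >= 0.5)  # may be empty if nothing reaches the cutoff

# mtcars[-integer(0), ] has zero rows, so only drop rows when some were flagged.
rowsKept <- if (length(idThresh) > 0) mtcars[-idThresh, ] else mtcars

# Refit without the single most influential observation.
fitDrop <- lm(mpg ~ wt + hp, data = mtcars[-idMax, ])
summary(fitDrop)
```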
6. Compare the changes in the model coefficients.
```{r, comment=NA }
# Change in each coefficient, in units of the original model's standard errors
out5CoefDiff <- (coef(out5Ck) - coef(out5)) / sqrt(diag(vcov(out5)))
names(out5CoefDiff) <- names(coef(out5))
out5CoefDiff
```
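
When a single observation is removed, base R's `dfbeta()` reports the same coefficient changes without refitting the model. The following is a minimal sketch on the built-in `mtcars` data with hypothetical object names. It compares magnitudes only, since the sign convention of `dfbeta()` is best confirmed in its help page.

```r
# Illustration only: a small model on mtcars, not the alfalfa model.
fit <- lm(mpg ~ wt + hp, data = mtcars)
i <- unname(which.max(cooks.distance(fit)))  # most influential row

# Manual approach: refit without row i and difference the coefficients.
fitDrop <- lm(mpg ~ wt + hp, data = mtcars[-i, ])
manual <- coef(fitDrop) - coef(fit)

# Built-in approach: row i of the dfbeta() matrix.
builtIn <- dfbeta(fit)[i, ]

# The two agree in magnitude.
all.equal(abs(unname(manual)), abs(unname(builtIn)))
```

The related `dfbetas()` function returns scaled versions of these changes, although its scaling is not identical to the one used in the solution above.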
7. Commit your changes to AlfAnalysis.
There is no code associated with the solution to this problem.
Return to the [Diagnostics](RFR_Diagnostics.html) article.
Last Revised: 3/2/2015