--- title: 'R for Researchers: Regression (GLM) solutions' date: "April 2015" --- This article contains solutions to exercises for an article in the series R for Researchers. For a list of topics covered by this series, see the [Introduction](RFR_Introduction.html) article. If you\'re new to R we highly recommend reading the articles in order. There is often more than one approach to the exercises. Do not be concerned if your approach is different than the solution provided. * The following preparation code was used to prepare the R session for these solutions ``` ####################################################### ####################################################### ## ## Environment Setup ## ####################################################### ####################################################### library(faraway) # glm support library(MASS) # negative binomial support library(car) # regression functions library(lme4) # random effects library(ggplot2) # plotting commands library(reshape2) # wide to tall reshaping library(xtable) # nice table formatting library(knitr) # kable table formatting library(grid) # units function for ggplot saveDir <- getwd() # get the current working directory wd <- "u:/RFR" # path to my project setwd(wd) # set this path as my work directory ``` ```{r, echo=FALSE, results="hide", message=FALSE, warning=FALSE, fig.show='hide'} ####################################################### ####################################################### ## ## Environment Setup ## ####################################################### ####################################################### library(faraway) # glm support library(MASS) # negative binomial support library(car) # regression functions library(lme4) # random effects library(ggplot2) # plotting commands library(reshape2) # wide to tall reshaping library(xtable) # nice table formatting library(knitr) # kable table formatting library(grid) # units function for ggplot ``` #### Exercise solutions 1. Use the following code to load the warpbreaks data set and examine the variables in the data set. ```{r, comment=NA } data(warpbreaks) str(warpbreaks) plot(warpbreaks) ``` 2. Use the poisson family and fit breaks with wool, tension, and their interaction. ```{r, comment=NA } pMod <- glm(breaks~wool*tension,family=poisson, data=warpbreaks) summary(pMod) ``` 3. Check to see if this is an appropriate model. If not, choose a more appropriate model form. ```{r, comment=NA } 1 - pchisq(deviance(pMod),df.residual(pMod)) pMod2 <- glm(breaks~wool*tension, family="quasipoisson", data=warpbreaks) pMod2Diag <- data.frame(warpbreaks, link=predict(pMod2, type="link"), fit=predict(pMod2, type="response"), pearson=residuals(pMod2,type="pearson"), resid=residuals(pMod2,type="response"), residSqr=residuals(pMod2,type="response")^2 ) ggplot(data=pMod2Diag, aes(x=fit, y=residSqr)) + geom_point() + geom_abline(intercept = 0, slope = 1) + geom_abline(intercept = 0, slope = summary(pMod2)$dispersion, color="green") + stat_smooth(method="loess", se = FALSE) + theme_bw() nbMod <- glm.nb(breaks~wool*tension, data=warpbreaks) summary(nbMod) ``` 4. Use the backward selection method to reduce your model, if possible. Use your model from the prior problem as the starting model. ```{r, comment=NA } step(nbMod, test="LRT") ``` **No terms can be dropped from the model.** 5. Check the residual variance assumption for your model. ```{r, comment=NA } nbModDiag <- data.frame(warpbreaks, link=predict(nbMod, type="link"), fit=predict(nbMod, type="response"), pearson=residuals(nbMod,type="pearson"), resid=residuals(nbMod,type="response"), residSqr=residuals(nbMod,type="response")^2 ) ggplot(data=nbModDiag, aes(x=fit, y=residSqr)) + geom_point() + geom_abline(intercept = 0, slope = 1) + geom_abline(intercept = 0, slope = summary(pMod2)$dispersion, color="green") + stat_function(fun=function(fit){fit + fit^2/summary(nbMod)$theta}, color="red") + stat_smooth(method="loess", se = FALSE) + theme_bw() ``` **The negative binomial model is closer to means of the loess line than to the quasipoisson model.** **The range of values predicted by the model for the predicted values below 20 appears to be overstated by this model, though not as much as by the quasipossion model.** Return to the [Regression (OLS)](RFR_Regression.html) article. Last Revised: 3/4/2015