1 Stata and R Markdown

1.1 Introduction

This is an introduction to using R Markdown to write dynamic documents based on Stata. The process uses RStudio (or just R) to create documents that depend on Stata code.

1.2 Background

Markdown is a language for formatting not-too-complicated documents using just a few text symbols. It is designed to be easy to read and write. If you read and write email, you are probably already familiar with many of these formatting conventions. For more specifics about Markdown see John Gruber's Markdown article.

Dynamic Markdown has been implemented for a number of programming languages, including Stata and R. Within Stata there is a dynamic markdown package called stmd that relies on Stata's dyndoc command, as well as the user-written package markstat. Each has its strengths and weaknesses.

The system I will describe here is intended primarily for those of us who already use R Markdown to write documentation in other languages and would like to use it for Stata as well.

R Markdown is a dynamic markdown system that extends Markdown by allowing you to include blocks of code in one of several programming languages. The code is evaluated, and both the code and its results are included in a Markdown document. To read more about the details of R Markdown, see RStudio's R Markdown webpages.

RStudio uses an R package called knitr (which can also be called directly from R), and knitr includes the ability to evaluate Stata code.

The documentation for knitr can be found in R's Help, from Yihui Xie's web page, or in the book, R Markdown: The Definitive Guide.

Finally, I use some helper functions in a package called Statamarkdown. While these are not necessary to write dynamic documents based on Stata, they make life easier.

1.3 Install Statamarkdown

Statamarkdown can be installed from GitHub. (See section 2, Installing Statamarkdown, for more installation options.)

library(devtools) # you may also need to install devtools
install_github("hemken/Statamarkdown")
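
As the comment above suggests, devtools itself may be missing. A minimal sketch of one way to handle that, assuming you have an internet connection and write access to your R library, is to install devtools only when it is absent and then confirm the result:

```r
# Install devtools only if it is not already available.
if (!requireNamespace("devtools", quietly = TRUE)) {
  install.packages("devtools")
}

# Install Statamarkdown from the GitHub repository named above.
devtools::install_github("hemken/Statamarkdown")

# Print the installed version to confirm the installation worked.
packageVersion("Statamarkdown")
```

Calling `devtools::install_github()` with the package prefix avoids attaching devtools to your session just for a one-time install.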

Note: RStudio is a great environment for writing Markdown with executable R code chunks, but it is not a friendly environment for extensively debugging problems in your Stata code. If your Stata code is complicated, you should probably work out the details in Stata first, then pull it into RStudio to develop your documentation.

1.4 Set up the Stata engine

In order to execute your Stata code, knitr needs to know where the Stata executable is located. This can be done with a preliminary code chunk, by loading the Statamarkdown package:

```{r, echo=FALSE, message=FALSE}
library(Statamarkdown)
```

(In knitr jargon, a block of code is a "code chunk".)

If the package fails to find your copy of Stata (you will see a message), you may have to specify this yourself (see section 3, Stata Engine Path, for more details).
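
If you do need to set the location yourself, one approach is to pass it to knitr's `engine.path` chunk option in the setup chunk. This is a sketch only: the path below is a hypothetical Windows location and must be replaced with wherever Stata is installed on your machine, and `stataexe` is just an illustrative variable name.

```{r, echo=FALSE, message=FALSE}
library(Statamarkdown)

# Hypothetical path: replace with the Stata executable on your own machine.
stataexe <- "C:/Program Files/Stata18/StataSE-64.exe"

# Tell knitr which executable to use for chunks with the stata engine.
knitr::opts_chunk$set(engine.path = list(stata = stataexe))
```

Setting the option through `knitr::opts_chunk$set()` applies it to every later chunk, so it only needs to appear once near the top of the document.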

After this setup chunk, subsequent code to be processed by Stata can be specified as:

```{stata}

-- Stata code here --

```

1.6 Hints and Examples

1.6.1 Code Separate or with Output

Stata does not give you fine control over what ends up in the .log file. You can decide whether to present code and output separately (R style), or include the code in the output (Stata style).
See section 5, Stata Output and cleanlog.

1.6.2 Including Graphs

Including graphics requires graph export in Stata, and an image link in the R Markdown. The knitr chunk option echo can print just specified lines of code, allowing you to hide the graph export command as illustrated below.

1.6.3 Descriptive Statistics

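A simple example. The chunk option collectcode=TRUE carries this chunk's code forward to later Stata chunks, so the auto data are still loaded in the examples that follow.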

```{stata, collectcode=TRUE}
sysuse auto
summarize
```
sysuse auto
summarize
    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
        make |          0
       price |         74    6165.257    2949.496       3291      15906
         mpg |         74     21.2973    5.785503         12         41
       rep78 |         69    3.405797    .9899323          1          5
    headroom |         74    2.993243    .8459948        1.5          5
-------------+---------------------------------------------------------
       trunk |         74    13.75676    4.277404          5         23
      weight |         74    3019.459    777.1936       1760       4840
      length |         74    187.9324    22.26634        142        233
        turn |         74    39.64865    4.399354         31         51
displacement |         74    197.2973    91.83722         79        425
-------------+---------------------------------------------------------
  gear_ratio |         74    3.014865    .4562871       2.19       3.89
     foreign |         74    .2972973    .4601885          0          1

1.6.4 Frequency Tables

Using the chunk options echo=FALSE and cleanlog=FALSE yields a more typical Stata documentation style.

```{stata, echo=FALSE, cleanlog=FALSE}
tab1 foreign rep78
```
 . tab1 foreign rep78

-> tabulation of foreign  

 Car origin |      Freq.     Percent        Cum.
------------+-----------------------------------
   Domestic |         52       70.27       70.27
    Foreign |         22       29.73      100.00
------------+-----------------------------------
      Total |         74      100.00

-> tabulation of rep78  

     Repair |
record 1978 |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |          2        2.90        2.90
          2 |          8       11.59       14.49
          3 |         30       43.48       57.97
          4 |         18       26.09       84.06
          5 |         11       15.94      100.00
------------+-----------------------------------
      Total |         69      100.00

1.6.5 T-tests

Another very simple example.

ttest mpg, by(foreign)
Two-sample t test with equal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. err.   Std. dev.   [95% conf. interval]
---------+--------------------------------------------------------------------
Domestic |      52    19.82692     .657777    4.743297    18.50638    21.14747
 Foreign |      22    24.77273     1.40951    6.611187    21.84149    27.70396
---------+--------------------------------------------------------------------
Combined |      74     21.2973    .6725511    5.785503     19.9569    22.63769
---------+--------------------------------------------------------------------
    diff |           -4.945804    1.362162               -7.661225   -2.230384
------------------------------------------------------------------------------
    diff = mean(Domestic) - mean(Foreign)                         t =  -3.6308
H0: diff = 0                                     Degrees of freedom =       72

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.0003         Pr(|T| > |t|) = 0.0005          Pr(T > t) = 0.9997

1.6.6 Graphics

This example uses two knitr chunk options: results="hide" suppresses the log, and echo=1 shows only the first line of code, the Stata graph box command that readers need to see.

```{stata, echo=1, results="hide"}
graph box mpg, over(foreign)
graph export "boxplot.svg", replace
```
graph box mpg, over(foreign)

Example boxplot
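
The exported file still has to be pulled into the document. In plain Markdown the image link would be `![Example boxplot](boxplot.svg)`. As an alternative sketch that stays inside R, knitr's `include_graphics()` function does the same job from an R chunk. This assumes the Stata chunk above wrote boxplot.svg to the directory the document is knit from, and that the output format can display SVG (HTML can).

```{r, echo=FALSE}
# Assumes boxplot.svg was exported by the Stata chunk above
# into the directory where the document is being knit.
knitr::include_graphics("boxplot.svg")
```

The chunk uses echo=FALSE so that readers see only the graph, not the R code that inserts it.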

(This page was written using Statamarkdown version 0.7.1.)