Writing Dynamic Markdown Documents Using Stata (DRAFT)

Comparing dyndoc, markstat, markdoc, and webdoc

Doug Hemken

11 Dec 2017

With the release of Stata 15, there are now three commands available in Stata to generate HTML documents from dynamically specified Markdown source documents. The command dyndoc is the recently released official Stata command, while markstat and markdoc are previously released user-written commands from Germán Rodríguez and E.F. Haghish, respectively.

A fourth command, webdoc from Ben Jann, takes dynamically specified Markdown source documents and generates plain Markdown documents. A second step is then required to convert plain Markdown to HTML. This can be accomplished either with the recently released markdown command or with the pandoc command (installed with markdoc).

The central function of these commands is to take a file containing both text written in Markdown and code written in Stata, and produce an HTML document that includes the text, the code, and the results from the code. (markstat and markdoc both produce documents in other formats as well.)

The fundamental dynamic features of these commands boil down to executing code and displaying its results in a few forms: blocks of code with their output, results inline with text, graphs, and tables.

Software Requirements

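dyndoc needs nothing beyond Stata 15 itself. markstat, markdoc, and webdoc are user-written commands that must be installed, and markstat and markdoc also rely on additional software, notably Pandoc, whose location Stata has to be told. For users working in a computing environment where they do not have administrator rights to install software, or users who are not familiar with specifying the paths to executables, these are hurdles.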

Workflow

The differences between these commands are great enough that you are required to choose among them once you begin mixing text and code; they are decidedly not alternative engines for rendering documents from the same file.

For documentation of simple coding tasks, I find it pretty straightforward to simply begin writing a document in Markdown, including the Stata code as I go. Both dyndoc and markstat are well suited to this writing workflow.

For documentation of tasks that require some coding effort, I usually find myself writing just the code first, often interspersed with comments that will eventually be fleshed out as text. markdoc and webdoc seem more oriented toward this writing workflow.

If you are writing your document in the Stata do-file editor, which does not recognize any of these formats, you can conveniently test selections of your code interactively at any point in the writing process. When you are nearing the end of the process of combining code and text, you will probably appreciate the format that is visually the simplest; this is, after all, one of the main points of Markdown.

When working in the do-file editor, it may be convenient to turn off syntax highlighting (Edit > Preferences), especially with markdoc and webdoc files, where plain text (in Stata comments) is rendered green.

Discussion

The spirit of Markdown is to have a written format that provides a few formatting options (headers, lists, code blocks, image and URL links), but that is nearly as readable before processing as after. Adding code to be processed necessarily makes this a little more complicated, but not by much. A dynamic document should look pretty much like a pure Markdown document, minus the results and graphs.

Of these commands, markstat is clearly the best at keeping the ink on the page to a minimum and corralling the visual clutter. It gives the user the cleanest source documents to work with.

dyndoc is the clear winner on simplicity of processing syntax (the dyndoc command itself) and use. No extra installation is required, and the user need not know anything about locating and specifying executables on their computer(s).

dyndoc also produces the least file clutter. markstat, markdoc, and webdoc all leave intermediate files littering your directories. While these files can come in handy at times, keeping them should be an option, not a requirement.

On the other hand, markstat and markdoc are both capable of taking the same source file and creating documents in other formats, notably PDF.

None of these formats has an accompanying utility command to extract the plain Stata code for use in a live demonstration. markstat, however, produces a do file as a side effect (though you only get it by processing the document).

To my mind, the ideal dynamic documentation command and document format would combine the simplicity of markstat (which most closely conforms to Markdown standards for non-Stata languages), with the simple installation and lack of file clutter of dyndoc, and the flexibility of output formats provided by markstat and markdoc.

Take a look at some of the details, and see if you don’t agree.

Code Blocks

A dynamic document is composed of text and code to be executed. It is dynamic in the sense that, after writing, the document is processed to produce the final version for reading. To be processed by Stata, some distinction has to be made between text and code. Each of these commands does this in a different style.

dyndoc <<dd_do>>

For use with dyndoc, code blocks begin with <<dd_do>> and end with <</dd_do>>, and are usually formatted with Markdown code fences, like this:

 Code              |    With Context   
-------------------|-------------------
                   |    Some text.  
```                |    ```           
<<dd_do>>          |    <<dd_do>>   
sysuse auto        |    sysuse auto 
<</dd_do>>         |    <</dd_do>>  
```                |    ```           
                   |    More text.  

The result in your document would be rendered as:

. sysuse auto
(1978 Automobile Data)

markstat ```{s}

For use with markstat, you can demarcate code blocks in several ways. Perhaps the clearest, visually, is a code fence of backticks marked with an {s}, as in:

 Code              |    With Context   
-------------------|-------------------
                   |    Some text.  
```{s}             |    ```{s}           
sysuse auto        |    sysuse auto 
```                |    ```           
                   |    More text.  

The braces are optional, for an even cleaner look. And if you are willing to give up some Markdown formatting features for lists, you can simply use indentation and blank lines to demarcate code. In context:

Some text.

	sysuse auto
	
More text.

Uniquely, markstat allows you to use an “m” instead of an “s” for the code fence “info tag”, to work directly in Mata. This visual style, and the use of the info tag to signify a code language, makes markstat the most in sync with non-Stata dynamic Markdown use.

markdoc /***

With markdoc, the Stata code is written in ordinary .do file style, the text is demarcated with /*** and ***/, and the first step of processing is to produce a Stata log file in SMCL format.

The same example would look like this:

 Code                      |    With Context   
---------------------------|-------------------
quietly log using somefile |    quietly log using somefile
                           |    /***
                           |    Some text.
                           |    ***/
sysuse auto                |    sysuse auto 
                           |    /***           
                           |    More text.  
                           |    ***/
qui log c                  |    qui log c

(It is important that the final log close be abbreviated as above.)

webdoc /***

As with markdoc, for webdoc the Stata code is written in ordinary .do file style and the text is demarcated with /*** and ***/. Here the first step produces a Markdown document, i.e., all of the dynamic elements are resolved and replaced. A second step is then required to turn this into an HTML document.

Our example would look like this:

 Code                                 |    With Context   
--------------------------------------|-------------------
 webdoc init example, logall plain md |    webdoc init example, logall plain md
                                      |    /***
                                      |    Some text.
                                      |    ***/
sysuse auto                           |    sysuse auto 
                                      |    /***           
                                      |    More text.  
                                      |    ***/

Display inline

In addition to showing your results in separate code blocks in your final document, you can also use the results of code inline in your text.

dyndoc <<dd_display: >>

With dyndoc, anything that can be returned by the display command can be included in a line of text. For example,

Today's date is <<dd_display: c(current_date)>>.

would appear as: Today’s date is 11 Dec 2017.
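Since anything display can show is allowed, a display format can also precede the expression, which is handy for stored results. A minimal sketch, assuming the auto data and a summarize run in a dd_do block with the quietly attribute (which hides both the commands and their output):

```stata
<<dd_do: quietly>>
* load the example data and compute summary statistics; none of this is shown
sysuse auto, clear
summarize price
<</dd_do>>

The average price in the auto data is <<dd_display: %9.2f r(mean)>> dollars.
```

The r(mean) left behind by summarize is still available when dd_display runs, so the number in the sentence always matches the data.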

markstat `s

With markstat this is visually simpler. Inline code is just enclosed in backticks, with an “s” info tag after the opening backtick.

Today's date is `s c(current_date)`.

markdoc txt command

In markdoc format, inline text and results are added to your document by a command (txt) embedded within Stata code.

***/
txt "Today's date is " c(current_date) "."
/***

webdoc webdoc substitute command

In webdoc format, results can be substituted into inline text by a command embedded within Stata code.

webdoc substitute "XXX" "`c(current_date)'"
/***
Today's date is XXX
***/
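The replacement text given to webdoc substitute is just a string, so a number is easiest to format before it is substituted. A sketch, assuming webdoc init has already been run earlier in the do file; the placeholder XXX and the local macro name mprice are chosen only for illustration:

```stata
sysuse auto, clear
quietly summarize price
* format the mean as text; strtrim() drops the leading padding that %9.2f adds
local mprice = strtrim(string(r(mean), "%9.2f"))
webdoc substitute "XXX" "`mprice'"
/***
The average price is XXX dollars.
***/
```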

Graphs

In addition to the sorts of results you might find in the Results window, you may also want to include a graph in your document. None of these commands will completely automate that for you; that is, they will not detect that you have issued a graph command or sent output to a graph window. However, not much extra work is required.

For dyndoc, all you have to do (after creating the graph) is include a dynamic tag where you want the graph to appear in your document; webdoc works similarly, with a command in place of the tag. For markstat and markdoc, you need to save the graph as a file, then add a link to the file in your document. A small amount of extra effort is required to hide the graph export from the reader.

Suppose we first make a graph with:

. histogram price
(bin=8, start=3291, width=1576.875)

dyndoc <<dd_graph: >>

Include the graph with:

<<dd_graph: >>

The result then will be the histogram of price, embedded in the document at that point.
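If you would rather not show readers the histogram command or its bin note, one option is to create the graph in a dd_do block with the quietly attribute and then place it with dd_graph. A sketch, assuming the auto data are already loaded:

```stata
<<dd_do: quietly>>
* the graph is still drawn; only the command and the "(bin=...)" note are hidden
histogram price
<</dd_do>>

Some text.

<<dd_graph: >>
```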

markstat graph export

For markstat, you would first save the graph using the graph export command in Stata, then include an image link in your Markdown text.

```{s}
graph export hist_ms.svg, replace
```

Some text.
![Prices](hist_ms.svg)

markdoc graph export

Like markstat, first save the graph, then link it.

***/
graph export hist_md.svg, replace
/***

Some text.
![Prices](hist_md.svg)

webdoc webdoc graph

webdoc is similar to dyndoc, in that you just include a directive where you want the graph to appear in the document; you do not need to include code to save the graph first. This directive appears as Stata code.

Some text.
***/
webdoc graph
/***

Example Files

Simple documents summarizing what we have covered so far:

review-dyndoc.smd and review-dyndoc.html

review-markstat.stmd and review-markstat.html (note that the file extension for markstat must be .stmd)

review-markdoc.do and review-markdoc.html

review-webdoc.do and review-webdoc.html

Tables

Uniquely, dyndoc is able to place some Stata output tables in the document text, rendered as HTML tables. Instead of just

. tabulate rep78 foreign

    Repair |
    Record |       Car type
      1978 |  Domestic    Foreign |     Total
-----------+----------------------+----------
         1 |         2          0 |         2 
         2 |         8          0 |         8 
         3 |        27          3 |        30 
         4 |         9          9 |        18 
         5 |         2          9 |        11 
-----------+----------------------+----------
     Total |        48         21 |        69 


you use the following (note the markdown option and the lack of code fences):

<<dd_do: nocommands>>
tabulate rep78 foreign, markdown
<</dd_do>>

To produce:

                   |      Car type      |
Repair Record 1978 | Domestic | Foreign | Total
-------------------|----------|---------|------
1                  |        2 |       0 |     2
2                  |        8 |       0 |     8
3                  |       27 |       3 |    30
4                  |        9 |       9 |    18
5                  |        2 |       9 |    11
Total              |       48 |      21 |    69

The same effect can be achieved with the other commands, but not nearly as simply (at least for the output that Stata can currently format as Markdown).

markstat requires you to specify each cell value as inline code, which takes a lot of hidden code to set up.

markdoc, like markstat, requires specifying each cell value as inline code, but perhaps simplifies this a little with a special tbl command.

webdoc, geared as it is to produce Markdown, would be very similar to markstat, relying on substitution rather than in-line code.
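To give a sense of the cell-by-cell approach, here is a sketch in dyndoc syntax; markstat and markdoc would put their own inline code in the same cells, and webdoc would use substitution. It assumes the auto data are loaded, uses tabulate's matcell() option to save the counts in a matrix named freq (an arbitrary name), and shows only the first two rows:

```stata
<<dd_do: quietly>>
* matcell() stores the cell counts: rows are rep78 = 1..5, columns Domestic, Foreign
tabulate rep78 foreign, matcell(freq)
<</dd_do>>

Repair Record 1978 | Domestic | Foreign
-------------------|----------|--------
1 | <<dd_display: freq[1,1]>> | <<dd_display: freq[1,2]>>
2 | <<dd_display: freq[2,1]>> | <<dd_display: freq[2,2]>>
```

The remaining rows, and a totals row, follow the same pattern, which is exactly why this gets tedious.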

And it should be noted that for arbitrary tables in dyndoc, the same tedious approach would be required; Markdown table output is only partially implemented in Stata, and so far is largely undocumented as well. Some commands that produce output tables formatted in Markdown are:

Formulas

If your web site uses MathJax, all of these commands will pass LaTeX formulas through to the HTML, where MathJax renders them.

A display formula like:

$$ mpg = \beta_{0} + \beta_{1} \times weight $$

becomes:

$$ mpg = \beta_{0} + \beta_{1} \times weight $$

Since the Markdown in any of these documents could be rendered to HTML by Pandoc, they can all use Pandoc to render formulas in other ways as well (e.g., as Unicode). Except in graphs, you would not use SMCL.
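Inside graphs, SMCL is how you would write the same formula, since graph text understands tags such as {&beta} for Greek letters and {sub:} for subscripts. A sketch, with the output file name formula.svg chosen only for illustration:

```stata
sysuse auto, clear
* {&beta} draws a Greek beta; {sub:0} and {sub:1} set the indices as subscripts
scatter mpg weight, title("mpg = {&beta}{sub:0} + {&beta}{sub:1} weight")
graph export formula.svg, replace
```

The exported file can then be linked like any other graph, e.g. ![Formula](formula.svg).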

Processing

dyndoc

dyndoc requires no additional packages or software, and is processed with:

dyndoc review-dyndoc.smd, replace
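By default dyndoc names the HTML file after the source file (here, review-dyndoc.html). If you want a different name, the saving() option sets it; review-report.html below is only a hypothetical name:

```stata
* write the HTML to an explicitly named file, overwriting any earlier version
dyndoc review-dyndoc.smd, saving(review-report.html) replace
```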

markstat

markstat requires installing the additional whereis package and the Pandoc software, so I am glossing over several steps here. Once these are installed, before your first use of markstat you issue the command:

whereis pandoc "C:\Program Files\RStudio\bin\pandoc\pandoc.exe"

Thereafter you use markstat with:

markstat using review-markstat.stmd, strict

markdoc

markdoc also requires additional packages and software; this command takes the most effort to set up. Once everything is installed, processing a document means running a do file to create a log, then processing the log.

do review-markdoc.do

markdoc review-markdoc, export(html) pandoc("C:\Program Files\RStudio\bin\pandoc\pandoc.exe")

webdoc

webdoc also requires two steps for every document, like markdoc.

webdoc do review-webdoc.do
markdown review-webdoc.md, saving(review-webdoc.html)