SAS Grammar

Doug Hemken

February 2020

In order for SAS to execute your code, SAS has to be able to interpret it. In order for you to troubleshoot your code, you have to be able to interpret it.

This document is primarily about getting your SAS code on the page in a form that both you and the SAS interpreter can understand.

The concepts here will also help you make sense of the SAS documentation and examples.

See also Essential Concepts of Base SAS Software for considerably more detail on the rules of the SAS language.

The Major Building Blocks

The main units of work in most SAS programs are the DATA step and PROC steps. The SAS interpreter collects the code you submit until one of these steps is complete, and then executes that step.

DATA steps generally produce data sets. DATA steps are used to read in text data, produce new data values, merge data, subset data, label data, etc.

PROC steps include statistical procedures (like PROC MEANS or PROC REG) as well as utility procedures (like PROC SORT).

An example of a DATA step is

data new;
  set sashelp.class;
  bmi = 703*weight/height**2;
  run;

This creates a data set, named new, by reading an existing data set named class, and creating a new variable, named bmi.

An example of a PROC step is

proc means data=new;
  var bmi;
  run;

                            The MEANS Procedure

                         Analysis Variable : bmi 
 
     N            Mean         Std Dev         Minimum         Maximum
    ------------------------------------------------------------------
    19      17.8632519       2.0926193      13.4900007      21.4296601
    ------------------------------------------------------------------

This produces descriptive statistics for the variable bmi in the data set new.

A step begins with the keyword DATA or PROC, and ends with a run; statement or with the beginning of another step.
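As a small sketch of the second way a step can end, the code below reuses the two examples above but leaves out the DATA step's run; statement. The keyword PROC marks the start of a new step, so SAS treats the DATA step as complete at that point.

```sas
/* The DATA step has no run; statement of its own.        */
/* It ends where the next step (proc means) begins.       */
data new;
  set sashelp.class;
  bmi = 703*weight/height**2;
proc means data=new;
  var bmi;
  run;
```

Writing an explicit run; at the end of every step is still the easier style to read and debug.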

In addition to data steps and proc steps, SAS programs commonly include global statements, which often create pointers to directories and files or otherwise configure your SAS session (e.g. a LIBNAME statement).

An example of a global statement is

libname y "y:\sas";
data y.class;
  set sashelp.class;
  run;

This defines the SAS name y (a library reference) as a pointer to the y:\sas folder on your computer, then copies the class data from sashelp to that folder.
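Global statements that configure the session rather than point to files work the same way. The sketch below is an illustration, not part of the original examples: it assumes you want to suppress the date and page number in listing output and put a title (the title text is made up) above the results.

```sas
options nodate nonumber;          /* global: drop date and page number from listing output */
title "Descriptive statistics";   /* global: heading printed above later output */

proc means data=sashelp.class;
  var height weight;
  run;

title;                            /* global: clear the title again */
```

Because these statements sit outside any step, they stay in effect until you change them or end your SAS session.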

And finally, your program may include comments, text that SAS does not try to interpret.

Statements

Steps are composed of one or more statements. Most statements begin with a SAS keyword, and all statements end with a semi-colon, ;. In the examples above, the DATA step is composed of four statements and the PROC step is composed of three.

Statements are composed of words (also called tokens) and special characters. A word might be a SAS keyword, or it might be a user-supplied word like a data set name, variable name, or a data value. Special characters include symbols like equals signs, parentheses, less-than and greater-than signs, etc.

The words in a statement need to be separated by spaces or special characters.

User-supplied words - variable names, data set names, library names - should be composed of letters, digits, and underscores, and should begin with a letter or an underscore. The underscore is the only special character allowed.
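A short sketch of these naming rules follows. The data set name class_2020 and the variables height_cm and _flag are hypothetical, chosen only to illustrate the rules.

```sas
data class_2020;               /* letters, digits, and an underscore */
  set sashelp.class;
  height_cm = height * 2.54;   /* hypothetical variable: inches to centimeters */
  _flag     = (age >= 13);     /* a name may begin with an underscore */
  run;

/* These would NOT work as names:
   2020class    begins with a digit
   height-cm    contains a hyphen, which SAS reads as a minus sign
*/
```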

Layout

SAS parses code based on word and statement delimiters - spaces, special characters, keywords, and semi-colons. In addition to separating words in a statement, your use of white space should be guided by human-readability. Pick a consistent layout (“style”), preferably one that other humans find familiar, and stick with it. This will make it much easier to read and troubleshoot your code.

Spaces in Statements

Where you need one space, you may use as many spaces as you like. This can be useful for aligning code that belongs together conceptually, or where you are working with a list of similar commands and lining up words across lines makes it easier for you to understand and debug your code.

A common scenario is where you have several assignment statements in a data step, and you align the equals signs so it is easier to tell the left-hand sides from the right-hand sides.

data new;
  set old;
  landdistance  = run  + bike;
  waterdistance = swim + row;
  run;

Where special characters separate words, spaces are not required, but again, might be helpful for human readability.
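To illustrate, the two assignment statements below should be interpreted identically by SAS. The variable name bmi2 is hypothetical and is used only so both versions can sit in one step.

```sas
data new;
  set sashelp.class;
  bmi=703*weight/height**2;            /* no spaces: the operators separate the words */
  bmi2 = 703 * weight / height ** 2;   /* same calculation, spaced for readability */
  run;
```

Note that ** is a single operator, so its two asterisks cannot be split by a space.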

It is common to indent groups of statements that together form some executable or logical unit. In the previous example, all the statements in the DATA step are indented, to show humans that they are executed together.

Lines

SAS treats line breaks as spaces (in code). This means that SAS can interpret code with multiple statements per line, and also statements that are spread across multiple lines.

Typical style is to write one statement per line, as illustrated above.

Multiple statements per line are usually difficult to read, and are the primary reason consultants have thinning hair and poor eyesight. Don’t bring us code that looks like this!

proc means data=new; var bmi; run;

                            The MEANS Procedure

                         Analysis Variable : bmi 
 
     N            Mean         Std Dev         Minimum         Maximum
    ------------------------------------------------------------------
    19      17.8632519       2.0926193      13.4900007      21.4296601
    ------------------------------------------------------------------

Statements that are especially long are commonly broken into multiple lines, with the continuation lines indented. While there are no general rules of thumb for how to do this, try to find a consistent style - consistency is your friend when debugging!
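One possible layout for a long statement is sketched below. The particular statistics and the maxdec= option are chosen only for illustration; the point is that the PROC MEANS statement runs over three lines and ends at the first semi-colon.

```sas
proc means data=sashelp.class
           n mean std min max   /* continuation lines are indented under the first */
           maxdec=2;            /* the semi-colon ends the proc means statement */
  var height weight;
  run;
```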

You use blank lines much the same way you use indentation, to visually demarcate blocks of code that form some sort of conceptual or logical unit.

Comments

We use comments for a variety of purposes.

One use of comments is to write explanatory notes in your code. Use these to explain the logic behind a block of code, or to describe the action of statements and keywords you don’t use very frequently. The first time you have to pick up months-old code and debug it, you will appreciate your foresight. Likewise, the first time you take over a project from someone else, you will appreciate their kindness.

Another major use of comments is in debugging. When you are struggling to figure out how an error popped up, it can be useful to disable pieces of your code in order to isolate a segment for closer examination and testing.

SAS comments come in two types, statement comments and block comments.

Statement Comments

A statement comment begins with an asterisk and ends with a semi-colon. Like any SAS statement, these may begin anywhere on a line and may extend over more than one line.

For example

data new;
  set sashelp.class;
* BMI is weight divided by squared height
    converted to Scientific Units;
  bmi = 703*weight/height**2;
  run;
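Statement comments are also a quick way to disable a single statement while debugging. In this sketch, which is an illustration and not one of the original examples, the first VAR statement is turned into a comment, so only weight is analyzed.

```sas
proc means data=sashelp.class;
  * var height;     /* disabled: the leading asterisk makes this a comment statement */
  var weight;
  run;
```

Removing the asterisk restores the statement. This only works cleanly for one statement at a time, because the comment ends at the first semi-colon.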

Block Comments

A block comment begins with a slash-asterisk, and ends with an asterisk-slash. These are independent of statements. A block comment may appear within a statement, or may subsume multiple statements.

/* data new;
  set sashelp.class;
* BMI is weight divided by squared height
    converted to Scientific Units;
  bmi = 703*weight/height**2;
  run;
*/    
proc means data=new;
    var height weight;
    run;

                            The MEANS Procedure

 Variable    N           Mean        Std Dev        Minimum        Maximum
 -------------------------------------------------------------------------
 Height     19     62.3368421      5.1270752     51.3000000     72.0000000
 Weight     19    100.0263158     22.7739335     50.5000000    150.0000000
 -------------------------------------------------------------------------

In this example, we skip the DATA step, and only execute the PROC step.
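The other case mentioned above, a block comment inside a statement, might look like the sketch below. The maxdec= option and the choice of variables are illustrative only.

```sas
proc means data=sashelp.class /* maxdec=2 */ ;
  var height /* weight */ ;
  run;
```

Here SAS sees proc means data=sashelp.class; and var height; because everything between /* and */ is skipped, even though each comment sits in the middle of a statement.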