Using SAS Data Sets

Doug Hemken

December 2017

SAS Notes

Introduction

Data sets are invoked in several different contexts, both to read and to write data.

In addition to distinguishing between SAS data set names and operating system names (see Saving SAS Data), we also need to make a distinction between temporary and permanent data sets. Permanent data sets are saved on the computer's disks (or other storage system), and remain there when you shut down SAS and go out into the world. Temporary data sets are purged when you shut down SAS.

As it happens, both temporary and permanent data sets (normally) exist as files on your computer - it is just that SAS erases the temporary data sets when you exit (shut down) the software. This makes it possible for SAS to work with gigantic data sets that would not fit in a computer's memory. A downside to this is that disk access to read data is always slower than memory access.

Temporary Data (WORK)

There are two main reasons to use temporary data sets: automatic cleanup and faster data access.

While the advantage of automatic cleanup is obvious, faster data access deserves some explanation.

Not all disk locations are equally speedy. In the SSCC, reading or writing a file on a network drive is slower than reading or writing it on a computer's local hard drive. SAS is typically configured so that the location for temporary files is the speediest disk space your computer has access to. SAS gives this location a special library name, WORK.
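To see where WORK actually lives on your own computer, you can ask SAS for the physical path behind the library name. This is a minimal sketch using the PATHNAME function through %SYSFUNC; the path it reports depends entirely on how your SAS installation is configured.

```sas
/* Write the physical folder behind the WORK library to the log */
%put WORK is stored at: %sysfunc(pathname(work));
```

The folder named in the log is the one SAS clears out when the session ends.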

For small data sets, the difference in speed is negligible in human terms. But for large data sets (hundreds of MB or more), you will notice the difference between working with data on a network drive (any SSCC project drive, or your "U:\" drive) and data in WORK.

Here is how you might copy a data set from a network drive to the WORK library. (This is a small data set, so it is quick.)

libname u "U:\";
data work.class;
  set u.class;
run;
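A DATA step is not the only way to make this copy. As an alternative sketch, PROC COPY can move a data set between libraries without reading it observation by observation in your own code. It assumes the same u library and class data set as above.

```sas
/* Same libref as above; assumes U:\ contains class.sas7bdat */
libname u "U:\";

/* Copy only the data set CLASS from library U into WORK */
proc copy in=u out=work memtype=data;
  select class;
run;
```

Either approach leaves a temporary copy named WORK.CLASS for later steps to use.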

SAS makes WORK the default library, so you do not normally have to specify WORK explicitly when naming data sets. Whenever you give just a SAS data set name (with no library reference), SAS looks for the data set in WORK, or creates it there.

proc means data=class;
run;

Notice that the log spells out explicitly which copy of class was used.

2          proc means data=class;
3          run;

NOTE: There were 19 observations read from the data set WORK.CLASS.
NOTE: The PROCEDURE MEANS printed page 1.
NOTE: PROCEDURE MEANS used (Total process time):
      real time           0.02 seconds
      cpu time            0.01 seconds
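The same default applies when writing data. In the sketch below, a one-level name on the DATA statement creates a data set in WORK, and the one-level and two-level names then refer to the same data. The name class_teens is made up for illustration, and sashelp.class is the sample data set that ships with SAS.

```sas
/* One-level name: SAS stores this as WORK.CLASS_TEENS */
data class_teens;
  set sashelp.class;   /* sample data shipped with SAS */
  where age >= 13;
run;

/* These two steps read the same temporary data set */
proc print data=class_teens;
run;

proc print data=work.class_teens;
run;
```

The log notes for both PROC PRINT steps should name WORK.CLASS_TEENS as the data set read.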
      

Permanent Data

Data stored anywhere besides WORK is permanent: it persists after SAS is shut down. See Saving SAS Data for examples.
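As a brief sketch of the contrast with WORK, the code below writes a data set to a library that points at a folder of your choosing. The libref proj and the folder U:\myproject are hypothetical; substitute any existing folder you can write to.

```sas
/* Hypothetical folder: replace with a location that exists on your system */
libname proj "U:\myproject";

/* Two-level name: the data set is saved in that folder, not in WORK */
data proj.class;
  set sashelp.class;   /* sample data shipped with SAS */
run;
```

Because proj does not point at WORK, the resulting file is still there the next time you start SAS and issue the same LIBNAME statement.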