Using R from SAS

Doug Hemken

July 2017

[SAS workshop notes]

SAS can call R to pass data directly back and forth and to capture R output, but R can only call SAS in batch mode.

What follows, and more, is documented in the SAS Online Help.

Setup in SAS

SAS requires two configuration settings in order to communicate with R. First, the RLANG option must be set when SAS is started. It may be set either in a custom configuration file (not currently implemented by SSCC) or on the SAS command line.

On Winstat you can implement either solution by putting a SAS shortcut on your desktop and changing its properties, for example by adding -rlang at the end of the command line.
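
To confirm that the option took effect, one possible check (a sketch, not part of the original workshop notes) is to ask SAS to report the current setting with PROC OPTIONS. The log should show RLANG if the option is on and NORLANG if it is not.

```sas
/* Report whether the RLANG system option is in effect */
proc options option=rlang;
run;
```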

Second, SAS needs an R_HOME environment variable to point it to the correct, available version of R.

The acceptable versions of R depend upon which version of SAS you are running.

On Winstat, the available versions are R 3.1.2 and R 3.4.0, but only the former works with our version of SAS, which is SAS 9.4 TS1M2. The most reliable way to set R_HOME is to include the statement

options set=R_HOME='C:\Program Files\R\R-3.1.2';

within your SAS command file.
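
A quick way to check that the setting took hold is sketched below. It assumes the R 3.1.2 path used above. The %sysget call and the R.version.string query are illustrative additions, not part of the original setup.

```sas
/* Point SAS at the R installation (path as used on Winstat) */
options set=R_HOME='C:\Program Files\R\R-3.1.2';

/* Echo the environment variable to the SAS log */
%put R_HOME is %sysget(R_HOME);

/* Ask R itself which version SAS has started */
proc iml;
  submit / R;
    R.version.string
  endsubmit;
quit;
```

If the version reported is not the one you expect, R_HOME is probably pointing at a different installation.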

Sending SAS data to R

SAS can pass data to an R session and ask R for an analysis. All communication with R is done via SAS's PROC IML. Note that capitalization matters in R, and that character variables are automatically converted to factors. In this example, then, the variable names in the R formula must match the capitalization of the SAS variable names (Weight, Height, Age, Sex).

proc iml;
  call ExportDataSetToR("Sashelp.Class", "dframe" );
  submit / R;
    names(dframe)
    lm(Weight ~ Height + Age + Sex, data=dframe)
  endsubmit;
quit;

[1] "Finishing Rprofile.site from C:/Program Files/R/R-3.1.2/etc"
[1] "Name"   "Sex"    "Age"    "Height" "Weight"

Call:
lm(formula = Weight ~ Height + Age + Sex, data = dframe)

Coefficients:
(Intercept)       Height          Age         SexM
   -125.115        2.873        3.113        8.744
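
To see how each SAS variable arrives in R, a minimal sketch is to export the same data set and ask R for the class of every column. The use of sapply here is an illustrative choice. Whether character variables appear as factors may depend on your SAS and R versions.

```sas
proc iml;
  /* Copy the SAS data set into an R data frame named dframe */
  call ExportDataSetToR("Sashelp.Class", "dframe");
  submit / R;
    # Report the R class of each column
    sapply(dframe, class)
  endsubmit;
quit;
```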

Loading a package in R

You can load packages in R in the usual way, so long as the package is installed in a location where R will find it. In this example, we have R load the MASS package, run a linear model on one of its data sets, and send the default R output back to SAS.

proc iml;
  submit / R;
    library(MASS, lib.loc=.Library)
    # use a data frame from MASS
    lm(VitC ~ Cult + Date + HeadWt, data=cabbages)
  endsubmit;
quit;

[1] "Finishing Rprofile.site from C:/Program Files/R/R-3.1.2/etc"

Call:
lm(formula = VitC ~ Cult + Date + HeadWt, data = cabbages)

Coefficients:
(Intercept)      Cultc52      Dated20      Dated21       HeadWt
     63.334       10.135       -1.213        4.186       -4.412
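
If library() fails, it can help to see where the R session started by SAS is looking for packages. The following is a rough sketch using only base R functions. It lists the library directories and then the installed location of MASS, which is an empty string if the package cannot be found.

```sas
proc iml;
  submit / R;
    # Library directories searched by this R session
    .libPaths()
    # Installed location of MASS, or "" if it is not found
    system.file(package = "MASS")
  endsubmit;
quit;
```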

Importing a data frame from R

R matrices and data frames may be brought back into SAS as well, for any manipulation you might want to do in SAS. Here, we just grab the cabbages data frame from R and show that SAS's PROC GLM agrees with R's lm function once you account for their different reference categories: R uses the first level of each factor as the reference, while PROC GLM uses the last.

proc iml;
  submit / R;
    library(MASS, lib.loc=.Library)
  endsubmit;
  call ImportDataSetFromR("cabbages", "cabbages");
quit;

proc glm data=cabbages;
    class Cult Date;
    model VitC = Cult Date HeadWt / solution;
run;

[1] "Finishing Rprofile.site from C:/Program Files/R/R-3.1.2/etc"
 
                                                                           
 
                             The GLM Procedure

                         Class Level Information
 
                   Class         Levels    Values

                   Cult               2    c39 c52     

                   Date               3    d16 d20 d21 

                  Number of Observations Read          60
                  Number of Observations Used          60
 
                                                                           
 
                             The GLM Procedure
 
Dependent Variable: VitC   

                                     Sum of
 Source                    DF       Squares   Mean Square  F Value  Pr > F

 Model                      4   4035.062033   1008.765508    27.66  <.0001

 Error                     55   2005.787967     36.468872                 

 Corrected Total           59   6040.850000                               

            R-Square     Coeff Var      Root MSE     VitC Mean

            0.667963      10.42096      6.038946      57.95000


 Source                    DF     Type I SS   Mean Square  F Value  Pr > F

 Cult                       1   2496.150000   2496.150000    68.45  <.0001
 Date                       2    909.300000    454.650000    12.47  <.0001
 HeadWt                     1    629.612033    629.612033    17.26  0.0001


 Source                    DF   Type III SS   Mean Square  F Value  Pr > F

 Cult                       1   1303.360264   1303.360264    35.74  <.0001
 Date                       2    259.433518    129.716759     3.56  0.0353
 HeadWt                     1    629.612033    629.612033    17.26  0.0001

                                          Standard
  Parameter             Estimate             Error    t Value    Pr > |t|

  Intercept          77.65535683 B      2.45990290      31.57      <.0001
  Cult      c39     -10.13496356 B      1.69531779      -5.98      <.0001
  Cult      c52       0.00000000 B       .                .         .    
  Date      d16      -4.18644031 B      2.01826561      -2.07      0.0427
  Date      d20      -5.39955164 B      2.11225494      -2.56      0.0134
  Date      d21       0.00000000 B       .                .         .    
  HeadWt             -4.41229218        1.06191296      -4.16      0.0001

NOTE: The X'X matrix has been found to be singular, and a generalized 
      inverse was used to solve the normal equations.  Terms whose 
      estimates are followed by the letter 'B' are not uniquely estimable.
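
To see the agreement directly, one option is to refit the model in R with the last level of each factor as the reference, matching PROC GLM's parameterization. This is a minimal sketch, and the copy named cab is a hypothetical name introduced for illustration. The resulting coefficients should correspond to the GLM estimates above. For example, the R intercept plus the two reference shifts, 63.334 + 10.135 + 4.186, gives the GLM intercept of 77.655.

```sas
proc iml;
  submit / R;
    library(MASS, lib.loc=.Library)
    # Work on a copy so the original cabbages data frame is unchanged
    cab <- cabbages
    # Use the last level of each factor as the reference, as PROC GLM does
    cab$Cult <- relevel(cab$Cult, ref = "c52")
    cab$Date <- relevel(cab$Date, ref = "d21")
    # Print the coefficients for comparison with the GLM solution
    coef(lm(VitC ~ Cult + Date + HeadWt, data = cab))
  endsubmit;
quit;
```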