Using SAS on Monsoon

To use SAS on the High Performance Cluster, you must first take the introductory seminars on using Monsoon. Once you have done so, you can run SAS in batch mode, either from the command line or via sbatch. The examples below are suggestions on how to proceed. You will need to edit the control files to reflect your account settings and the location of your files on Monsoon.

The following examples will help you get up and running quickly with SAS. Note: to use SAS on Monsoon, you must first load the sas module. From the command line, enter the following:

module load sas

This will set up your environment so that the SAS foundation suite will run. If you ever receive the following error, you must load the sas module:

-bash: sas: command not found
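
If you want a script to load the module only when it is needed, the following is a minimal sketch. The function name ensure_sas is hypothetical, and the sketch assumes that the module command is available in your shell.

```shell
# Hypothetical helper: load the sas module only if the sas command
# cannot currently be found.
ensure_sas() {
    if ! command -v sas >/dev/null 2>&1; then
        module load sas
    fi
}

# Usage (for example near the top of a job script):
#   ensure_sas
```

Calling the function a second time does nothing, because sas is already found after the module has been loaded.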

To get started with a simple test, use the vi editor (or another editor) to enter the following code into a file called 2samplettest.sas.

Example SAS code for a two-sample t-test:

data efh;
   input cond $ test msat;
   label cond = 'Experimental condition';
   label test = 'Fraction correct on post-test';
   label msat = 'Math SAT score';
   datalines;
A 0.71 650
A 0.82 710
A 0.82 510
A 0.76 590
A 0.76 500
A 0.71 730
A 0.71 570
A 0.82 780
B 0.65 690
B 0.53 710
B 0.88 780
B 0.59 690
B 0.76 730
B 0.59 700
B 0.65 740
;

proc print data=efh;
run;

proc ttest data=efh;
   class cond;
   var test;
run;

With the SAS code saved in the file 2samplettest.sas, you can run it in batch mode from the command line by typing:

sas 2samplettest.sas

Two output files will be created, and both should be reviewed: 2samplettest.lst and 2samplettest.log. The log file shows any errors that occurred, and the lst file contains the output from the program. Run SAS from the command line only with very small datasets, for example to test a statistical model.
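
To review the log quickly, you may search it for error and warning lines. The following is a minimal sketch, assuming that SAS writes these messages on lines beginning with ERROR or WARNING. The function name check_sas_log is hypothetical.

```shell
# Hypothetical helper: list ERROR and WARNING lines from a SAS log.
# Returns 1 if at least one ERROR line is present, 0 otherwise.
check_sas_log() {
    log_file="$1"
    # Print matching lines with their line numbers; print nothing if the log is clean.
    grep -n -E '^(ERROR|WARNING)' "$log_file" || true
    # Set the return status from the presence of ERROR lines only.
    if grep -q '^ERROR' "$log_file"; then
        return 1
    fi
    return 0
}

# Usage:
#   check_sas_log 2samplettest.log
```

Warnings are printed but do not change the return status, so a script can use the status to stop only on errors.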

Partial Output from the above code:

TTEST PROCEDURE

Variable: TEST     Fraction correct on post-test

COND    N          Mean       Std Dev     Std Error
---------------------------------------------------------------------------
A       8    0.76375000    0.05097268    0.01802156
B       7    0.66428571    0.11914377    0.04503211

Variances        T       DF    Prob>|T|
---------------------------------------
Unequal     2.0506      7.9      0.0749
Equal       2.1553     13.0      0.0505

For H0: Variances are equal, F' = 5.46  DF = (6,7)  Prob>F' = 0.0422

The F' test (p = 0.0422) suggests that the variances are unequal, so the Unequal row applies. Its p-value of 0.0749 provides only marginal evidence that the means of the two populations are not equal.

Running SAS under sbatch

To run SAS under sbatch and use the power of the cluster, you must first create an sbatch control file. The following is a sample sbatch job file:

#!/bin/bash
#SBATCH --job-name=sas_test1
#SBATCH --workdir=/scratch/<uid>/<datadir>
#SBATCH --output=/scratch/<uid>/<datadir>/sastest.log
#SBATCH --time=10:00
#SBATCH --mem=850000
#SBATCH --partition=all
#SBATCH --cpus-per-task=2

module load sas
srun sas /scratch/<uid>/<datadir>/2samplettest.sas -nodate -linesize 90 -memsize 800G

Here <uid> is your login id and <datadir> is the directory in your scratch area where the data and SAS files are stored. Save the file (for example, under the name sasjob.sh) and edit it to reflect your environment before you submit the job. At a minimum, you will need to change the workdir and output lines. You should also change job-name to something that describes your job. Any directories you use must be created ahead of time, with the data placed in the appropriate directories.
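
Creating the directory and copying the files into it ahead of time can be sketched as follows. The function name prepare_job_dir and the directory name sas_demo in the usage comment are hypothetical, and the usage comment assumes that your login id is available in $USER.

```shell
# Hypothetical helper: create a job directory (including any missing
# parent directories) and copy the given files into it.
prepare_job_dir() {
    job_dir="$1"
    shift
    mkdir -p "$job_dir"
    for f in "$@"; do
        cp "$f" "$job_dir/"
    done
}

# Usage (sas_demo stands in for your own <datadir>):
#   prepare_job_dir "/scratch/$USER/sas_demo" 2samplettest.sas sasjob.sh
```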

To run the job, at the command prompt enter:

sbatch /scratch/<uid>/<datadir>/sasjob.sh

You may use the squeue command to check the status of your job.
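
Submitting a job and then checking its status can be combined, as in the following minimal sketch. It assumes a Slurm installation in which sbatch accepts the --parsable option (which prints the job id, optionally followed by a semicolon and the cluster name) and squeue accepts -j to select a job. The function name submit_and_show is hypothetical.

```shell
# Hypothetical helper: submit a job file and show its queue entry.
submit_and_show() {
    job_file="$1"
    # --parsable makes sbatch print "jobid" or "jobid;cluster".
    job_id=$(sbatch --parsable "$job_file")
    # Keep only the job id, dropping any ";cluster" suffix.
    job_id=${job_id%%;*}
    squeue -j "$job_id"
}

# Usage:
#   submit_and_show sasjob.sh
# To list all of your own jobs instead:
#   squeue -u "$USER"
```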

Libname Statement and SAS Datasets

Once you have performed some testing with your sample datasets, you may wish to save and work with SAS datasets. The following example code shows how to create a SAS dataset from a raw data file.

libname test '/scratch/<uid>/<datadir>';

data test.efh;
   infile '/scratch/<uid>/<datadir>/2samp.sdf';
   input cond $ test msat;
   label cond = 'Experimental condition';
   label test = 'Fraction correct on post-test';
   label msat = 'Math SAT score';
run;

proc print data=test.efh;
run;

proc ttest data=test.efh;
   class cond;
   var test;
run;

This will create a SAS dataset named efh.sas7bdat in the /scratch/<uid>/<datadir> directory.

To use this dataset later in statistical processing, you may use the following as an example.

libname test '/scratch/<uid>/<datadir>';

data efh;
   set test.efh;
run;

proc print data=efh;
run;

proc ttest data=efh;
   class cond;
   var test;
run;

For more information on the libname statement you may refer to this libname support page.

Once permanent storage has been approved, you are allowed up to 5 TB of storage on the Monsoon cluster. Files that are not scratch or temporary files may then be stored in your /projects/<uid> directory.