Simple and Easy Explanation of SAS Software Programs
In this blog, I will introduce you to some of the important concepts of SAS programming. Before we get started, it is important that you get familiar with SAS. My previous blog on SAS Tutorial will help you understand SAS and its applications, and will help you install SAS University Edition, which we will be using here as the programming environment. Also, if you've been planning to step into Data Analytics, SAS Certification Training is one of the best ways to get started.
So without any further delay, let's get started with SAS programming, shall we?
This blog will help you understand the following topics:
- Fundamentals Of SAS Programming
- SAS Code Structure
- Informats And Formats In SAS
- SAS Loops
- Basic Statistical Procedures Using SAS
Before we start coding, I would like to brief you on a few terms which are important for SAS programming.
Fundamentals Of SAS Programming
SAS Windows
Large organisations and training institutes prefer using SAS Windows. SAS Windows has a lot of utilities that help reduce the time required to write code.
The following image shows the different parts of SAS Windows.
- Log Window: It is an execution window. Here, you can check the execution of your program. It also displays errors, warnings and notes.
- Code Window: This window is also known as the editor window. Consider it as a blank paper or a notepad, where you can write your SAS code.
- Output Window: As the name suggests, this window displays the output of the program or code which you write in the editor.
- Result Window: It is an index that lists all the outputs of programs that are run in one session. Since it holds the results of a particular session, if you close the software and restart it, the result window will be empty.
- Explore Window: It holds the list of all the libraries in the system. You can also browse the system supported files here.
A few organisations use Linux; however, without a graphical user interface you have to write code for every query, which makes it inconvenient to use.
SAS Data Sets
SAS data sets are also called data files. A data file consists of rows and columns: rows hold observations and columns hold variables.
SAS Variables
SAS has two types of variables:
- Numeric variables: This is the default variable type. These variables are used in mathematical expressions.
- Character variables: Character variables are used for values that are not used in mathematical expressions.
They are treated as text or strings. In an INPUT statement, a variable is read as a character variable when a '$' sign follows the variable name.
SAS Libraries
A SAS library is a collection of SAS files that are stored in the same folder or directory on your computer.
- Temporary Library: In this library, the data set gets deleted when the SAS session ends.
- Permanent Library: Data sets are saved permanently. Hence, they are available across sessions.
Users can also create or define a new library, known as a user-defined library, by using the LIBNAME keyword. These are also permanent libraries.
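As a rough sketch of how a user-defined library might be set up, the snippet below assigns a libref and saves a copy of a built-in data set into it. The libref mylib, the data set names class_copy and class_temp, and the folder path are assumptions made for illustration; point the path at a folder that already exists on your system.

```sas
/* Assign the libref MYLIB to an existing folder (hypothetical path) */
libname mylib '/folders/myfolders/mylib';

/* Two-level name: the copy is stored permanently in MYLIB */
data mylib.class_copy;
    set sashelp.class;
run;

/* One-level name: the copy goes to the temporary WORK library */
data class_temp;
    set sashelp.class;
run;
```

After the session ends, class_temp is gone, while class_copy is still available once the same LIBNAME statement is run again.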
SAS Programming: SAS Code Structure
SAS programming is based on two building blocks:
- DATA Step: The DATA step creates a SAS data set and then passes the data onto a PROC step
- PROC Step: The PROC step processes the data
A SAS program should follow the rules mentioned below:
- Almost every program begins with either a DATA step or a PROC step
- Every SAS statement ends with a semicolon
- A step ends with the RUN or QUIT keyword
- SAS code is not case sensitive
- You can write one statement across several lines, or write multiple statements on one line
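To make these rules concrete, here is a small sketch. The data set name demo and its variables x and y are made up for this illustration.

```sas
/* Several statements on one line; each one ends with a semicolon */
data demo; x = 1; y = 2; run;

/* Keywords and names are not case sensitive, and one statement may span lines */
PROC PRINT
    DATA = DEMO;
RUN;
```

Note that only the code is case insensitive. Character values stored in the data, such as 'SAS' and 'sas', are still treated as different values.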
Now that we have seen a few basic terminologies, let us get started with SAS programming with this basic code:
DATA Employee_Info;
input Emp_ID Emp_Name$ Emp_Vertical$;
datalines;
101 Mak SQL
102 Rama SAS
103 Priya Java
104 Karthik Excel
105 Mandeep SAS
;
Run;
In the above code, we created a data set called Employee_Info. It has three variables: one numeric variable, Emp_ID, and two character variables, Emp_Name and Emp_Vertical. The Run statement tells SAS to execute the DATA step, which creates the data set, after which you can view it.
The image below shows the output of the above mentioned code.
Suppose you want to see the result in print view. You can do that by using the PROC PRINT procedure; the rest of the code remains the same.
DATA Employee_Info;
input Emp_ID Emp_Name$ Emp_Vertical$;
datalines;
101 Mak SQL
102 Rama SAS
103 Priya Java
104 Karthik Excel
105 Mandeep SAS
;
Run;
PROC PRINT DATA=Employee_Info;
Run;
The image below shows the output of the above code.
We just created a data set and understood how the PRINT procedure works. Now, let us take the above data set and use it for further programming. Let's say we want to add each employee's date of joining to the data set. So we create a variable called DOJ, give it as input and print the result.
DATA Employee_Info;
input Emp_ID Emp_Name$ Emp_Vertical$ DOJ;
datalines;
101 Mak SQL 18/08/2013
102 Rama SAS 25/06/2015
103 Priya Java 21/02/2010
104 Karthik Excel 19/05/2007
105 Mandeep SAS 11/09/2016
;
Run;
PROC PRINT DATA=Employee_Info;
Run;
The below image shows the output of the above code. It is visible that a variable was created, but the value of DOJ wasn't printed. Instead, we see dots have replaced the date values.
Why did this happen? Well, the DOJ variable is without a suffix '$', which means that by default SAS will read it as a numeric variable. But the data we entered contains a special character '/', so SAS cannot read it as a number and sets DOJ to missing, which is printed as a dot. If you check the log window you will see a note such as 'Invalid data for DOJ'.
Now how do we solve this problem? Well, one way to solve it is by using a suffix '$' for DOJ variable. This will convert DOJ variable to character and you will be able to print date values. Let us make the changes to the code and see the output.
DATA Employee_Info;
/* :$10. reads DOJ as text with room for all 10 characters; a bare $ would keep only the default 8 */
input Emp_ID Emp_Name$ Emp_Vertical$ DOJ :$10.;
datalines;
101 Mak SQL 18/08/2013
102 Rama SAS 25/06/2015
103 Priya Java 21/02/2010
104 Karthik Excel 19/05/2007
105 Mandeep SAS 11/09/2016
;
Run;
PROC PRINT DATA=Employee_Info;
Run;
The output screen will display the following output.
You can see that the date values are now displayed, because DOJ was read as a character variable. However, this is only a temporary solution. Let me explain why.
Well, imagine a bank has a similar data set. The data set has account holder details like loan amount, installments, and due date for the loan installment. Imagine the holder has missed the deadline to pay an installment and the bank wants to calculate the delay. The bank will have to calculate the difference between the deadline date and the current date.
But, if the bank's data set has dates in character format, then the bank won't be able to perform mathematical operations on it. This issue may affect our data set too. So how do we solve this problem?
The next concept will help you overcome this issue.
Informats And Formats In SAS
It is important that you understand this topic well if you want to be good at SAS programming. If you can recall, I mentioned earlier that SAS has two standard variable types:
- Numeric
- Character
When SAS comes across nonstandard data, such as dates, it will throw an error or you won't get the desired output. To overcome this problem, SAS uses Informats and Formats.
Informat
Informats are typically used to read or input data from external files or flat files (like text files or sequential files). The informat instructs SAS on how to read data into SAS variables. SAS has three types of Informats: character, numeric, and date/ time. Informats are named according to the following syntax structure:
- Character Informat: $INFORMATw.
- Numeric Informat: INFORMATw.d
- Date/ Time Informat: INFORMATw.
The '$' indicates a character informat. INFORMAT refers to the sometimes optional SAS informat name. The 'w' indicates the width (bytes or number of columns) of the variable. The 'd' is used for numeric data to specify the number of digits to the right of the decimal place. All informats must contain a decimal point (.) so that SAS can differentiate an informat from a SAS variable.
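The three classes of informat can be tried out without any external file by using the INPUT function, which applies an informat to a text value. This is only a sketch; the variable names and sample values are chosen for illustration.

```sas
data _null_;
    /* Character informat: keep the first 5 characters */
    short_name = input('Mandeep', $5.);

    /* Numeric informat: COMMA6. strips the thousands separator */
    amount = input('1,234', comma6.);

    /* Date informat: day/month/year text, 10 characters wide */
    doj = input('18/08/2013', ddmmyy10.);

    /* Write the three values to the log */
    put short_name= amount= doj=;
run;
```

In the log, short_name holds 'Mande' and amount holds 1234, while doj appears as a plain number rather than a date.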
Let us go back to our previous code and see if Date/ Time Informat helps us. So let's change the code accordingly and add a Date Informat to it as follows:
DATA Employee_Info;
input Emp_ID Emp_Name$ Emp_Vertical$ DOJ;
INFORMAT DOJ ddmmyy10.;
datalines;
101 Mak SQL 18/08/2013
102 Rama SAS 25/06/2015
103 Priya Java 21/02/2010
104 Karthik Excel 19/05/2007
105 Mandeep SAS 11/09/2016
;
Run;
PROC PRINT DATA=Employee_Info;
Run;
Line number 3 in the code instructs SAS to read in the variable 'date of joining' (DOJ) using the date informat DDMMYYw. Because each date field occupies 10 characters, the 'w.' qualifier is set to 10.
The output of the code would look as follows.
The result shows we still don't have the desired result; instead, the DOJ column is holding some numeric values and not the dates we specified. Now, why is that? Well, once a date is read with a date informat, SAS stores the date as a number: the number of days between January 1, 1960 and that date (for example, 3/15/1994 is stored as 12492).
The reason behind this is that SAS has three separate counters which keep track of dates and time. These date counters started at zero on January 1, 1960. Hence dates before 1/1/1960 have negative values, and any date after has a positive value. Every day at midnight, the date counter is incremented by one.
One story has it that the founders of SAS wanted to use the approximate birth date of the IBM 370 system, and they chose January 1, 1960 as an easy to remember approximation.
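The day counter described above can be checked with a short DATA step that writes a few date constants to the log without any format. The variable names d0, d1 and d2 are only for illustration.

```sas
data _null_;
    d0 = '01JAN1960'd;   /* the starting point of the counter */
    d1 = '15MAR1994'd;   /* the example date mentioned above  */
    d2 = '31DEC1959'd;   /* one day before the starting point */
    put d0= d1= d2=;     /* log shows: d0=0 d1=12492 d2=-1    */
run;
```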
Now that you know the reason why the column DOJ displayed those numbers, let us try to solve this problem. To overcome this problem we use Format.
Format
Informats are the instructions for reading data, whereas formats are the instructions used to display or output data. Defining a format for a variable is how you tell SAS to display the values in the variable. Formats are grouped into the same three classes as informats (character, numeric, and date-time) and also always contain a dot.
The general form of a format statement is:
- FORMAT variable-name FORMAT-NAME.;
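As a small illustration of the idea that a format changes only how a value is displayed, the sketch below writes the same stored date to the log with different formats. The variable name d is arbitrary.

```sas
data _null_;
    d = '15MAR1994'd;                 /* stored internally as 12492 */
    put 'Unformatted: ' d;            /* 12492      */
    put 'DATE9.:      ' d date9.;     /* 15MAR1994  */
    put 'DDMMYY10.:   ' d ddmmyy10.;  /* 15/03/1994 */
    put 'MMDDYY10.:   ' d mmddyy10.;  /* 03/15/1994 */
run;
```

The stored value never changes; only its appearance does.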
Let us go back to our code with the data set Employee_Info to see if we can display the date correctly using the FORMAT statement.
DATA Employee_Info;
input Emp_ID Emp_Name$ Emp_Vertical$ DOJ;
INFORMAT DOJ ddmmyy10.;
FORMAT DOJ ddmmyy10.;
datalines;
101 Mak SQL 18/08/2013
102 Rama SAS 25/06/2015
103 Priya Java 21/02/2010
104 Karthik Excel 19/05/2007
105 Mandeep SAS 11/09/2016
;
Run;
PROC PRINT DATA=Employee_Info;
Run;
We have used the FORMAT statement in line number 4 in the above code. The following output screen will give us the desired output.
We have successfully displayed the data set using a date format. I hope you have understood how to use formats and informats. Let us move ahead with our SAS programming blog and take a look at another important concept.
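Coming back to the bank example from earlier: once a date is stored as a number, date arithmetic becomes simple subtraction. The sketch below is one possible way to compute elapsed time for the employee data. The data set name Employee_Tenure and the variables Days_Employed and Years_Crossed are made up for this illustration, and the colon modifier in the INPUT statement is used here as an alternative to a separate INFORMAT statement.

```sas
data Employee_Tenure;
    /* :ddmmyy10. reads DOJ with the date informat during list input */
    input Emp_ID Emp_Name $ DOJ :ddmmyy10.;
    format DOJ ddmmyy10.;

    /* Dates are numbers, so subtraction gives the elapsed days */
    Days_Employed = today() - DOJ;

    /* INTCK counts the calendar-year boundaries crossed between two dates */
    Years_Crossed = intck('year', DOJ, today());
    datalines;
101 Mak 18/08/2013
102 Rama 25/06/2015
;
run;

proc print data=Employee_Tenure;
run;
```

The same subtraction would not be possible if DOJ had been read as a character variable.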
SAS Loops
While doing SAS programming, we may encounter situations where we need to execute a block of code several times. It is inconvenient to write the same set of statements again and again. This is where loops come into the picture. In SAS, the Do statement is used to implement loops. It is also known as the Do Loop. The image below shows the general form of the Do loop statements in SAS.
Following are the types of DO loops in SAS:
- Index: The loop continues from the start value till the stop value of the index variable.
- While: The loop continues as long as the While condition is true, and stops once it becomes false.
- Until: The loop continues till the Until condition becomes True.
Do Index loop
We use an index variable as a start and stop value for Do Index loop. The SAS statements get executed repeatedly till the index variable reaches its final value.
Syntax:
Do indexvariable = initialvalue to finalvalue;
    SAS statements;
End;
Let us take a look at sample code to understand Do Index Loop. In the below code, VAR is the index variable.
DATA SampleLoop;
SUM=0;
Do VAR = 1 to 10;
    SUM = SUM + VAR;
END;
Run;
PROC PRINT DATA = SampleLoop;
Run;
When you execute the above code, you will get the following output.
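The index loop can also step by more than one and write an observation on every pass. The following sketch assumes a made-up data set name, Squares, and uses the BY keyword together with an OUTPUT statement inside the loop.

```sas
data Squares;
    do n = 2 to 10 by 2;    /* n takes the values 2, 4, 6, 8, 10 */
        square = n * n;
        output;             /* write one observation per iteration */
    end;
run;

proc print data=Squares;
run;
```

Without the OUTPUT statement inside the loop, the data set would contain only a single observation written at the end of the DATA step, as in the SampleLoop example.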
Do While Loop
The Do While loop uses a WHILE condition. This Loop executes the block of code when the condition is true and keeps executing it, till the condition becomes false. Once the condition becomes false, the loop is terminated.
Syntax:
Do While (condition);
    SAS statements;
End;
Following sample code will help you understand DO WHILE loop.
DATA SampleLoop;
SUM=0;
VAR=1;
Do While(VAR<15);
    SUM = SUM + VAR;
    VAR+1;
END;
Run;
PROC PRINT DATA = SampleLoop;
Run;
The above code will give you following output.
Do Until Loop
The Do Until loop uses an Until condition. This loop keeps executing the block of code while the condition is false, till the condition becomes true. Once the condition becomes true, the loop is terminated. Because the condition is checked at the end of each iteration, the block always runs at least once.
Syntax:
Do Until (condition);
    SAS statements;
END;
Let us take a look at sample program.
DATA SampleLoop;
SUM=0;
VAR=1;
Do Until(VAR>15);
    SUM=SUM+VAR;
    VAR+1;
END;
Run;
PROC PRINT DATA = SampleLoop;
Run;
The code has the following output.
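One practical difference between the two conditional loops is when the condition is checked: Do While tests it before each iteration, Do Until after each iteration. The sketch below, with illustrative counter names, shows the effect when the condition already holds its stopping value at the start.

```sas
data _null_;
    n_while = 0;
    do while (n_while > 0);      /* false at the top, so the body never runs */
        n_while = n_while + 1;
    end;

    n_until = 0;
    do until (n_until >= 0);     /* already true, but tested only at the bottom */
        n_until = n_until + 1;
    end;

    put n_while= n_until=;       /* log shows: n_while=0 n_until=1 */
run;
```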
Thus we have finished the concept of loops in SAS programming. All the topics we studied till now have talked about basics of SAS programming in general.
Now let us take a look at some statistical procedures. These procedures will form a base for advanced analytical procedures.
Basic Statistical Procedures Using SAS
PROC MEANS
This procedure is used to calculate summary statistics such as the arithmetic mean and standard deviation. People who are new to statistics may find it difficult to understand these terms, so before we start coding and use this procedure, I will try to explain what these terms mean.
Let's start with arithmetic mean and see how PROC MEANS is used in SAS programming to calculate it.
Arithmetic Mean
The sum of the values of a numeric variable, divided by the number of values, gives you the arithmetic mean. It is also known as the mean and is a measure of central tendency. A measure of central tendency is a single value that attempts to describe a set of data by identifying the central position within that set of data.
In SAS programming, you use PROC MEANS to calculate the arithmetic mean. This procedure lets you find mean of all variables or few variables of a data set. You can also form groups and calculate mean of variables specific to that group.
Syntax:
PROC MEANS DATA = DATASET;
Class Variables;
Var Variables;
- Variables: Variables in the above syntax indicate variables from the data set whose mean is to be calculated.
Mean Of A Dataset
If you supply only the data set name without any variables, you can calculate the mean of all the variables in a data set.
Let us take a look at some sample code. I have considered a predefined SAS data set called 'cars'. The following code will display the data set.
PROC PRINT data=sashelp.CARS;
Run;
The image below shows the output of above code.
Now let us use this data set and calculate the mean of each numeric variable in the data set 'cars'.
PROC MEANS DATA = sashelp.CARS Mean SUM MAXDEC=2;
Run;
The image below shows the mean and sum of all the numeric variables in the data set, up to two decimal places.
Mean Of Selected Variables
By supplying the names in the Var statement you can get the mean of the specified variables. Please refer to the code below.
PROC MEANS DATA = sashelp.CARS mean SUM MAXDEC=2;
var horsepower cylinders;
Run;
Mean By Class
You can find the mean of the numeric variables by organizing them into groups, using some parameter to group them. Consider the following sample code. Let's find out the mean of horsepower for different groups categorized by the classes 'make' and 'type' of different cars.
PROC MEANS DATA = sashelp.CARS MEAN SUM MAXDEC=2;
class make type;
var horsepower;
Run;
The image below shows the output of the above code.
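If the grouped statistics are needed for further processing rather than just for display, PROC MEANS can also write them to a data set. This is a sketch only; the output data set name HP_By_Type and the variable names Mean_HP and Max_HP are assumptions made for this example.

```sas
/* NOPRINT suppresses the printed report; NWAY keeps only the rows for each value of TYPE */
proc means data=sashelp.cars noprint nway;
    class type;
    var horsepower;
    output out=HP_By_Type mean=Mean_HP max=Max_HP;
run;

proc print data=HP_By_Type;
run;
```

The resulting data set has one row per car type and, besides the requested statistics, the automatic variables _TYPE_ and _FREQ_.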
Let us continue with our SAS Programming blog and take a look at another important statistical concept.
Standard Deviation
Standard deviation (SD) is a measure of how varied the data in a given data set is. Mathematically, it tells you how close the data points are to the mean value of the data set. If the value of standard deviation is close to 0, it indicates that the data points are very close to the mean of the data set, and a high standard deviation indicates that the data points are spread out over a wide range of values.
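A tiny worked example may help here. The sketch below uses a made-up data set, Scores, with eight values whose mean is 5 and whose squared deviations from the mean add up to 32. By default PROC MEANS divides by n-1, which gives the sample standard deviation; the VARDEF=N option divides by n instead.

```sas
data Scores;
    input score @@;     /* @@ reads several values from one data line */
    datalines;
2 4 4 4 5 5 7 9
;
run;

/* Default divisor n-1: square root of 32/7, about 2.14 */
proc means data=Scores n mean std maxdec=2;
    var score;
run;

/* VARDEF=N, divisor n: square root of 32/8, exactly 2.00 */
proc means data=Scores n mean std vardef=n maxdec=2;
    var score;
run;
```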
In SAS, you can calculate the value of Standard Deviation using two procedures. They are:
- PROC MEANS
- PROC SURVEYMEANS
Standard Deviation Using PROC MEANS
You can measure the Standard Deviation using PROC MEANS by choosing the STD option in the PROC step. It will display the Standard Deviation values for each numeric variable in the data set.
Syntax:
PROC MEANS DATA = dataset STD;
Consider this sample code. Let us create another data set, CARS1, from the CARS data set in the SASHELP library. To do this, let us use PROC SQL. We select a few variables, keep only the cars whose 'make' is Audi or BMW, and then calculate the standard deviation of the numeric variables using the STD option in the PROC MEANS step.
PROC SQL;
create table CARS1 as
SELECT make,type,horsepower,cylinders,weight
FROM SASHELP.CARS
WHERE make in ('Audi','BMW');
QUIT;
PROC MEANS DATA=CARS1 STD;
Run;
The above code will give Standard deviation for selected variables. Following image displays the output.
PROC SURVEYMEANS
This procedure is used to measure Standard Deviation along with some advanced features, like measuring Standard Deviation for categorical variables and the variance.
Syntax:
PROC SURVEYMEANS options statistic-keywords;
By variables;
Class variables;
Var variables;
Following is the description of the parameters used:
- By is used to indicate the variables used to create groups of observations.
- Class indicates the variables that are treated as categorical variables.
- Var indicates the variables for which SD will be calculated.
Let us take a look at this sample code, which describes the use of the Class statement to create the statistics for each of the values in the class variable.
PROC SURVEYMEANS DATA=CARS1 STD;
Class type;
Var type horsepower;
ods output statistics=rectangle;
Run;
PROC PRINT DATA=rectangle;
Run;
The images below show the output of the code above. It shows the distribution of data for the variable 'Horsepower' for a 95% confidence interval. (A confidence interval is a range of values so defined that there is a specified probability that the value of a parameter lies within it.)
So, that brings us to the end of this SAS programming blog. If you have any doubt or issue with the content of the blog, please leave it in the comments section, and I will respond at the earliest.
If you wish to learn SAS and build a career in the analytics domain, then check out our SAS Training & Certification which comes with instructor-led live training and real-life project experience. This training will help you understand SAS in depth and help you master various concepts of SAS programming language.
Source: https://www.edureka.co/blog/sas-programming/