A SERVICE OF

logo

Performance Considerations Sorting DBMS Data 31
Sorting DBMS Data
Sorting DBMS data can be resource intensive, whether you use the SORT procedure,
a BY statement, or an ORDER BY clause on a DBMS data source or in the SQL
procedure’s SELECT statement. Sort data only when it is needed for your program.
The following list contains guidelines for sorting data:
If you specify a BY statement in a DATA or PROC step that references a DBMS
data source, it is recommended for performance reasons that you associate the BY
variable (in a DATA or PROC step) with an indexed DBMS column. If you
reference DBMS data in a SAS program and the program includes a BY statement
for a variable that corresponds to a column in the DBMS table, the SAS/ACCESS
LIBNAME engine automatically generates an ORDER BY clause for that variable.
The ORDER BY clause causes the DBMS to sort the data before the DATA or
PROC step uses the data in a SAS program. If the DBMS table is very large, this
sorting can adversely affect your performance. Use a BY variable that is based on
an indexed DBMS column in order to reduce this negative impact.
The outermost BY or ORDER BY clause overrides any embedded BY or ORDER
BY clauses, including those specified by the DBCONDITION= option, those
specified in a WHERE clause, and those in the selection criteria in a view
descriptor. In the following example, the EXEC_EMPLOYEES data set includes a
BY statement that sorts the data by the variable SENIORITY. However, when that
data set is used in the following PROC SQL query, the data is ordered by the
SALARY column and not by SENIORITY.
libname mydblib oracle user=testuser password=testpass;
data exec_employees;
set mydblib.staff (keep=lname fname idnum);
by seniority;
where salary >= 150000;
run;
proc sql;
select * from exec_employees
order by salary;
Do not use PROC SORT to sort data from SAS back into the DBMS, because this
impedes performance and has no effect on the order of the data.
The database does not guarantee sort stability when using PROC SORT. Sort
stability means that the ordering of the observations in the BY statement is
exactly the same every time the sort is run against static data. If you absolutely
require sort stability, you must place your database data into a SAS dataset, and
then use PROC SORT.
When you use PROC SORT, be aware that the sort rules for SAS and for your
DBMS may be different. Use the base SAS system option SORTPGM to specify
which rules (host, SAS, or DBMS) are applied:
SORTPGM=BEST
sorts data according to the DBMS sort rules, then the host sort rules, and
then the SAS sort rules. (Sorting uses the first available and pertinent
sorting algorithm in this list.) This is the default.
SORTPGM=HOST
sorts data according to host rules and then SAS rules. (Sorting uses the first
available and pertinent sorting algorithm in this list.)
SORTPGM=SAS