The SAS deployment process can be challenging for some organizations because bottlenecks and open questions can arise throughout the planning, installation, and configuration stages. This post offers guidance for shaping your SAS installation strategy so that SAS installers can form a clear plan within their organization.
Information provided in this post comes from lessons learned across many installations we have planned and performed for our customers.
Planning Your SAS9 Hardware Architecture
Before installing SAS, it is a good idea to plan the hardware architecture so you get the most out of your SAS environment. Of course, servers with massive amounts of memory and the fastest processors available will make processes run faster. In our experience, however, architecting an environment that maximizes input/output (IO) throughput and concurrency is an essential low-level design task that is often overlooked or not given enough scrutiny.
At the lowest level, the faster the system can write data to disk, the more efficiently everything runs overall. The local area network can also be a potential bottleneck for performance, especially with network mounted storage, clustered nodes, or multi-tier environments passing a lot of data around.
In our experience, the most common performance bottleneck on a traditional SAS computing platform, and the hardest to solve, is the disk IO layer and the related environment architecture design. CPU and memory sizing are also important, but they are typically easier to improve with proper sizing and configuration.
The following sections cover key areas to consider when designing the hardware architecture to support a high-performing, flexible SAS computing environment.
Define Logical Storage Requirements Early
You should understand data access requirements for SAS users, then design the hardware to meet those needs. If you plan the physical hardware first, it may be overkill or not enough, depending on the actual SAS user requirements. When we discuss the requirements with customers, these are the topics we cover:
Understand IO Usage
Rather than a high volume of small IO transactions, SAS users typically submit a lower volume of large IO transactions. SAS9 performs these operations in large chunks rather than in many small reads and writes. Refer to the Best Practices for Configuring your IO Subsystem for SAS9 Applications document for the best guidance on planning your SAS environment from an IO perspective.
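To get a rough feel for how a candidate mount point handles large sequential writes, a small timing script can help. The following is a minimal Python sketch, not a substitute for a proper IO benchmark or for the guidance in the SAS document referenced above. The function name, the block sizes, and the use of the system temp directory as the target are all assumptions for illustration; point it at the file system you actually want to evaluate.

```python
import os
import tempfile
import time


def sequential_write_mb_per_s(directory, total_mb=16, block_kb=1024):
    """Write total_mb of zeros in block_kb-sized chunks and return MB/s.

    Hypothetical helper for illustration only. The result is a rough
    indicator and depends on caching, the storage tier, and concurrent load.
    """
    block = b"\0" * (block_kb * 1024)
    blocks = (total_mb * 1024) // block_kb
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as handle:
            for _ in range(blocks):
                handle.write(block)
            handle.flush()
            # Force the data to disk so the OS page cache does not
            # make the storage look faster than it is.
            os.fsync(handle.fileno())
        elapsed = time.perf_counter() - start
    finally:
        os.remove(path)
    # Guard against a zero interval on very coarse clocks.
    return total_mb / max(elapsed, 1e-9)


if __name__ == "__main__":
    # Assumption: replace with the mount point under evaluation.
    target = tempfile.gettempdir()
    for block_kb in (4, 64, 1024):
        rate = sequential_write_mb_per_s(target, total_mb=16, block_kb=block_kb)
        print(f"{block_kb:>5} KB blocks: {rate:8.1f} MB/s")
```

Comparing the rates across block sizes gives a first impression of whether the storage favors the large-block pattern described above. Sustained, concurrent workloads will behave differently, so treat the numbers as a starting point for a conversation with your storage administrator.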
Avoid IO Contention by Separating High Disk Traffic Areas
Plan the logical file system to keep critical areas of the environment from competing for the same disk. While defining disk partitions may be slightly beneficial and cost-effective for smaller organizations, having the flexibility to spread IO throughput across multiple IO channels and disk spindles is the best option. For context, using an RDBMS or other external data storage option would accomplish the same goal of separating IO traffic.
To ensure data flows without contention across separate IO paths, use the following list to define independent logical areas. Virtual environments add another layer to consider, though: be sure to understand the underlying virtual disk storage, and remember that separate virtual disks can reside on the same SAN or physical disk.
- Operating System
As the heart of the server, the operating system should have its own mount point to avoid contention and ensure the server will always operate, even if other disks are full.
- SAS Application Executables (SASHOME)
SAS can be installed on the same mount point as the operating system, or have its own. We recommend keeping third-party applications on separate storage devices for consistency. This practice is common with virtual environments that have virtual images of environments with standardized file system layouts.
- SAS Config (SASCONFIG)
Optionally, the SAS configuration directory can be set up on a separate mount point to streamline disaster recovery practices (if applicable). By default, the SAS configuration directory contains solution-specific files which define the SAS environment, SAS metadata, log files, and other environment-specific application data. This area is continually changing as log files grow, environment settings change, metadata grows, and solution-specific data changes (if applicable). Beginning in SAS 9.4, the OEM web application server is installed within SASCONFIG by default.
- SAS WORK Library (WORK)
The WORK library is used in every single SAS session, sometimes heavily. Data stored in the WORK library do not persist across SAS sessions. ETL jobs benefit the most from a fast SAS WORK location. If budget permits, use the highest tier storage available for SAS WORK. Solid state is our preference because it can drastically improve both read and write performance, but it costs more and can have a shorter disk life.
Use the WORK system option to specify where SAS creates the WORK library. Depending on the size and domain of the organization, we recommend anything from 200GB to 1TB of space just for SAS WORK.
- SAS Utility (UTILLOC)
SAS threaded procedures use the SAS utility directory, but not as frequently as the WORK library. It is common practice to segment traffic to this location by mounting it separately. You can use the UTILLOC system option to define a new location for SAS utility files; the default is WORK.
- Permanent Data Storage
Going back to the earlier point about thinking through the logical organization of data in the environment, consider whether data should be stored locally as SAS datasets, in a structured relational database such as PostgreSQL or Oracle, or in a distributed file system such as Hadoop HDFS. If the data is physically on the same compute tier server as SAS, be sure to define a separate mount point to prevent IO contention with other processes. External storage options naturally avoid local IO contention but are limited by the speed of the network connection or the processing power of the physical machine that runs the external storage solution.
- User Data Storage
If power users need space for their own permanent or semi-permanent data, you can define a separate mount point for that as well. We recommend an independent area for user storage when power users want to analyze data and store their own versions of SAS datasets or other flat files. This prevents users from filling up storage space in a way that may impact other important areas of the environment. This practice also segments the IO throughput to avoid contention with more important processes.
In a Unix/Linux environment, we typically use an explicit mount for /home to accomplish this. From there you can customize special SAS libraries for each power user to read and write data to their own home directory. If your power users are really hungry for analyzing and moving around their data, you could also define a separate workspace server tuned just for power users.
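Once the logical areas above are laid out, it can be useful to verify that they really sit on separate file systems and have enough free space. The following is a minimal Python sketch under stated assumptions: every path in `AREAS` is a hypothetical example rather than a SAS default, and the helper names are invented for illustration. It compares device IDs reported by the operating system, so it can only detect areas that share a mounted file system. It cannot tell whether two separate virtual disks ultimately live on the same SAN or physical disk.

```python
import os
import shutil
from collections import defaultdict

# Hypothetical layout for illustration; substitute your own mount points.
AREAS = {
    "SASHOME": "/sas/home",
    "SASCONFIG": "/sas/config",
    "WORK": "/saswork",
    "UTILLOC": "/sasutil",
    "DATA": "/sasdata",
    "USER": "/home",
}


def shared_devices(paths):
    """Return groups of labels whose paths share one device ID (st_dev)."""
    by_device = defaultdict(list)
    for label, path in paths.items():
        by_device[os.stat(path).st_dev].append(label)
    return [sorted(labels) for labels in by_device.values() if len(labels) > 1]


def free_gb(path):
    """Return free space on the file system holding path, in GB."""
    return shutil.disk_usage(path).free / 1024**3


if __name__ == "__main__":
    # Only inspect areas that exist on this machine.
    existing = {label: p for label, p in AREAS.items() if os.path.isdir(p)}
    for group in shared_devices(existing):
        print("Same file system:", ", ".join(group))
    for label, path in existing.items():
        print(f"{label:<10} {path:<14} {free_gb(path):8.1f} GB free")
```

A group reported as sharing a file system is a prompt to check whether that is intentional. For example, leaving UTILLOC on the WORK file system is the default behavior, while WORK sharing a file system with the operating system is usually worth revisiting.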
Other Useful References
- This document provides additional configuration options for power users within SAS Enterprise Guide: How to Redirect WORK and SASUSER for Enterprise Guide
- Be sure to provide multi-path IO channels for heavily used mount points. An IT system administrator or hardware expert is definitely needed for this task.
- Windows Features that Optimize Performance (Windows OS only)
- Configuration and Tuning Guidelines for SAS9 in Microsoft Windows Server 2008 (Windows OS only)
Minimize the Number of Network Hops Between Servers
Multi-tier environments benefit from having a dedicated switch (either virtual or physical) to streamline communications between each machine. A clustered environment, such as Hadoop or SAS Grid, should definitely have a dedicated switch to organize the cluster of server nodes. An experienced IT network engineer is needed for this task.
Use Network Card (NIC) Bonding
Along the same theme of maximizing concurrency, bonding multiple network cards using link aggregation provides higher throughput and better reliability. Network cards should be bonded prior to any SAS installation, most likely when the hardware is initially provisioned.
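On Linux servers that use the kernel bonding driver, the state of a bond is typically exposed as a text file under /proc/net/bonding/. As a hedged illustration, the Python sketch below parses that status text to confirm the bonding mode and whether each member interface is up. The interface name `bond0`, the sample text, and the helper name are assumptions for illustration, and the exact fields vary by kernel version, so check the output on your own system.

```python
import os

# Abbreviated, illustrative sample of the bonding driver's status text.
SAMPLE = """\
Bonding Mode: IEEE 802.3ad Dynamic link aggregation
MII Status: up
MII Polling Interval (ms): 100

Slave Interface: eth0
MII Status: up
Speed: 1000 Mbps

Slave Interface: eth1
MII Status: down
Speed: Unknown
"""


def parse_bonding_status(text):
    """Extract the bonding mode, bond link status, and per-member link status."""
    summary = {"mode": None, "status": None, "slaves": {}}
    current = None  # name of the member interface section being read
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key == "Bonding Mode":
            summary["mode"] = value
        elif key == "Slave Interface":
            current = value
            summary["slaves"][current] = None
        elif key == "MII Status":
            if current is None:
                summary["status"] = value  # status of the bond itself
            else:
                summary["slaves"][current] = value
    return summary


if __name__ == "__main__":
    path = "/proc/net/bonding/bond0"  # assumption: the bond is named bond0
    if os.path.exists(path):
        with open(path) as handle:
            text = handle.read()
    else:
        text = SAMPLE
    print(parse_bonding_status(text))
```

A member interface reported as down means the bond is running with reduced throughput and redundancy, which is worth catching before SAS is installed and under load.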
Improve Performance by Reducing IO Contention and Improving Concurrent Operations
The common theme of this post is reducing contention by improving concurrent operations, whether at the lowest level on hard disks, at the network layer, or at higher logical layers. Each consideration is equally important, assuming it applies to your environment. It is generally good practice to have experts across all areas collaborate to design the best architecture for your SAS environment.