Get The Most from SAS Cloud Analytic Services

03/02/2020 by Duane Ressler

SAS Cloud Analytic Services (CAS) is an in-memory analytics engine built to analyze in-memory tables at high speed while using memory efficiently.

What’s CAS Cache for?

While the goal is for all CAS operations to take place in memory (RAM), at times a table or problem set is so large that it can’t fit into available memory. Instead of failing to load the table or failing to complete an analytics work stream, CAS will move your data to disk. That’s where the CASCACHE configuration comes into play.

CAS uses the cache to move blocks of data onto disk when available memory is too small to hold the data or to completely solve a problem set. Because the cache maps memory to disk, the I/O speed of the underlying storage largely determines how well the cache performs. That makes direct-attached solid-state drives (SSD) or non-volatile memory express (NVMe) disks the best storage option for the cache. Single disks and network file systems typically cannot meet the I/O throughput requirements and should be avoided.

It is possible to use the Unix /tmp directory as cache storage as a short-term solution, but we don’t recommend it. Because /tmp is typically 1 GB or less in size, it is too small to be an effective cache. Also, if the /tmp directory fills up, the operating system can become unstable.

Another option is to use /dev/shm (shared memory) for the cache. However, if the server runs out of RAM, shared memory will be swapped out to the OS paging file. Memory swapping will degrade CAS performance to an unacceptable level.
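As a rough illustration of the guidance above, the following Python sketch flags a candidate cache directory that sits under /tmp, on memory-backed storage such as /dev/shm, or on a network file system. It is not a SAS tool. The function names and the list of discouraged file system types are assumptions made for this example, and the mount table is passed in as text in the format of the Linux /proc/mounts file so the logic can be tried without touching a real server.

```python
import posixpath

# File system types treated as unsuitable in this sketch (an assumption):
# tmpfs backs /dev/shm, and nfs/nfs4/cifs are common network file systems.
DISCOURAGED_FS_TYPES = {"tmpfs", "nfs", "nfs4", "cifs"}


def find_mount(path, mounts_text):
    """Return (mount_point, fs_type) for the longest mount point containing path.

    mounts_text uses the /proc/mounts layout: device, mount point, type, ...
    Returns None if no mount point matches.
    """
    path = posixpath.normpath(path)
    best = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fs_type = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            if best is None or len(mount_point) > len(best[0]):
                best = (mount_point, fs_type)
    return best


def cache_location_warnings(path, mounts_text):
    """Return a list of warnings for a candidate cache directory."""
    warnings = []
    normalized = posixpath.normpath(path)
    if normalized == "/tmp" or normalized.startswith("/tmp/"):
        warnings.append("cache is under /tmp, which is usually too small")
    mount = find_mount(normalized, mounts_text)
    if mount is not None and mount[1] in DISCOURAGED_FS_TYPES:
        warnings.append(f"cache is on a {mount[1]} file system at {mount[0]}")
    return warnings


if __name__ == "__main__":
    # Hypothetical mount table for demonstration only.
    sample = (
        "/dev/sda1 / ext4 rw 0 0\n"
        "tmpfs /dev/shm tmpfs rw 0 0\n"
        "/dev/nvme0n1 /cascache xfs rw 0 0\n"
    )
    for candidate in ("/tmp/cas", "/dev/shm/cas", "/cascache/disk1"):
        print(candidate, cache_location_warnings(candidate, sample))
```

On a real Linux host you could pass in the contents of /proc/mounts instead of the sample text. The sketch does not measure I/O speed, so a clean result only rules out the obviously poor locations.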

The CASCACHE location is set by the value of CAS_DISK_CACHE in the vars.yml file, which you edit before deploying your SAS Viya software. Refer to SAS Viya 3.5 for Linux: Deployment Guide for more information on configuring the cache location.

How Fast Should the Cache Be?

The target I/O throughput for the cache is 100 to 150 megabytes per second per CPU core. Validate this rate by measuring sustained throughput with rhel_iotest.sh, which is supplied by SAS, or with a similar tool. It’s important to validate the cache performance before deploying your SAS Viya environment because it’s often difficult to improve the cache throughput rate after deployment. We recommend keeping a record of the measured cache I/O throughput, since it can help in diagnosing CAS-related performance issues later.
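To make the per-core target concrete, here is a minimal Python sketch that turns a core count into an aggregate throughput range and compares a measured value against the lower bound. The function names are made up for this example, and the measured figure is something you would take from your own I/O test, not something this code produces.

```python
# Per-core throughput targets quoted above, in MB/s per CPU core.
MIN_MB_S_PER_CORE = 100
MAX_MB_S_PER_CORE = 150


def cache_throughput_target(cores):
    """Return the (low, high) aggregate target in MB/s for a machine."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return cores * MIN_MB_S_PER_CORE, cores * MAX_MB_S_PER_CORE


def meets_minimum_target(measured_mb_s, cores):
    """True if a measured sustained rate reaches the low end of the target."""
    low, _ = cache_throughput_target(cores)
    return measured_mb_s >= low


if __name__ == "__main__":
    low, high = cache_throughput_target(16)
    print(f"16 cores: {low} to {high} MB/s")  # prints "16 cores: 1600 to 2400 MB/s"
```

For a 16-core CAS machine, the storage behind the cache would need to sustain roughly 1,600 to 2,400 MB/s to meet the target.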

Your available cache storage should be at least two times the amount of RAM on the server. The physical disks for the cache should be configured as either a hardware or software RAID for easier storage management. Because the data in the cache is ephemeral, a RAID 0 configuration is sufficient since mirrored redundancy is not needed. Using a mirrored RAID configuration can reduce the I/O throughput of the cache.
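The sizing rule can be checked with a few lines of Python. This is a sketch under stated assumptions: the helper names are hypothetical, the RAM size is supplied by you, and the total capacity of the file system holding the cache directory stands in for "available cache storage".

```python
import shutil

GIB = 1024 ** 3


def min_cache_bytes(ram_bytes):
    """Minimum cache size under the two-times-RAM guideline."""
    return 2 * ram_bytes


def cache_shortfall_bytes(cache_total_bytes, ram_bytes):
    """Bytes missing from the guideline, or 0 if the cache is large enough."""
    return max(0, min_cache_bytes(ram_bytes) - cache_total_bytes)


def check_cache_path(path, ram_bytes):
    """Shortfall for the file system that holds the given cache directory."""
    total = shutil.disk_usage(path).total
    return cache_shortfall_bytes(total, ram_bytes)


if __name__ == "__main__":
    ram = 256 * GIB  # example server with 256 GiB of RAM
    print(min_cache_bytes(ram) // GIB, "GiB of cache storage recommended")
```

For a server with 256 GiB of RAM, the guideline works out to at least 512 GiB of cache storage.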

Scaling Out is Better Than Using Cache

It’s important to remember that CAS works best when it can keep tables and results in memory rather than using the cache. You can use the SAS Environment Manager in SAS Viya to monitor the memory use of servers in your deployment and identify when your environment is using cache storage.

If you find that your SAS Viya deployment regularly runs out of RAM and makes significant use of the cache, you should seriously consider adding CAS worker machines to keep it running at peak performance.