
Modernizing Your SAS Analytics Platform with Containers

08/01/2019 by Ben Zenick, Ivan Gomez, and Michael Koob | Modernization - Analytics

When struggling to meet user needs, administrators are rapidly adopting containers as a quick solution. In August 2018, Gartner reported that 59% of IT departments were planning to deploy containers within the next two years, and 19% indicated they had already implemented the technology. Clearly, IT shops are already seeing its value.

Container technology is becoming a favorite way to solve the complex issue of managing individual environments with multiple dependencies. The software and its dependencies are packed together in an executable package that can run anywhere.

Analysts attribute the popularity of containers to flexibility and reduced costs. Completing complex analysis with large data footprints requires specific environments; administrators want to maximize the efficiency of managing the application lifecycle.

Reduced cost is an equally compelling benefit. With SAS Viya scheduled to move into containers, the business side is likely to be excited about the idea as well. When it is done right, deployment and maintenance efforts are reduced significantly.

What is a Container?

A container is a lightweight, stand-alone packaged application that has the SAS software and its dependencies bundled together. Containers are often compared to shipping containers: each has everything it needs within its space, which allows the cargo ship to move it around quickly. Like cargo containers, which have consistent sizes to facilitate moving them around the ship, software containers present a consistent interface to IT infrastructure, allowing them to be moved around quickly, all while keeping the contents safe and secure. IT quietly orchestrates moving the containers about as needed to meet business needs.

Why Use SAS Analytics in Containers?

With a SAS container, IT can skip the deployment steps that sometimes last for months and tie up multiple internal and external resources. Some IT departments are choosing to obtain containerized applications from certified container repositories or registries, such as the Red Hat Container Registry. These repositories make the applications immediately available to users and provide scalability and availability options, thus reducing costs.

With the reduced deployment costs, companies that may have considered SAS too expensive can now afford it. They no longer need their own data center; with containers, they can deploy rapidly to a cloud.

Containers are Flexible

You can configure the container to be as elaborate or as simple as desired. Many administrators like to use a microservices approach, where they install a database in one container, then install SAS in another container, and so on. This modular approach allows the administrator to complete upgrades or make other changes to individual containers without rebuilding the entire system. It is quick and easy to support user needs.
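The modular idea can be sketched in a few lines of Python. This is only an illustration: the service names and image tags are hypothetical, and a real deployment would go through a container engine rather than a dictionary. It shows one container being upgraded while the rest of the stack is left untouched.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Service:
    """One container in the stack: a name plus the image tag it runs."""
    name: str
    image_tag: str


def upgrade(stack: dict, name: str, new_tag: str) -> dict:
    """Return a new stack in which only the named service moves to new_tag."""
    updated = dict(stack)
    updated[name] = replace(stack[name], image_tag=new_tag)
    return updated


# Hypothetical stack: a database in one container, SAS in another.
stack = {
    "database": Service("database", "db:1.0"),
    "sas": Service("sas", "sas:1.0"),
}

# Upgrade SAS only; the database entry is carried over unchanged.
upgraded = upgrade(stack, "sas", "sas:1.1")
```

Because each service is tracked separately, the database entry in `upgraded` is the same object as in `stack`, which mirrors how an upgrade to one container need not disturb its neighbors.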

Containers are Lightweight

Containers are compact because they share the host's operating system kernel. Thus, more containers can fit on a single host. Unlike virtual machines, which must first boot their own operating system, container applications are immediately accessible.

Both administrators and analysts appreciate not waiting for their application to start. Analysts like that SAS is always available. Administrators understand that the containers can start instantly, serve their purpose, and vanish just as quickly. This process frees up system resources for other containers.

Containers are Portable

Properly configured containers bundle applications with their dependencies, allowing them to be deployed more easily to different infrastructure. It no longer matters if the code was developed in a test environment. It can quickly be moved to a production environment without worrying about underlying dependencies that complicate conventional application upgrades. Upgrades become much easier, and with proper architecture, continuous upgrading and patching become feasible.

This portability extends to more than development environments. Containers can be moved from physical servers to virtual machines or even between a private and public cloud, all without the hassle of ensuring the environment has the right dependencies available. Your code works no matter where it is installed. Availability and scalability options, which were difficult or impossible previously, become straightforward. When resources become tight, additional capacity can be provisioned in real time, bursting onto whatever resources are available in on-premises or public cloud infrastructure.
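As a rough illustration of bursting, the sketch below fills on-premises capacity first and sends any overflow to the cloud. The function name, the slot counts, and the "on-premises first" policy are all assumptions made for the example; real orchestrators apply far richer placement rules.

```python
def place_containers(requested: int, on_prem_slots: int) -> dict:
    """Fill on-premises capacity first, then burst the remainder to the cloud."""
    on_prem = min(requested, on_prem_slots)
    return {"on_prem": on_prem, "cloud": requested - on_prem}


# Illustrative numbers only: ten on-premises slots.
normal_day = place_containers(requested=8, on_prem_slots=10)
busy_day = place_containers(requested=14, on_prem_slots=10)
```

On the normal day everything stays on-premises; on the busy day the four containers that do not fit are placed in the cloud.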

Containers and Version Control

A container is created from an image, a packaged snapshot that represents one version of an application. This image can be stored and versioned in a private or publicly hosted container image repository. Together, the image and repository act as a version-control mechanism for the application stack. This combination of containers and image repositories provides for easy rollback and tight tracking of changes between versions. It also allows for older versions of the application to be maintained with a previously unattainable level of ease.
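The rollback behavior described above can be sketched with a toy in-memory repository. This is not the API of any real registry: the class, its methods, the tags, and the digests are hypothetical, and the rule that a tag can never be overwritten is an assumption chosen to keep versions traceable.

```python
class ImageRepository:
    """Toy in-memory stand-in for a container image repository."""

    def __init__(self) -> None:
        self._tags = {}      # version tag -> image digest
        self._history = []   # tags in the order they were deployed

    def push(self, tag: str, digest: str) -> None:
        """Store an image under a version tag; existing tags are not overwritten."""
        if tag in self._tags:
            raise ValueError(f"tag {tag!r} already exists")
        self._tags[tag] = digest

    def deploy(self, tag: str) -> str:
        """Record the tag as the running version and return its digest."""
        digest = self._tags[tag]
        self._history.append(tag)
        return digest

    def rollback(self) -> str:
        """Return to the previously deployed tag."""
        if len(self._history) < 2:
            raise RuntimeError("no earlier deployment to roll back to")
        self._history.pop()
        return self._history[-1]


repo = ImageRepository()
repo.push("analytics:1.0", "sha256:aaa")  # placeholder digests
repo.push("analytics:1.1", "sha256:bbb")
repo.deploy("analytics:1.0")
repo.deploy("analytics:1.1")
restored = repo.rollback()  # back to "analytics:1.0"
```

Since every version stays in the repository under its own tag, returning to an older one is a lookup rather than a rebuild.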

This feature also benefits clinical and research organizations. Those organizations often need to maintain the ability to run previous versions and configurations based on regulatory requirements. Regulated organizations must be able to return to environments where the data was represented in a certain way. The easy maintenance of specific application versions that containers offer is a significant advantage for such organizations.

These benefits are among the many convincing reasons IT administrators are finding to migrate to containerized environments.

Comparing Containers to Existing Platform Solutions

[Figure: containers vs. virtual machines vs. servers]

Even if you are sold on the benefits, you may not understand how the technology works. Let’s compare the container solution to two more common analytic platform solutions: dedicated servers and virtual machines.

Solution One: Using Dedicated Servers

This solution uses a fixed hardware and operating system environment for the application. The server has an operating system, associated drivers, and multiple applications installed, such as a database, a SAS compute tier, and so on. For many years, this was the only solution available.

The applications are available for use but offer low scalability. This solution is often inflexible since any upgrade can impair all the applications running on the server, which gives it the highest downtime requirements of the solutions discussed. Many administrators find it clumsy and time-consuming to maintain compared with the alternatives.

This solution provides the highest degree of separation from other applications but has the highest cost of deployment and administration.

Solution Two: Using Virtual Machines

Many administrators think a container is just a virtual machine. It’s a fair comparison but not a complete one. It’s true that, like virtual machines, multiple containers can be hosted on a single physical server, provide separation of context, and are portable.

The difference is that virtual machines are not lightweight. Each virtual machine contains an entire operating environment and its dependencies along with the software applications. If you are housing several virtual machines on the same server, you can imagine the excessive resource usage from the duplicate operating systems.
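The cost of duplicate operating systems can be put in rough numbers. The sketch below is back-of-the-envelope arithmetic under stated assumptions: the memory figures are invented for illustration, hypervisor and container-engine overhead are ignored, and the function name is hypothetical.

```python
def instances_per_host(host_gb: int, app_gb: int, os_gb: int, shared_os: bool) -> int:
    """How many application instances fit in host memory.

    With shared_os=True (containers) the operating system cost is paid once
    for the whole host; with shared_os=False (virtual machines) every
    instance carries its own operating system.
    """
    if shared_os:
        return (host_gb - os_gb) // app_gb
    return host_gb // (app_gb + os_gb)


# Illustrative numbers only: 64 GB host, 2 GB per application, 2 GB per OS.
containers = instances_per_host(64, 2, 2, shared_os=True)         # 31
virtual_machines = instances_per_host(64, 2, 2, shared_os=False)  # 16
```

Under these assumed figures, nearly twice as many instances fit when the operating system is shared.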

As with bare-metal installations, running upgrades or installing additional software requires taking the application hosted on a virtual machine offline. Installing software can also introduce dependencies that the other applications on the same virtual machine may not tolerate. This solution increases resource utilization at the hardware layer by running multiple virtual machines on large physical hardware, but it adds an administrative layer in the VM hypervisor. It is much more flexible than dedicated hardware and requires significantly less administrative effort.

Solution Three: Containers

Properly deployed containers can effectively remove the dependency of applications on their underlying hardware and network infrastructure. Imagine if your site could license the SAS applications made available via a certified container.

This reduces deployment effort to only configuring the container to orient it to your enterprise infrastructure. It also unlocks scalability and availability options and provides increased resource flexibility and utilization. Organizations realize these benefits through Kubernetes orchestration of containers on their hardware infrastructure. This orchestration offers a convenient path to hybrid cloud architectures.

Even though we are contrasting virtual machines to containers, you shouldn’t think of these as competing technologies. Containers run equally well on virtual machines and physical machines.

All of the above contribute to reduced cost of ownership of containerized applications.

Using Containers in the SAS World

When creating a container environment, the software stack has some new players. Some of the more notable names in the container community are Docker, Kubernetes, OpenShift, and Red Hat. Let’s review the role of each in the stack.

Container Host

Linux has primarily been the operating system of choice for a containerized environment. The overall direction of hosts has been towards “atomic” operating systems (e.g., Project Atomic) because they offer significant benefits when combined with container-based applications.

Updates to an atomic host operating system can be downloaded and deployed in one step through a single command, which encourages simple update and rollback of changes. When an image is updated, the previous version is retained for rollback, but there is no mixing of two versions. This dramatically reduces administration at the operating system level of the container stack. The increased consistency of host operating system images and the ease of rollbacks provide much higher reliability of the system. This, in turn, increases confidence in applying the necessary changes to keep systems up-to-date with the rapidly evolving requirements of today’s IT organizations.
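The update-and-rollback behavior of an atomic host can be pictured as two slots: one active image and one retained image. The sketch below is a toy model, not the tooling of any particular operating system; the class and image names are hypothetical.

```python
from typing import Optional


class AtomicHost:
    """Toy model of an atomic host: one active OS image, one kept for rollback."""

    def __init__(self, image: str) -> None:
        self.active = image
        self.previous: Optional[str] = None

    def update(self, new_image: str) -> None:
        """Switch to the new image in one step, retaining the old one whole."""
        self.previous, self.active = self.active, new_image

    def rollback(self) -> None:
        """Swap back to the retained image; the two versions are never mixed."""
        if self.previous is None:
            raise RuntimeError("no previous image retained")
        self.active, self.previous = self.previous, self.active


host = AtomicHost("os-image-1")
host.update("os-image-2")   # os-image-1 is retained
host.rollback()             # os-image-1 is active again
```

The host is always running exactly one complete image, which is what makes the rollback a simple swap.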

Container API

Docker, the most widely used container engine, manages the application and its dependencies. Docker is part of the Open Container Initiative (OCI) run by the Linux Foundation. The OCI's goal is an industry standard for container formats and runtime software across all platforms, so that a container built with one compliant tool runs under any compliant runtime. OCI sponsors include many large corporations such as Amazon, Google, IBM, Microsoft, and Red Hat.

Orchestration

Kubernetes (originally developed by Google) is an open-source container management system that has mechanisms for deploying, maintaining, and scaling containerized applications. Since Kubernetes is an open-source technology, vendors are entering this space to provide supported versions of the technology and add their special features on top. OpenShift (provided by Red Hat) is one to watch closely as it is rapidly establishing itself as the preferred solution for managing Docker-based container infrastructure.
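Orchestrators of this kind generally work by comparing a desired state with what is actually running and correcting the difference. The sketch below illustrates that idea only. It is not the Kubernetes API; the function, the naming scheme, and the choice of which containers to stop are assumptions made for the example.

```python
import itertools


def reconcile(desired_replicas: int, running: list, prefix: str = "sas") -> list:
    """Return the container names that should be running after one pass.

    Keeps up to desired_replicas of the existing containers, then starts
    new ones (under unused names) until the desired count is reached.
    """
    result = list(running[:desired_replicas])
    candidate_names = (f"{prefix}-{i}" for i in itertools.count())
    while len(result) < desired_replicas:
        name = next(candidate_names)
        if name not in result:
            result.append(name)
    return result


scaled_up = reconcile(3, ["sas-0"])
scaled_down = reconcile(1, ["sas-0", "sas-1", "sas-2"])
healed = reconcile(3, ["sas-0", "sas-2"])  # "sas-1" has failed
```

The same pass handles scaling up, scaling down, and replacing a failed container, because each case is just a gap between desired and actual state.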

Containerized Services

The SAS application becomes a service in the stack supported by Docker Hub. Analysts use a browser-based interface, such as SAS Studio or Jupyter Notebook, to access the SAS platform and to complete their work. When properly architected and deployed, the container basis of the application is mostly invisible to users. SAS has made multiple products available for containerized solutions, including SAS 9.4 with SAS Studio and batch processing capabilities. SAS Viya has numerous options, including stand-alone programming environments (SPRE), MPP CAS, and the full visual interfaces of the SAS Viya application suite (GitHub: sassoftware/sas-container-recipes).

Containers will change the way analytic platforms are deployed. The issues associated with producing virtual machines or provisioning specialized, dedicated servers are rapidly disappearing.

Deployment and administration of SAS applications are going to look radically different in a few short years – empowering both business and IT to use SAS more effectively and at a lower total cost of ownership.