Developers, Analysts & Data Scientists: Enhance Your Toolbox with New Open Source Integrations
09/06/2019 by Sean Ankenbruck
The introduction of SAS Viya opened the SAS brand and its industry-leading machine learning algorithms to the open source community. With the added ability to connect open source interfaces such as Lua, Python, and R, SAS Viya becomes a powerful tool to include in any analytical toolbox. Developers, analysts, and data scientists no longer have to choose one tool over the other but can instead use multiple in collaborative ways to meet all their business needs.
Let’s explore several ways that today’s analytics professionals are integrating open source solutions into their existing SAS Viya infrastructure. Using the examples that follow, you’ll be able to do the same at your company!
SAS Viya is a cloud-focused, in-memory analytics engine. It is built around SAS Cloud Analytic Services (CAS), a cloud-based run-time environment that provides powerful and efficient analytics capabilities.
The large number of products offered means that there is something for every company in every industry. Whether you are a Midwest dairy producer or a worldwide manufacturing company, SAS Viya solutions can be implemented to meet your most demanding business needs.
SAS also brings years of experience in the predictive analytics market. Because SAS Viya offers system integration with open source software, there are many reasons to integrate these platforms.
Developers can quickly and efficiently access a large number of resources made available via the APIs. With relatively little effort, it is possible to implement these resources into existing web applications.
In the following example, we will demonstrate how easy it is to plug these APIs into an existing Flask application. To access these resources, you must register the custom application as a trusted client with the SAS Logon Manager.
This registration ensures that SAS Viya accepts requests from our application. It also provides an added layer of security to prevent unauthorized requests from other applications.
We can specify limited access to secure resources within SAS Viya. SAS Logon Manager uses OAuth 2.0 as its authorization framework. This enables our application to exchange valid credentials with the SASLogon service for a valid access token.
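As a rough sketch of that exchange, the helper below builds (but does not send) an OAuth 2.0 password-grant request using only the Python standard library. The token path `/SASLogon/oauth/token`, the function name `build_token_request`, and all argument values are assumptions made for illustration; check the SAS Logon documentation for your deployment before relying on them.

```python
import base64
import urllib.parse
import urllib.request


def build_token_request(base_url, client_id, client_secret, username, password):
    """Build (but do not send) an OAuth 2.0 password-grant token request."""
    # The registered client identifies itself with HTTP Basic credentials.
    client_auth = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    # The user's credentials travel in the form-encoded request body.
    body = urllib.parse.urlencode(
        {"grant_type": "password", "username": username, "password": password}
    ).encode()
    return urllib.request.Request(
        # Assumed token endpoint path; confirm it for your SAS Viya release.
        url=base_url.rstrip("/") + "/SASLogon/oauth/token",
        data=body,
        headers={
            "Authorization": f"Basic {client_auth}",
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": "application/json",
        },
        method="POST",
    )


# Sending the request is left to the caller, for example:
#   with urllib.request.urlopen(request) as response:
#       token = json.load(response)["access_token"]
```

Keeping request construction separate from sending makes the helper easy to unit test and easy to call from a Flask view or any other web framework.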
Once a token has been retrieved, we can then use it in subsequent requests to access a protected resource. For this example, we are accessing the resource:
This resource returns all the assigned groups that the currently authenticated user belongs to. It might be useful to access this resource if we wanted to check whether or not a specified user has the correct group assignments to retrieve data from another resource within our application. For example, we only want to allow users who belong to the group Administrators to have elevated privileges within the application.
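The group check described above might look something like the following sketch. It assumes the decoded JSON response is an object with an `items` list whose entries carry an `id` field; that shape, the helper names, and the placeholder resource path are illustrative assumptions, not a statement of what the API returns.

```python
import urllib.request


def build_resource_request(base_url, resource_path, access_token):
    """Build a GET request for a protected resource using a bearer token."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/" + resource_path.lstrip("/"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Accept": "application/json",
        },
        method="GET",
    )


def is_member(memberships, group_id):
    """Return True if group_id appears in the decoded membership response.

    Assumes a payload shaped like {"items": [{"id": "..."}, ...]};
    adjust the lookup to match the real response.
    """
    return any(item.get("id") == group_id for item in memberships.get("items", []))
```

Inside the application, a route could call `is_member(payload, "Administrators")` and refuse the request when the check fails.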
SAS Viya APIs are flexible and can be fine-tuned for any use case that your company might have. These include, but are not limited to, providing real-time access to reports and dashboards, structuring and analyzing raw text from customer feedback data, managing data within CAS, running production model code, and scoring new data automatically in response to an event. The range of possible applications is wide.
SASPy is a module that can be used by Python developers to connect to their existing SAS infrastructure. It allows users to run their analysis using both SAS and Python code simultaneously. Basic statistical methods are available to users right out of the box. More complex modeling can be accomplished by using one of the module’s built-in Python classes.
If you are not familiar with Python, it is an open source tool that can be used for data science projects. System requirements for Python and SAS to support SASPy are available on the Anaconda site. It is not difficult to get open source projects started.
In the following sample code, we’ve imported both the SASPy and Pandas libraries into a Jupyter notebook. From there we can connect to our SAS workspace server using the saspy.SASsession() function.
This function starts a SAS session that enables users to submit SAS code and access data within the file system. We can load data from the sample dataset, sashelp.cars, and return summary statistics using the describe() function.
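A minimal sketch of those steps, assuming SASPy is installed and a connection has been defined in your SASPy configuration file, could look like this. The function name `summarize_cars` and the `cfgname` value you pass are placeholders for illustration.

```python
def summarize_cars(cfgname=None):
    """Start a SAS session, summarize SASHELP.CARS, and return the result."""
    # Imported lazily: requires SASPy and a reachable SAS installation.
    import saspy

    # cfgname selects a connection from your SASPy configuration;
    # with None, SASPy uses its own selection logic and may prompt.
    sas = saspy.SASsession(cfgname=cfgname)
    try:
        # Handle to the SAS dataset; the data stays on the SAS side.
        cars = sas.sasdata("cars", "sashelp")
        # Summary statistics computed by SAS and returned to Python.
        return cars.describe()
    finally:
        # Always release the SAS session, even if a step fails.
        sas.endsas()
```

In a notebook, calling `summarize_cars()` in a cell would display the summary table for the numeric columns.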
If you are more familiar with the SAS programming language than with Python, you can submit SAS code directly in the notebook via the magic function %%SAS.
Magic functions are shortcuts that extend a Jupyter notebook’s capabilities. They let users call functionality that would not be available in plain Python code.
In the following example, we invoke the %%SAS magic function and use PROC SGPANEL to plot data in Jupyter. The SAS code is the same as what we would write in SAS Enterprise Guide or SAS Studio. The resulting histogram depicts the manufacturer’s suggested retail price (MSRP) of all cars in the SASHELP.CARS dataset, grouped by origin.
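For readers working outside a notebook cell, a comparable result can be sketched with the session's `submit` method rather than the magic function. This is an alternative route, not the article's original cell; the helper names are hypothetical, and the PROC SGPANEL step is a plausible reconstruction of a histogram of MSRP paneled by origin.

```python
def sgpanel_histogram_code(table, panel_var, analysis_var):
    """Return SAS code that draws one histogram panel per panel_var value."""
    return (
        f"proc sgpanel data={table};\n"
        f"  panelby {panel_var};\n"
        f"  histogram {analysis_var};\n"
        "run;"
    )


def plot_msrp_by_origin(sas):
    """Submit the PROC SGPANEL step through an existing saspy.SASsession."""
    code = sgpanel_histogram_code("sashelp.cars", "origin", "msrp")
    # submit() returns a dict holding the SAS log ("LOG") and output ("LST").
    result = sas.submit(code, results="HTML")
    return result["LST"]  # HTML output that a notebook can render
```

Building the SAS code in a separate function keeps it easy to inspect or reuse, for example by pasting the generated text into a %%SAS cell.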
As we have seen, the SASPy module enables your analytics professionals to run Python and SAS code seamlessly in one environment. What an excellent solution for companies looking to integrate open source with their existing SAS software. [Tip! If you want to learn SAS, check out these free training resources.]
The SAS Scripting Wrapper for Analytics Transfer (SWAT) package is the Python client to SAS Viya Cloud Analytic Services (CAS). The Python SWAT package provides direct access to CAS using standard Python conventions and data structures. The package allows you to load data into memory, apply CAS actions, summarize, model, and score data all from within Python.
In this example, we import Pandas and the SWAT library and then create a connection to CAS using the swat.CAS() function. Once a session has been started, we can load the same SASHELP.CARS dataset.
This time we make use of the powerful Pandas library to convert the dataset into a DataFrame for further analysis. A key feature of SWAT is that it allows your Python users to do what they do best — code in Python! When you need to access data or submit an action, SWAT is there to act as the gateway to CAS.
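The connect, load, and convert steps can be sketched as follows, assuming the SWAT package is installed and a CAS server is reachable. The function name, the connection arguments, and `csv_path` (a local or remote CSV copy of the cars data) are placeholders; how the original example loaded the table is not shown here, so `read_csv` is one possible route rather than the author's method.

```python
def load_cars_frame(host, port, username, password, csv_path):
    """Load a cars CSV into CAS and return it as a Pandas-compatible DataFrame."""
    # Imported lazily: requires the SWAT package and a reachable CAS server.
    import swat

    conn = swat.CAS(host, port, username, password)
    try:
        # Parse the CSV and upload it as an in-memory CAS table named "cars".
        cars = conn.read_csv(csv_path, casout=dict(name="cars", replace=True))
        # Pull the CAS table down to the client as a DataFrame.
        return cars.to_frame()
    finally:
        # Close the CAS connection even if loading fails.
        conn.close()
```

Note that `to_frame()` copies the whole table to the client, which is fine for a small sample dataset but worth reconsidering for large CAS tables.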
Once the data has been converted to a DataFrame, you can apply any of the statistical modeling or visualization tools that you frequently use. In this example, we used the popular visualization package, Seaborn, to plot MSRP by Origin.
The categorical scatter plot is perfect for this example because we are comparing a continuous variable against a categorical one. We’ve even added an additional parameter, Type, as a color code. This helps us gain additional insight into how the cost is broken down across the various types of vehicles in the dataset.
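A plot along those lines might be produced with a sketch like the one below, assuming Seaborn is installed and the DataFrame has `Origin`, `MSRP`, and `Type` columns. The function name `msrp_catplot` is hypothetical.

```python
def msrp_catplot(cars_df):
    """Draw a categorical scatter plot of MSRP by Origin, colored by Type."""
    # Imported lazily: requires Seaborn (and Matplotlib) to be installed.
    import seaborn as sns

    # catplot defaults to a strip plot, which is a categorical scatter plot.
    return sns.catplot(data=cars_df, x="Origin", y="MSRP", hue="Type")
```

The returned object is a Seaborn `FacetGrid`, so axis labels and titles can be adjusted afterward if needed.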
In the SWAT package, the format and structure of data and functions resemble those of the popular Pandas library. For this reason, experienced Python programmers should find it relatively easy to use. Connect to CAS, load data, and you are good to go!
It is important to note that the Python modules above do not contain SAS software or the associated SAS license files. All the above examples assume that SAS software has already been configured and installed at your site. Specific system and version requirements can be found in the respective documentation for each module.
These are three different ways that developers can integrate SAS Viya with Python. The business impact of these methods is significant. They allow more developers, analysts, and data scientists from various industries to integrate world-class analytical software into applications written in their language of choice.
There is no longer a need to choose one language over another. Instead, it is possible to harness the relative strengths of each language and combine the results in ways that were previously not possible.
If you would like specifics of how you can apply these techniques at your company, please contact us.