Using SAS Viya to Decode the Star Wars Fan Buzz

12/14/2017 by Chris St. Jeor Modernization - Analytics

At Zencos we are big fans of SAS Viya, Python, and Star Wars. The pending release of Star Wars: The Last Jedi in December 2017 presented an excellent opportunity to use our favorite tools to learn how fans are feeling in the build-up to the holiday blockbuster.

An Assessment of Star Wars Fan Posts

Judging from the volume of Reddit posts and tweets involving Star Wars, we know that fans are abuzz about the new movie. However, days before the release, the overall sentiment among fans is still neutral. We expect fans will have a more definitive response after seeing the movie, hopefully a positive one!

Even though Star Wars fans are not chattering with passionate emotion, we were still interested in what they are discussing. Popular topics such as canon and the trailer were expected.

We were most surprised to observe some discussion tying Jar Jar Binks, Darth Vader, and Supreme Leader Snoke together. While we hold the general opinion that all things prequel should be avoided, we are charmed to know that some fans still discuss such theories. The release of Battlefront II is also getting its fair share of hype.

Use the Force, err, SAS Viya

The Cloud Analytic Services (CAS) engine is at the center of it all, the Death Star of SAS Viya. Python SWAT is the Python package needed to connect to SAS Viya from Python, and JupyterHub is our selected Python programming interface. SAS Studio and SAS Visual Analytics round out the fleet. Used together, these four tools are more effective than the Rebels' Red Squadron.

In this use case, collecting data from a variety of sources leads to better results, especially because we are gauging the reaction of a diverse fan base on social media. SAS Viya lets us both leverage its built-in social media connections and load data that we collect from APIs via Python directly into CAS.

Collecting Data from an API

Using an API to harvest data is much easier than harvesting moisture on Tatooine. Application Programming Interfaces (APIs) are predefined methods of communication and tools for both interacting with and building software applications.

You can connect to APIs from Python to collect publicly available data. APIs often present their data in JavaScript Object Notation (JSON). JSON is built on two structures: a collection of name/value pairs (an object) and an ordered list of values (an array). This data-interchange format is easy to parse with Python.
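To make the JSON point concrete, here is a minimal sketch using Python's standard `json` module. The payload is made up for illustration (it is not real Reddit output); it simply shows how JSON objects become dictionaries and JSON arrays become lists.

```python
import json

# Hypothetical payload shaped like a typical API response (not real Reddit data).
raw = """
{
  "kind": "Listing",
  "posts": [
    {"title": "Trailer breakdown", "score": 120, "tags": ["trailer", "theory"]},
    {"title": "Is this canon?", "score": 45, "tags": ["canon"]}
  ]
}
"""

# json.loads turns a JSON object into a dict and a JSON array into a list.
payload = json.loads(raw)

# Once parsed, the data is ordinary Python and can be sliced however we like.
titles = [post["title"] for post in payload["posts"]]
total_score = sum(post["score"] for post in payload["posts"])

print(titles)
print(total_score)
```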

Most major websites have APIs, and there are often Python wrappers, delivered as packages, developed specifically to make interacting with them easier. We used the Reddit API, which allows users to collect posts and comments from Reddit and from specific subreddits.

PRAW (the Python Reddit API Wrapper) is the Python wrapper for Reddit’s API. It provides functions built specifically for parsing the JSON that Reddit’s API outputs, further easing the data collection process.
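As a rough sketch of what collecting posts with PRAW can look like (not necessarily the code we ran), the helpers below pull recent submissions from a subreddit into plain dictionaries. The function names, the chosen fields, and the subreddit in the usage comment are assumptions for illustration; the credentials are placeholders for the values Reddit issues when you register an application.

```python
def connect_to_reddit(client_id, client_secret, user_agent):
    """Create a read-only PRAW client (hypothetical helper)."""
    import praw  # third-party package: pip install praw

    return praw.Reddit(
        client_id=client_id,
        client_secret=client_secret,
        user_agent=user_agent,
    )


def collect_posts(reddit, subreddit_name, limit=100):
    """Return the newest submissions in a subreddit as a list of dicts."""
    rows = []
    # PRAW handles the HTTP requests and JSON parsing behind this iterator.
    for submission in reddit.subreddit(subreddit_name).new(limit=limit):
        rows.append(
            {
                "id": submission.id,
                "title": submission.title,
                "text": submission.selftext,
                "score": submission.score,
                "created_utc": submission.created_utc,
            }
        )
    return rows


# Example usage (placeholder credentials, assumed subreddit name):
# reddit = connect_to_reddit("MY_CLIENT_ID", "MY_CLIENT_SECRET", "fan-buzz-demo")
# posts = collect_posts(reddit, "StarWars", limit=50)
```

Keeping the result as a list of dictionaries makes it easy to hand the data to whatever comes next, whether that is a DataFrame or a CSV file.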

A connection must be established to the API and, later, to CAS to load the data. Each API has its own connection requirements; in this case, we had to register with Reddit.
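The CAS side of that connection might look something like the sketch below, which uses the SWAT package to open a session and upload collected rows as a CAS table. This is one possible approach under stated assumptions, not a record of our exact code: the helper names, the table name, and the host, port, and credentials in the usage comment are all placeholders, and writing a temporary CSV is just one simple way to hand data to CAS.

```python
import csv
import os
import tempfile


def connect_to_cas(host, port, username, password):
    """Open a CAS session with SWAT (hypothetical helper)."""
    import swat  # third-party package: pip install swat

    return swat.CAS(host, port, username, password)


def load_rows_to_cas(conn, rows, table_name):
    """Write a list of dicts to a temporary CSV and upload it as a CAS table."""
    if not rows:
        raise ValueError("rows must contain at least one record")

    fieldnames = list(rows[0].keys())
    with tempfile.NamedTemporaryFile(
        "w", suffix=".csv", newline="", delete=False, encoding="utf-8"
    ) as tmp:
        writer = csv.DictWriter(tmp, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
        path = tmp.name

    try:
        # upload_file sends the client-side file to the server and loads it;
        # replace=True overwrites any existing table with the same name.
        return conn.upload_file(path, casout={"name": table_name, "replace": True})
    finally:
        os.remove(path)  # clean up the temporary file either way


# Example usage (placeholder server details):
# conn = connect_to_cas("viya.example.com", 5570, "user", "password")
# table = load_rows_to_cas(conn, posts, "reddit_posts")
```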

SAS Viya and Python Rule

We think that SAS Viya and Python together are the analytical equivalent of the Millennium Falcon. If the Millennium Falcon can make the Kessel Run in 12 parsecs, then the opportunities with this toolset are truly astounding.

 …..