Python Won’t Solve All the World’s Problems, but It May Solve Yours
Modernization - Analytics
We had written a web service backend to a User Interface (UI) that accepted a csv form as input and performed a series of checks on the data in the form to determine if it was suitable for input in the system. Unfortunately, this backend web service was written using a proprietary tool that had several limitations:
- It was not well-documented
- It was slow (the UI itself was not responsive when writing or editing the job)
- Like a Visio diagram, to make a job one had to select from pre-cooked functions and arrange them in a flow which made details of the underlying logic hidden behind each box’s properties instead of readily visible
- It was not well-documented (did I already say that?)
Due to being built in a clunky, proprietary, point-and-click interface, the legacy solution was unwieldy to develop and maintain. Even though the UI presented the job as a Visio-like flow, it was actually quite hard to follow what was going. Unfortunately, all the salient logic was hidden in the properties of each component’s box. For example, I may know that one box sorts a database table and that another box joins two tables, but without looking at the properties, I won’t know which columns are being sorted on or what the join criteria is. So, the important logic was always buried, and this made debugging and troubleshooting painful and time-consuming.
I had just accepted this as my lot in life until my boss, Ben Zenick, asked me one day what it would take to rewrite this in Python. My immediate response was, “Could I?”
Even though I hadn’t written a solution like this in Python before, I was heartened by the simple fact that I am not the first person to write a webservice to take input from a user, do something, and return a JSON object. Of course, this is common, and many programmers before me have already figured out best practices and created packages to accomplish this. I didn’t need to reinvent the wheel. I just needed to find some already fabricated wheels and customize them to my ends.
Toward that end, I set about googling all the basic components that I would need. In Python, several packages often could accomplish a needed goal, so my guideline was to use projects that seemed mature, stable, popular, and had many code examples. If several packages met that criteria, I just flipped a coin, or chose the one with the most amusing name. Eventually, I settled on the following:
- psycopg2 – for interfacing with the Postgres database
- pandas – for data transformations and manipulation (it has some functionality which, although possible in plain SQL, would be much more tedious to accomplish)
- logging – for customized log messages and creating log files
- Flask – for setting up the program as a REST API callable from the UI
- Gunicorn – a production grade Web Service Gateway interface that can serve flask applications (although flask comes with a server, it is only recommended for development purposes, not for production deployments)
Armed with these Python packages, I set about rewriting the legacy solution in Python. As a programmer, I found it freeing to have all the features of a well-documented and open-source programming language at my fingertips. If I didn’t know how to do something, a quick Google search could remedy that.
The Benefits of Modernizing to Python
It was a painstaking process extracting the salient logic out of the legacy solution and translating that into Python code, but it was ultimately worthwhile because I noticed some improvements that could drastically increase run time. So, the Python-based solution actually runs noticeably faster than the legacy solution, which was an unexpected but welcome benefit.
As an added aside, I’d like to sing the praises of the humble logging package. Of course, log messages are only as good as you make them, but I found it important to take the time to write good and verbose ones as it will save time on troubleshooting in the future when unforeseen issues arise. In summary, the benefits I found from pythonifying the legacy solution are:
- Readability: salient logic is available at a glance and not obscured behind UI elements
- Maintainability: code was easier to maintain because I could write functions for repeatable tasks (code reuse wasn’t possible in the legacy solution)
- Simplicity: Python’s syntax is meant to be simple and readable
- Logging: a mechanism to create your own logs is very useful
- Troubleshooting: with helpful log messages and the Python debugger, troubleshooting is a lot easier
So, pythonifying the cumbersome legacy solution didn’t solve all the world’s problems, but it did solve several of mine. So, if you have any solution written in an obscure language or tool, and you think to yourself, “Should I rewrite this in Python?” My answer is a resounding, “YES!”