Python Gems: 5 Functions and Methods That Make Data Doable!
05/06/2022 by Drew Fergus Support
This article is written for those who need to solve data problems in Python, but may not have a lot of experience doing so. It is also written as a place for those who may have more experience to drop “the odd pearl” in the LinkedIn comment section. One of the best parts of working with “open source” technologies is becoming part of a community of users who come together to build something that we couldn’t build on our own. Of course, no single developer has the time to do it all. But thanks to the many hands and brilliant minds that are constantly at work, we have an ever-expanding toolbox and some pretty slick ways of dealing with data using the Python programming language.
When discussing how to use Python to solve data or data science problems, it almost goes without saying that some of the functions and methods will come from pandas, NumPy, or other essential libraries designed to help make data happen. Since functions or methods from different libraries can share the same name, I’ll note the library attached to each of the “Gems” I share (links to docs preferred), and I hope you’ll do the same when sharing one of your Gems in the LinkedIn comment section.
While I’ve never actually skinned a cat, there are reportedly many ways to do it. Even though Python prefers having one obvious way to do something, different circumstances could mean your way of solving one of the Use Cases below is the best one. Feel free to solve one of my Use Cases differently or add your own use case for the same Gem; some can have many applications! This article is intended for a broad audience, including those new to Python or who may only use it on occasion, so when sharing, please remember that “Simple is better than complex.”
Every Python Gem below has four brief components:
Use Case – The problem or need
The Gem – The function/method and library (if applicable)
Description – Brief description of what the Gem does
Implementation – The way the Gem works to solve the problem
The Gem — DataFrame.copy()
Description — You guessed it. It makes a copy of the object.
Implementation — Make a copy of the object (or part of the object) you’re working with so that your changes affect the copy and not the original object.
df_copy = df.copy()
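To see why the copy matters, here is a minimal sketch comparing plain assignment with DataFrame.copy(). The sample data and the names alias, original_first, and copy_first are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data for illustration
df = pd.DataFrame({"A": [1, 2, 3]})

alias = df           # plain assignment: a second name for the same object
df_copy = df.copy()  # an independent copy (deep by default)

# Change the copy only
df_copy.loc[0, "A"] = 99

original_first = df.loc[0, "A"]   # the original is untouched
copy_first = df_copy.loc[0, "A"]  # the copy holds the new value
```

Because alias and df are the same object, an edit made through alias would show up in df as well, which is exactly the surprise that .copy() avoids.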
The Gem — numpy.nansum()
Description — Sums the values while treating null (NaN) values as zero
Implementation — Use numpy.nansum() to ensure null values don’t stop you from getting the right sums in your analyses.
a = np.nansum(b)
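As a small sketch of the difference, assuming an array b with one missing value (the data and the names plain_total and nan_total are made up for illustration):

```python
import numpy as np

# Hypothetical data with one missing value
b = np.array([1.0, np.nan, 3.0])

plain_total = np.sum(b)   # NaN propagates, so the result is nan
nan_total = np.nansum(b)  # NaN is treated as zero, so the result is 4.0
```

Note that pandas' own Series.sum() already skips NaN by default, so numpy.nansum() is most useful once you are working with plain NumPy arrays.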
instead of “.values”, which is less efficient.
The Gem — pandas.Series.to_numpy()
Description — Converts a Series, such as a data frame’s column, to a NumPy array
Implementation — Use column.to_numpy() to operate on the column/Series with the speed benefits of NumPy!
b = df['A'].to_numpy()
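Here is a short, self-contained version of that line, assuming a small made-up data frame with a numeric column "A" (the names is_array and total are only for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical sample data for illustration
df = pd.DataFrame({"A": [1.5, 2.5, 3.0]})

b = df["A"].to_numpy()  # a plain NumPy array with the column's values

is_array = isinstance(b, np.ndarray)
total = b.sum()         # NumPy operations now work directly on b
```

The array no longer carries the Series index, so keep the original column around if you need to match results back to row labels.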
Getting data types to line up for operations can be a chore and often leads to errors.
The Gem — DataFrame.astype()
Description — Converts a Series or data frame to a desired data type
Implementation — There are a number of ways to use this Gem to keep your sanity, but a simple astype(int) call can convert a column’s values to integers for some operation.
df['integer'] = df['float'].astype(int)
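One thing worth knowing is that casting floats to int drops the fractional part rather than rounding. The sketch below shows both behaviors on made-up data; the "rounded" column is a hypothetical addition for comparison.

```python
import pandas as pd

# Hypothetical sample data for illustration
df = pd.DataFrame({"float": [1.9, 2.0, -3.7]})

# astype(int) truncates toward zero: 1, 2, -3
df["integer"] = df["float"].astype(int)

# Round first if you want the nearest whole number: 2, 2, -4
df["rounded"] = df["float"].round().astype(int)
```

A column that contains NaN cannot be cast to a plain int and raises an error, so fill or drop the missing values first.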
The Gem — numpy.where()
Description — Sets values in data based on conditions
Implementation — This Gem can take a little longer to learn to use because it is very versatile and powerful, in my humble opinion. Am I allowed to write that, or does it require “IMHO”? At any rate, here is a simple example, and I will restate it in English for those who are newer to Python:
df['C'] = np.where((df['A'] < df['B']), df['A'], df['B'])
For every row where Column “A” is less than Column “B”, Column “C” is given the value of Column “A”; otherwise, it is set to the value of Column “B”.
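To try this out end to end, here is a minimal runnable sketch with made-up values for Columns “A” and “B”:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data for illustration
df = pd.DataFrame({"A": [1, 5, 3], "B": [4, 2, 3]})

# Where A < B take A, otherwise take B
df["C"] = np.where(df["A"] < df["B"], df["A"], df["B"])
# Row by row: 1 < 4 gives 1, 5 < 4 is False so 2, 3 < 3 is False so 3
```

The second and third arguments don't have to be columns. Single values work too, for example np.where(df["A"] < df["B"], "A smaller", "B smaller or equal") to label each row.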
If these “Gems” are new to you, then enjoy getting to know them. If nothing here is new, then head to the LinkedIn comments and drop some knowledge. Happy coding, and good luck with the data!