Python Gems: 5 Functions and Methods That Make Data Doable!

Data Management

Drew Fergus

05/06/2022

This article is written for those who need to solve data problems in Python, but may not have a lot of experience doing so. It is also written as a place for those who may have more experience to drop “the odd pearl” in the LinkedIn comment section. One of the best parts of working with “open source” technologies is becoming part of a community of users who come together to build something that we couldn’t build on our own. Of course, no single developer has the time to do it all. But thanks to the many hands and brilliant minds that are constantly at work, we have an ever-expanding toolbox and some pretty slick ways of dealing with data using the Python programming language.

When discussing how to use Python to solve data or data science problems, it almost goes without saying that some of the functions and methods will come from pandas, NumPy, or other essential libraries designed to help make data happen. Since functions or methods from different libraries can share the same name, I’ll note the library attached to each of the “Gems” I share (links to docs preferred), and I hope you’ll do the same when sharing one of your Gems in the LinkedIn comment section.

While I’ve never actually skinned a cat, there are reportedly many ways to do it. Even though Python prefers having one obvious way to do something, different circumstances could mean your way of solving one of the below Use Cases might be best. Feel free to solve one of my Use Cases differently or add your own use case for the same Gem—some can have many applications! This article is intended for a broad audience, including those new to Python or who may only use it on occasion, so when sharing, please remember that “Simple is better than complex.”

Every Python Gem below has four brief components:

Use Case – The problem or need

The Gem – The function/method and library (if applicable)

Description – Brief description of what the Gem does

Implementation – The way the Gem works to solve the problem

 

  1. Use Case – The dreaded SettingWithCopyWarning (cue scary music and/or evil laugh).

The Gem – pandas.DataFrame.copy()

Description – You guessed it. It makes a copy of the object.

Implementation – Make an explicit copy of the object (or the part of it) you’re working with, so that your changes land on the copy and not on the original object.

 

df_copy = df.copy()
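To see where the copy helps, here is a minimal sketch. The data frame and the names df and subset are made up for illustration, and whether the warning appears at all depends on your pandas version.

```python
import pandas as pd

# Hypothetical example data
df = pd.DataFrame({"A": [1, 2, 3, 4], "B": [10, 20, 30, 40]})

# Assigning into a filtered selection of df is the usual trigger for
# SettingWithCopyWarning in pandas versions that emit it.
# Taking an explicit copy makes the intent clear: subset is its own object.
subset = df[df["A"] > 2].copy()

# This change goes to subset only
subset["B"] = subset["B"] * 2

print(subset)
print(df)  # the original is unchanged
```

If you actually want to change the original, assign through df.loc on df itself instead of working on a selection.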

 

  2. Use Case – Add up columns, etc., without getting thrown off by null or NaN entries.

The Gem – numpy.nansum()

Description – Treats NaN values as zero while summing

Implementation – Use numpy.nansum() to ensure NaN values don’t stop you from getting the right sums in your analyses.

 

a = np.nansum(b)
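A short sketch of the difference, using a made-up data frame (the column names and values are assumptions for illustration). Note that pandas’ own Series.sum() already skips NaN by default, so the Gem matters most once you are working with plain NumPy arrays.

```python
import numpy as np
import pandas as pd

# Hypothetical example data with missing values
df = pd.DataFrame({"A": [1.0, np.nan, 3.0], "B": [4.0, 5.0, np.nan]})

a_values = df["A"].to_numpy()

# A plain NumPy sum is "poisoned" by a single NaN
plain_total = np.sum(a_values)

# nansum treats NaN as zero
nan_safe_total = np.nansum(a_values)

# It also works along an axis, e.g. one total per row
row_totals = np.nansum(df[["A", "B"]].to_numpy(), axis=1)

print(plain_total, nan_safe_total, row_totals)
```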

 

  3. Use Case – Run computations on a series or column as a NumPy array, using the accessor the pandas docs recommend instead of “.values”, whose return type is less predictable.

The Gem – pandas.Series.to_numpy()

Description – Converts a series, such as a data frame’s column, to a NumPy array

Implementation – Use column.to_numpy() to operate on the column/series with the speed benefits of NumPy!

 

b = df['A'].to_numpy()
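As a rough illustration, the sketch below pulls a column out as an array, does some array math, and writes the result back. The data and the name A_centered are invented for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical example data
df = pd.DataFrame({"A": [1.0, 2.0, 3.0, 4.0]})

# Pull the column out as a plain NumPy array
values = df["A"].to_numpy()

# Vectorized array math: subtract the mean from every element
centered = values - values.mean()

# Results can be written straight back as a new column
df["A_centered"] = centered

print(type(values))
print(df)
```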

 

  4. Use Case – It’s always the data type! Ok, not always, but quite often. Getting data types to line up for operations can be a chore and often leads to errors.

The Gem – pandas.DataFrame.astype()

Description – Converts a series or data frame to a desired data type

Implementation – There are a number of ways to use this Gem to keep your sanity, but a simple astype(int) on a column converts its float values to integers (dropping the fractional part) for some operation.

            

df['integer'] = df['float'].astype(int)
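A few common conversions, sketched with made-up columns (the names float, text, number, and nullable are assumptions for illustration). Two things worth knowing: converting floats to integers cuts off the fractional part rather than rounding, and a column containing NaN cannot be cast to a plain integer type, so the nullable "Int64" type is one way around that.

```python
import numpy as np
import pandas as pd

# Hypothetical example data
df = pd.DataFrame({"float": [1.9, 2.0, -3.7], "text": ["10", "20", "30"]})

# Float to integer: the fractional part is dropped, not rounded
df["integer"] = df["float"].astype("int64")

# Numeric strings to integers, so they can be used in arithmetic
df["number"] = df["text"].astype("int64")

# A float column with a missing value: use the nullable integer type
nullable = pd.Series([1.0, np.nan]).astype("Int64")

print(df)
print(nullable)
```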

 

  5. Use Case – Quickly and easily change, update, or alter data based on conditions

The Gem – numpy.where()

Description – Picks each value from one of two alternatives, depending on whether a condition is true

Implementation – This Gem can take a little longer to learn to use because it is very versatile and powerful, in my humble opinion. Am I allowed to write that, or does it require “IMHO”? At any rate, here is a simple example, and I will restate it in English for those who are newer to Python:

 

df['C'] = np.where(df['A'] < df['B'], df['A'], df['B'])

 

For every row where Column “A” is less than Column “B”, Column “C” is given the value of Column “A”; otherwise, it is set to the value of Column “B”.
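Here is the same idea as a small runnable sketch on made-up data. The label column is an extra assumption added for illustration; it shows that the two alternatives can also be fixed values rather than columns.

```python
import numpy as np
import pandas as pd

# Hypothetical example data
df = pd.DataFrame({"A": [1, 5, 3], "B": [4, 2, 3]})

# Row by row: take A where A < B, otherwise take B
df["C"] = np.where(df["A"] < df["B"], df["A"], df["B"])

# The alternatives can be constants, e.g. to build a label column
df["label"] = np.where(df["A"] < df["B"], "A smaller", "B smaller or equal")

print(df)
```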

 

If these “Gems” are new to you, then enjoy getting to know them. If nothing here is new, then head to the LinkedIn comments and drop some knowledge. Happy coding, and good luck with the data!
