Exploring data using Python (Seaborn and Matplotlib)

Aditi Mukerjee
Nov 15, 2020

Introduction

In this blog post, I explore the NFL Next Gen Stats tracking data for running plays.

The code for the “NFL plays” analysis is publicly available on my GitHub here.

Data Set

For this post, I used the NFL Big Data Bowl dataset from Kaggle.

The dataset contains 65k rows and 48 columns of NFL data. Each row corresponds to a single player’s involvement in a single play. The data was intentionally joined (i.e., denormalized) to keep the API simple, so all columns live in one large data frame grouped by PlayId.
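Because each row is one player in one play, play-level columns such as Yards repeat once per player. The sketch below uses a small made-up table in place of the real file to show how that structure can be checked and collapsed to one row per play. The helper names players_per_play and one_row_per_play are hypothetical. On the real data, each play would typically show 22 rows (11 per team).

```python
import pandas as pd

# Toy stand-in for the Kaggle file: made-up values, NOT real NFL data.
# As in the real data, each row is one player's involvement in one play,
# so play-level fields such as Yards repeat on every player row.
toy = pd.DataFrame({
    "PlayId": [1, 1, 1, 2, 2, 2],
    "NflId":  [10, 11, 12, 10, 11, 13],
    "Team":   ["home", "home", "away", "home", "away", "away"],
    "Yards":  [4, 4, 4, -1, -1, -1],
})

def players_per_play(df: pd.DataFrame) -> pd.Series:
    """Number of player rows recorded for each PlayId."""
    return df.groupby("PlayId").size()

def one_row_per_play(df: pd.DataFrame) -> pd.DataFrame:
    """Keep the first row of each play; enough for play-level columns."""
    return df.drop_duplicates(subset="PlayId").reset_index(drop=True)

print(players_per_play(toy))
print(one_row_per_play(toy)[["PlayId", "Yards"]])
```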

Data Visualization

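import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
pd.plotting.register_matplotlib_converters()
# Jupyter magic: render figures inline (omit when running as a plain script)
%matplotlib inline

# Load the Kaggle training file; 'train.csv' is an assumed path, adjust as needed
df = pd.read_csv('train.csv', low_memory=False)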

Relationship between the week of the season and player speed (yards/second), split by ‘home’ and ‘away’ teams

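# lmplot creates its own figure (a FacetGrid), so size it with height/aspect
# instead of calling plt.figure() first
sns.lmplot(x='Week', y='S', hue='Team', data=df, height=6, aspect=14/6)
plt.ylabel('speed in yards/second')
plt.xlabel('week into the season')
plt.ylim(0, 10)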

Speed stays fairly constant over the season. However, it reaches its highest level around week 5 for away games and around week 12 for home games.
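To check peaks like these numerically instead of by eye, one option is a pivot table of mean speed per week for each Team value. This is a sketch on made-up rows. The helper mean_speed_by_week is hypothetical, and on the real data it would be called with df.

```python
import pandas as pd

# Made-up toy rows for illustration only (Week, Team, S = speed in yd/s)
toy = pd.DataFrame({
    "Week": [1, 1, 2, 2, 2, 3],
    "Team": ["home", "away", "home", "away", "away", "home"],
    "S":    [3.0, 2.0, 4.0, 5.0, 3.0, 6.0],
})

def mean_speed_by_week(df: pd.DataFrame) -> pd.DataFrame:
    """Mean speed per week, with one column per Team value ('away'/'home')."""
    return df.pivot_table(index="Week", columns="Team", values="S", aggfunc="mean")

table = mean_speed_by_week(toy)
print(table)
# idxmax() returns the week with the highest mean speed for each side
# (missing weeks, shown as NaN, are skipped)
print(table.idxmax())
```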

Are the acceleration and speed constant throughout the seasons?

plt.figure(figsize=(14,6))
plt.title("Acceleration and speed for NFL Big Data Bowl, seasons 2017-2019")
# Plot against the Season column so the x-axis really shows seasons;
# seaborn draws the per-season mean with a confidence band
sns.lineplot(data=df, x='Season', y='A', label='Acceleration in yards/second^2')
sns.lineplot(data=df, x='Season', y='S', label='Speed in yards/second')
plt.xlabel('NFL season')
plt.ylim(0, 5)

Acceleration and speed were highest in the 2018 season. Otherwise, both variables stay about the same across seasons.
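A single fast play can set a season's maximum, so it can help to compare each season's mean with its maximum. Here is a minimal sketch on made-up rows. It assumes the Season, A, and S columns of the Kaggle file, and season_summary is a hypothetical helper.

```python
import pandas as pd

# Made-up toy rows (Season, A = acceleration in yd/s^2, S = speed in yd/s)
toy = pd.DataFrame({
    "Season": [2017, 2017, 2018, 2018, 2019],
    "A": [1.0, 2.0, 3.0, 1.0, 2.0],
    "S": [2.0, 4.0, 5.0, 3.0, 3.5],
})

def season_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Mean and maximum of acceleration and speed for each season."""
    return df.groupby("Season")[["A", "S"]].agg(["mean", "max"])

summary = season_summary(toy)
print(summary)
# Season with the largest single speed value
print(summary[("S", "max")].idxmax())
```

If the maximum jumps in one season while the mean barely moves, the difference is probably driven by a handful of plays rather than a season-wide shift.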

How are player weights distributed?

plt.figure(figsize=(14,10))
plt.title("Player Weight for NFL Big Data Bowl for seasons 2017-2019")
# histplot replaces distplot, which is deprecated since seaborn 0.11
sns.histplot(x=df['PlayerWeight'])
plt.xlabel('Player weight (lbs)')

Player weights range from about 160 lbs to 350 lbs, clustering around 175, 225, and 325 lbs for light, medium, and heavy builds.
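The histogram counts rows, and a player who appears in many plays contributes many rows, so frequent participants weigh more heavily. This sketch keeps one weight per NflId. The data is made up, unique_player_weights is a hypothetical helper, and keeping the first recorded weight is an assumption for players whose listed weight changes.

```python
import pandas as pd

# Made-up toy rows: the same player (NflId) appears in several plays
toy = pd.DataFrame({
    "PlayId":       [1, 1, 2, 2, 3],
    "NflId":        [10, 11, 10, 11, 10],
    "PlayerWeight": [180, 320, 180, 320, 180],
})

def unique_player_weights(df: pd.DataFrame) -> pd.Series:
    """One weight per player (first occurrence), so frequent players are not over-counted."""
    return df.drop_duplicates(subset="NflId")["PlayerWeight"]

weights = unique_player_weights(toy)
print(toy["PlayerWeight"].mean())  # row-weighted mean: 236.0
print(weights.mean())              # per-player mean: 250.0
```

The resulting Series could then be passed to sns.histplot in place of df['PlayerWeight'].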

Temperature variation across seasons

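plt.figure(figsize=(14,10))
plt.title("Temperature for NFL Big Data Bowl for seasons 2017-2019")
# Temperature per season; seaborn plots the mean with a confidence band
sns.lineplot(data=df, x='Season', y='Temperature')
plt.xlabel('NFL season')
plt.ylabel('Temperature (deg F)')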

Temperatures were slightly higher in the 2019 season and stayed roughly constant across the other two seasons.
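Temperature is a game-level value repeated on every player row, so in a row-based summary, games with more plays count more. The sketch below counts each GameId once before taking the median per season. The data is made up, temperature_by_season is a hypothetical helper, and pandas skips missing temperatures when computing the median.

```python
import pandas as pd

# Made-up toy rows: Temperature is recorded per game and repeated on every row
toy = pd.DataFrame({
    "GameId":      [100, 100, 101, 200, 300, 300],
    "Season":      [2017, 2017, 2017, 2018, 2019, 2019],
    "Temperature": [60.0, 60.0, 40.0, 50.0, 70.0, 70.0],
})

def temperature_by_season(df: pd.DataFrame) -> pd.Series:
    """Median game temperature per season, counting each game once."""
    games = df.drop_duplicates(subset="GameId")
    return games.groupby("Season")["Temperature"].median()

# Row-based median: game 100 counts twice, which pulls 2017 toward 60
print(toy.groupby("Season")["Temperature"].median())
# Game-based median: each game counts once
print(temperature_by_season(toy))
```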

Is the performance any better during home games?

plt.figure(figsize=(8,8))
plt.title("Yards gained on the play, home vs away")
sns.scatterplot(x='Team', y='Yards', data=df)

Performance is approximately the same whether the team played at home or away.
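One caveat worth checking: Team labels whether a row's player is on the home or away side, and every play contains rows for both sides with the same Yards value. Grouping raw rows by Team therefore compares the same plays twice. The sketch below shows an alternative that keeps one row per play and labels each play by whether the offense (PossessionTeam) was the home team. The data and the helper yards_by_offense_location are made up for illustration.

```python
import pandas as pd

# Made-up toy rows: each play has one home row and one away row,
# and both rows of a play carry the same Yards value
toy = pd.DataFrame({
    "PlayId":         [1, 1, 2, 2, 3, 3],
    "Team":           ["home", "away"] * 3,
    "PossessionTeam": ["NE", "NE", "KC", "KC", "NE", "NE"],
    "HomeTeamAbbr":   ["NE"] * 6,
    "Yards":          [6, 6, 2, 2, 4, 4],
})

def yards_by_offense_location(df: pd.DataFrame) -> pd.Series:
    """Mean yards per play, split by whether the offense played at home."""
    per_play = df.drop_duplicates(subset="PlayId")
    # On the real file, check that PossessionTeam and HomeTeamAbbr use the
    # same team codes first; some abbreviations may differ between columns
    offense_at_home = per_play["PossessionTeam"] == per_play["HomeTeamAbbr"]
    labels = offense_at_home.map({True: "home", False: "away"})
    return per_play.groupby(labels)["Yards"].mean()

# Raw rows grouped by Team give the same mean for both sides by construction
print(toy.groupby("Team")["Yards"].mean())
print(yards_by_offense_location(toy))
```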

Relationship between Humidity and Temperature

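# jointplot builds its own figure, so size it with height= instead of plt.figure()
g = sns.jointplot(x='Temperature', y='Humidity', data=df, kind="kde", height=8)
g.set_axis_labels("Temperature (deg F)", "Humidity")
g.fig.suptitle("Temperature vs Humidity")
g.fig.subplots_adjust(top=0.95)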

This two-dimensional KDE plot shows how humidity and temperature vary together across the games in the dataset.
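A KDE plot shows the shape of the joint distribution, while a single correlation coefficient can summarize its linear trend. This sketch uses made-up game-level values, and temp_humidity_corr is a hypothetical helper. It counts each game once and drops games with missing values before computing the Pearson correlation.

```python
import pandas as pd

# Made-up toy rows; game 1 is repeated, game 4 has no temperature
toy = pd.DataFrame({
    "GameId":      [1, 1, 2, 3, 4],
    "Temperature": [40.0, 40.0, 60.0, 80.0, None],
    "Humidity":    [80.0, 80.0, 60.0, 40.0, 50.0],
})

def temp_humidity_corr(df: pd.DataFrame) -> float:
    """Pearson correlation between Temperature and Humidity, one row per game."""
    games = df.drop_duplicates(subset="GameId")[["Temperature", "Humidity"]].dropna()
    return games["Temperature"].corr(games["Humidity"])

print(round(temp_humidity_corr(toy), 3))  # -1.0 for this perfectly linear toy data
```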

Yards vs Week into the season

sns.lmplot(x='Week', y='Yards', hue='Quarter', data=df)
plt.xlim(0, 20)
plt.title("Yards vs Week into the season")

Yards gained increase around week 15 of the season. The most yards were gained in quarters 2, 3, and 4, and the fewest in quarter 5 (overtime).
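Quarter 5 (overtime) contains far fewer plays than the regular quarters, so its average is noisier. This sketch reports the play count next to the mean yards per play and counts each PlayId once. The data is made up, and yards_per_play_by_quarter is a hypothetical helper.

```python
import pandas as pd

# Made-up toy rows; Quarter 5 stands for overtime
toy = pd.DataFrame({
    "PlayId":  [1, 1, 2, 2, 3, 4],
    "Quarter": [1, 1, 2, 2, 2, 5],
    "Yards":   [3, 3, 7, 7, 5, 1],
})

def yards_per_play_by_quarter(df: pd.DataFrame) -> pd.DataFrame:
    """Number of plays and mean yards per play for each quarter."""
    per_play = df.drop_duplicates(subset="PlayId")
    return per_play.groupby("Quarter")["Yards"].agg(plays="count", mean_yards="mean")

result = yards_per_play_by_quarter(toy)
print(result)
```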

Yardline range

# fill=True replaces the shade= argument deprecated in seaborn 0.11
sns.kdeplot(data=df['YardLine'], fill=True)
plt.title("Yardline range")

The yard line ranges from 0 to 50, and its distribution peaks around 28.
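YardLine only runs from 0 to 50, so on its own it does not say which half of the field the ball is in. This sketch converts it into a 0-100 distance from the offense's own goal line. It assumes that FieldPosition names the team whose half the ball is in, that it is empty at midfield, and that it uses the same team codes as PossessionTeam. The data is made up, and yards_from_own_goal is a hypothetical helper.

```python
import pandas as pd

# Made-up toy plays; FieldPosition is None at midfield
toy = pd.DataFrame({
    "PossessionTeam": ["NE", "NE", "KC"],
    "FieldPosition":  ["NE", "KC", None],
    "YardLine":       [20, 20, 50],
})

def yards_from_own_goal(df: pd.DataFrame) -> pd.Series:
    """Distance (0-100) from the offense's own goal line to the line of scrimmage."""
    own_half = df["FieldPosition"] == df["PossessionTeam"]
    # Keep YardLine in the offense's own half, otherwise mirror it across midfield
    return df["YardLine"].where(own_half, 100 - df["YardLine"])

print(yards_from_own_goal(toy).tolist())  # [20, 80, 50]
```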

Conclusions

  • Acceleration and speed were highest in the 2018 season. Otherwise, both variables stay about the same across seasons.
  • Temperatures were slightly higher in the 2019 season.
  • Player weights range from about 160 lbs to 350 lbs, clustering around 175, 225, and 325 lbs for light, medium, and heavy builds.
  • The yard line ranges from 0 to 50, and its distribution peaks around 28.
  • Performance is approximately the same whether the team played at home or away.

If you have any questions or comments, or need further clarification, please contact me at aditimukerjee33@gmail.com or 403–671–7296. If you are interested in collaborating on a project, feel free to reach out.

