Running Backs in the NFL
Digging into statistics from the 2019 NFL (National Football League) season to compare running backs across the 32 teams. This is an initial draft; much more to follow!
This blog post can also be run as a Jupyter Notebook and requires the following Python dependencies:
altair
numpy
pandas
import numpy as np
import pandas as pd
import altair as alt
You can access the data used for our analysis through footnote 1. Let's read the CSV file into a pandas DataFrame.
df = pd.read_csv('data/nfl_fantasy_2019.csv')
df.head()
Some columns are redundant while others don't have the best names. Let's get that fixed!
# Drop redundant columns
df.drop(['Unnamed: 0', 'Att', 'Yds', 'Att.1', 'Yds.1', 'Yds.2'], axis=1, inplace=True)
# Rename columns
df.rename(columns = {'Tm': 'Team', 'Pos': 'Position', 'G': 'Games'}, inplace=True)
# Print the updated columns
print(df.columns)
Most of the numeric columns are shown with decimals because pandas read them as floating-point values. To save some memory and make things simpler, let's convert them to integers.
# Convert float to int, leaving text columns and FantasyPoints (which has real decimals) untouched
for col in df.columns:
    if col not in ['Player', 'Team', 'Position', 'FantasyPoints']:
        df[col] = df[col].astype('int32')
df.head(1)
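The cast above assumes every converted column holds whole numbers with no missing values: `astype('int32')` raises an error on NaN and silently truncates any fractional part. Here is a minimal sketch of a defensive check before casting. The toy frame and the `safe_int_columns` helper are hypothetical and only for illustration; the values are invented, not taken from the dataset.

```python
import numpy as np
import pandas as pd

# Toy data; values are invented for illustration only.
toy = pd.DataFrame({
    'Player': ['A', 'B', 'C'],
    'RushingAtt': [10.0, 20.0, 30.0],
    'Tgt': [5.0, np.nan, 7.0],          # contains a missing value
    'FantasyPoints': [12.5, 8.0, 3.2],  # has real decimals
})

def safe_int_columns(frame, skip):
    """Hypothetical helper: list columns that can be cast to int losslessly."""
    castable = []
    for col in frame.columns:
        if col in skip:
            continue
        values = frame[col]
        # Castable only if nothing is missing and every value is whole.
        if values.notna().all() and (values % 1 == 0).all():
            castable.append(col)
    return castable

castable = safe_int_columns(toy, skip=['Player'])
for col in castable:
    toy[col] = toy[col].astype('int32')

print(castable)
print(toy.dtypes)
```

Only `RushingAtt` passes both checks here, so `Tgt` and `FantasyPoints` stay as floats instead of failing or losing precision.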
We want to focus on running backs, so let's select them and store the data in a new pandas DataFrame. The .copy() makes rb_df independent of df, so adding columns later does not trigger a SettingWithCopyWarning.
rb_df = df.loc[df['Position'] == 'RB', ['Player', 'Team', 'Age', 'Games', 'GS', 'RushingAtt', 'Tgt', 'Rec', 'RushingYds', 'ReceivingYds', 'RushingTD', 'ReceivingTD', 'FantasyPoints']].copy()
print(f'Total number of Running Backs: {len(rb_df)}')
After cleaning and filtering the data, we can begin defining new metrics that will help us compare the running backs.
# collapse
rb_df['Usage'] = rb_df['RushingAtt'] + rb_df['Tgt']
rb_df['Yds'] = rb_df['RushingYds'] + rb_df['ReceivingYds']
rb_df['TD'] = rb_df['RushingTD'] + rb_df['ReceivingTD']
rb_df['Rank'] = rb_df['FantasyPoints'].rank(ascending=False).astype('int32')
rb_df.sort_values('Rank', inplace=True)
rb_df.head(3)
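One subtlety with `rank`: by default pandas gives tied values the average of their ranks, so two players with identical fantasy points would both get a rank such as 2.5, which the `int32` cast then truncates to 2. A small sketch on invented numbers compares the default with `method='min'`, which assigns tied players the same whole-number rank directly:

```python
import pandas as pd

# Invented fantasy point totals; players B and C are tied.
points = pd.Series([300.0, 250.0, 250.0, 100.0], index=['A', 'B', 'C', 'D'])

avg_rank = points.rank(ascending=False)                # default method='average'
min_rank = points.rank(ascending=False, method='min')  # ties share the lowest rank

print(pd.DataFrame({'points': points, 'average': avg_rank, 'min': min_rank}))
```

Whether ties matter depends on the data; with fractional fantasy points they are likely rare, but `method='min'` avoids relying on truncation.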
Time for some interactive charts!
#collapse
chart = alt.Chart(rb_df, title='Running Back Usage vs. Production - 2019 NFL Season').mark_point().encode(
    alt.X('Usage', title='Usage (Rushing Attempts + Targets)'),
    alt.Y('FantasyPoints', title='Fantasy Points Scored'),
    tooltip=['Player', 'Rank', 'Age', 'Yds', 'TD', 'Usage', 'FantasyPoints']
).interactive()
chart = chart.configure_title(
    fontSize=18,
).configure_axis(
    titleFontSize=14,
    titleFontWeight=500
)
chart.properties(
    width=700
)
There's an obvious positive correlation between these two quantities. Let's put a number on that relationship.
rb_df['FantasyPoints'].corr(rb_df['Usage'])
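`Series.corr` computes the Pearson correlation coefficient by default. As a sanity check on what that number means, here is a sketch that computes it from the definition on invented usage and points values (not taken from the dataset) and compares the result with pandas:

```python
import numpy as np
import pandas as pd

# Invented values for illustration only.
usage = pd.Series([50.0, 120.0, 200.0, 310.0, 400.0])
points = pd.Series([40.0, 95.0, 150.0, 260.0, 300.0])

# Pearson r = sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2)),
# where dx and dy are deviations from the respective means.
dx = usage - usage.mean()
dy = points - points.mean()
r_manual = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

r_pandas = usage.corr(points)  # method='pearson' is the default

print(r_manual, r_pandas)
```

A value close to 1 indicates a strong positive linear relationship; it says nothing about causation or about non-linear patterns in the scatter plot.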
1. CSV file. Data Source: Fantasy Football Data Pros