This blog post can also be run as a Jupyter Notebook and requires the following Python dependencies:

altair
numpy
pandas

import numpy as np
import pandas as pd
import altair as alt

The data used for this analysis is available via footnote ¹. Let's read the CSV file into a pandas DataFrame.

df = pd.read_csv('data/nfl_fantasy_2019.csv')
df.head()

Some columns are redundant while others don't have the best names. Let's get that fixed!

# Drop redundant columns
df.drop(['Unnamed: 0', 'Att', 'Yds', 'Att.1', 'Yds.1', 'Yds.2'], axis=1, inplace=True)
# Rename columns
df.rename(columns = {'Tm': 'Team', 'Pos': 'Position', 'G': 'Games'}, inplace=True)
# Print the updated columns
print(df.columns)

Index(['Player', 'Team', 'Position', 'Age', 'Games', 'GS', 'Cmp', 'Int', 'Tgt',
       'Rec', 'Y/R', 'Fumbles', 'FumblesLost', 'PassingYds', 'PassingTD',
       'PassingAtt', 'RushingYds', 'RushingTD', 'RushingAtt', 'ReceivingYds',
       'ReceivingTD', 'FantasyPoints'],
      dtype='object')
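The same cleanup can also be written without `inplace=True`, by chaining `drop` and `rename` and assigning the result to a new name. The snippet below is a minimal sketch on a hypothetical one-row dataframe (`toy` and `cleaned` are illustrative names, not part of the original notebook).

```python
import pandas as pd

# Hypothetical toy data that mimics a few of the raw column names
toy = pd.DataFrame({
    'Unnamed: 0': [0],
    'Player': ['A'],
    'Tm': ['CAR'],
    'Pos': ['RB'],
    'G': [16.0],
})

# drop and rename each return a new dataframe, so they can be chained
cleaned = (
    toy.drop(columns=['Unnamed: 0'])
       .rename(columns={'Tm': 'Team', 'Pos': 'Position', 'G': 'Games'})
)

print(list(cleaned.columns))
```

Because nothing is modified in place, `toy` keeps its original columns, which makes it easy to re-run a cell without reloading the CSV.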

Most numeric columns hold whole numbers but are stored as floating-point values. To save some memory and make things simpler, let's convert them to integers.

# Convert float columns to int
# (this also truncates FantasyPoints, e.g. 469.20 becomes 469)
for col in df.columns:
    if col not in ['Player', 'Team', 'Position']:
        df[col] = df[col].astype('int32')
df.head(1)
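Two details of this conversion are worth keeping in mind: `astype('int32')` truncates toward zero instead of rounding, and a column is only skipped if its name matches the exclusion list exactly. If the fractional part of `FantasyPoints` should be preserved, one possible approach is to select the float columns by dtype and leave that column out. This is a sketch on hypothetical toy data, not the notebook's own code.

```python
import pandas as pd

# Hypothetical toy data, not taken from the post's CSV
toy = pd.DataFrame({
    'Player': ['A', 'B'],
    'Games': [16.0, 15.0],
    'FantasyPoints': [469.20, 415.68],
})

# Pick the float columns by dtype, then keep FantasyPoints out of the conversion
float_cols = toy.select_dtypes(include='float64').columns.drop('FantasyPoints')
toy = toy.astype({col: 'int32' for col in float_cols})

# astype truncates toward zero rather than rounding
truncated = pd.Series([415.68, -1.9]).astype('int32').tolist()

print(toy.dtypes)
print(truncated)
```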

We want to focus on running backs, so let's select only those rows and store them in a new pandas DataFrame.

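# .copy() makes rb_df independent of df, so adding columns later does not raise SettingWithCopyWarning
rb_df = df.loc[df['Position'] == 'RB', ['Player', 'Team', 'Age', 'Games', 'GS', 'RushingAtt', 'Tgt', 'Rec', 'RushingYds', 'ReceivingYds', 'RushingTD', 'ReceivingTD', 'FantasyPoints']].copy()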
print(f'Total number of Running Backs: {len(rb_df)}')

Total number of Running Backs: 153
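The selection above combines a boolean mask for the rows with a list of column names. A minimal sketch of that pattern on hypothetical toy data (`toy`, `is_rb`, and `toy_rb` are illustrative names) shows how `.loc[rows, columns]` behaves and why taking a copy keeps later changes away from the source dataframe.

```python
import pandas as pd

# Hypothetical toy data standing in for the cleaned dataframe
toy = pd.DataFrame({
    'Player': ['A', 'B', 'C'],
    'Position': ['RB', 'QB', 'RB'],
    'RushingAtt': [287, 176, 303],
})

# Boolean mask: True for rows whose Position is 'RB'
is_rb = toy['Position'] == 'RB'

# .loc[rows, columns] keeps the matching rows and the listed columns;
# .copy() makes the result independent of the original dataframe
toy_rb = toy.loc[is_rb, ['Player', 'RushingAtt']].copy()

# Changing the copy leaves toy untouched
toy_rb['RushingAtt'] = toy_rb['RushingAtt'] + 1

print(toy_rb)
```

Note that the filtered rows keep their original index labels (0 and 2 here), which is also why the running-back table in the post shows index values 0, 3, and 4.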

After cleaning and filtering the data, we can begin defining new metrics that will help us compare the running backs.

# collapse
rb_df['Usage'] = rb_df['RushingAtt'] + rb_df['Tgt']
rb_df['Yds'] = rb_df['RushingYds'] + rb_df['ReceivingYds']
rb_df['TD'] = rb_df['RushingTD'] + rb_df['ReceivingTD']
rb_df['Rank'] = rb_df['FantasyPoints'].rank(ascending=False).astype('int32')
rb_df.sort_values('Rank', inplace=True)
rb_df.head(3)
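One subtlety in the `Rank` column: `rank` uses `method='average'` by default, so tied players receive a fractional rank that the cast to `int32` then truncates. The sketch below uses hypothetical point totals to show the difference; passing `method='min'` is one possible way to keep tied ranks as whole numbers, though it is not what the code above does.

```python
import pandas as pd

# Hypothetical point totals with a tie for second place
points = pd.Series([300, 250, 250, 100], index=['A', 'B', 'C', 'D'])

# Default method='average': tied players get the mean of their positions (2.5)
average_rank = points.rank(ascending=False)

# method='min': tied players share the best position, so ranks stay whole numbers
min_rank = points.rank(ascending=False, method='min').astype('int32')

print(average_rank.tolist())
print(min_rank.tolist())
```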

Time for some interactive charts!

#collapse
chart = alt.Chart(rb_df, title='Running Back Usage vs. Production - 2019 NFL Season').mark_point().encode(
    alt.X('Usage', title='Usage (Rushing Attempts + Targets)'),
    alt.Y('FantasyPoints', title='Fantasy Points Scored'),
    tooltip = ['Player', 'Rank', 'Age', 'Yds', 'TD', 'Usage', 'FantasyPoints']
).interactive()

chart = chart.configure_title(
    fontSize=18,
).configure_axis(
    titleFontSize = 14,
    titleFontWeight = 500
)

chart.properties(
    width=700
)

There's an obvious positive correlation between these two quantities. Let's put a number on that relationship.

rb_df['FantasyPoints'].corr(rb_df['Usage'])

0.9610105031989995
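By default, `Series.corr` computes the Pearson correlation coefficient: the covariance of the two variables divided by the product of their standard deviations. As a sanity check on what that number means, here is a minimal hand-rolled version (the `pearson` helper is a hypothetical name introduced for illustration). It is applied to just the three running backs shown in the output table, so its result is not expected to match the full-dataset value above.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (numerator) and sums of squared deviations
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Usage and FantasyPoints for the top three backs in the post's output table
usage = [429, 304, 372]
points = [469, 314, 311]

r = pearson(usage, points)
print(r)
```

A value near 1 means the points fall close to an upward-sloping straight line; a perfectly linear pair such as `[1, 2, 3]` and `[2, 4, 6]` yields a coefficient of 1.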

1. CSV file. Data source: Fantasy Football Data Pros

	Unnamed: 0	Player	Tm	Pos	Age	G	GS	Cmp	Att	Yds	...	FumblesLost	PassingYds	PassingTD	PassingAtt	RushingYds	RushingTD	RushingAtt	ReceivingYds	ReceivingTD	FantasyPoints
0	0	Christian McCaffrey	CAR	RB	23.0	16.0	16.0	0.0	2.0	0.0	...	0.0	0.0	0.0	2.0	1387.0	15.0	287.0	1005.0	4.0	469.20
1	1	Lamar Jackson	BAL	QB	22.0	15.0	15.0	265.0	401.0	3127.0	...	2.0	3127.0	36.0	401.0	1206.0	7.0	176.0	0.0	0.0	415.68
2	2	Derrick Henry	TEN	RB	25.0	15.0	15.0	0.0	0.0	0.0	...	3.0	0.0	0.0	0.0	1540.0	16.0	303.0	206.0	2.0	294.60
3	3	Aaron Jones	GNB	RB	25.0	16.0	16.0	0.0	0.0	0.0	...	2.0	0.0	0.0	0.0	1084.0	16.0	236.0	474.0	3.0	314.80
4	4	Ezekiel Elliott	DAL	RB	24.0	16.0	16.0	0.0	0.0	0.0	...	2.0	0.0	0.0	0.0	1357.0	12.0	301.0	420.0	2.0	311.70

	Player	Team	Age	Games	GS	RushingAtt	Tgt	Rec	RushingYds	ReceivingYds	RushingTD	ReceivingTD	FantasyPoints	Usage	Yds	TD	Rank
0	Christian McCaffrey	CAR	23	16	16	287	142	116	1387	1005	15	4	469	429	2392	19	1
3	Aaron Jones	GNB	25	16	16	236	68	49	1084	474	16	3	314	304	1558	19	2
4	Ezekiel Elliott	DAL	24	16	16	301	71	54	1357	420	12	2	311	372	1777	14	3