This blog post can also be run as a Jupyter Notebook and requires the following Python dependencies:

  • altair
  • numpy
  • pandas
import numpy as np
import pandas as pd
import altair as alt

The data used for our analysis is available through footnote 1. Let's read the CSV file into a pandas DataFrame.

df = pd.read_csv('data/nfl_fantasy_2019.csv')
df.head()
Unnamed: 0 Player Tm Pos Age G GS Cmp Att Yds ... FumblesLost PassingYds PassingTD PassingAtt RushingYds RushingTD RushingAtt ReceivingYds ReceivingTD FantasyPoints
0 0 Christian McCaffrey CAR RB 23.0 16.0 16.0 0.0 2.0 0.0 ... 0.0 0.0 0.0 2.0 1387.0 15.0 287.0 1005.0 4.0 469.20
1 1 Lamar Jackson BAL QB 22.0 15.0 15.0 265.0 401.0 3127.0 ... 2.0 3127.0 36.0 401.0 1206.0 7.0 176.0 0.0 0.0 415.68
2 2 Derrick Henry TEN RB 25.0 15.0 15.0 0.0 0.0 0.0 ... 3.0 0.0 0.0 0.0 1540.0 16.0 303.0 206.0 2.0 294.60
3 3 Aaron Jones GNB RB 25.0 16.0 16.0 0.0 0.0 0.0 ... 2.0 0.0 0.0 0.0 1084.0 16.0 236.0 474.0 3.0 314.80
4 4 Ezekiel Elliott DAL RB 24.0 16.0 16.0 0.0 0.0 0.0 ... 2.0 0.0 0.0 0.0 1357.0 12.0 301.0 420.0 2.0 311.70

5 rows × 28 columns

Some columns are redundant while others don't have the best names. Let's get that fixed!

# Drop redundant columns
df.drop(['Unnamed: 0', 'Att', 'Yds', 'Att.1', 'Yds.1', 'Yds.2'], axis=1, inplace=True)
# Rename columns
df.rename(columns = {'Tm': 'Team', 'Pos': 'Position', 'G': 'Games'}, inplace=True)
# Print the updated columns
print(df.columns)
Index(['Player', 'Team', 'Position', 'Age', 'Games', 'GS', 'Cmp', 'Int', 'Tgt',
       'Rec', 'Y/R', 'Fumbles', 'FumblesLost', 'PassingYds', 'PassingTD',
       'PassingAtt', 'RushingYds', 'RushingTD', 'RushingAtt', 'ReceivingYds',
       'ReceivingTD', 'FantasyPoints'],
      dtype='object')
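For readers who prefer to avoid `inplace=True`, the same cleanup can be written as a single chain that returns a new DataFrame. This is only a sketch on a tiny made-up frame (the `raw` and `clean` names and the values are assumptions for illustration, not the real dataset):

```python
import pandas as pd

# Tiny made-up frame that mimics a few of the original column names
raw = pd.DataFrame({
    'Unnamed: 0': [0, 1],
    'Player': ['A', 'B'],
    'Tm': ['CAR', 'BAL'],
    'Pos': ['RB', 'QB'],
    'G': [16.0, 15.0],
})

# drop and rename both return new DataFrames, so they can be chained
clean = (
    raw.drop(columns=['Unnamed: 0'])
       .rename(columns={'Tm': 'Team', 'Pos': 'Position', 'G': 'Games'})
)

print(list(clean.columns))  # ['Player', 'Team', 'Position', 'Games']
```

Chaining leaves `raw` untouched, which makes it easier to re-run a notebook cell without hitting errors about columns that were already dropped.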

Most of the numeric columns hold whole numbers but are stored as floating-point values. To save some memory and make things simpler, let's convert them to integers.

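# Convert numeric columns from float to int (astype drops any decimal part).
# FantasyPoints is converted too, which is why it shows up as a whole number below.
text_cols = ['Player', 'Team', 'Position']
for col in df.columns:
    if col not in text_cols:
        df[col] = df[col].astype('int32')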
df.head(1)
Player Team Position Age Games GS Cmp Int Tgt Rec ... FumblesLost PassingYds PassingTD PassingAtt RushingYds RushingTD RushingAtt ReceivingYds ReceivingTD FantasyPoints
0 Christian McCaffrey CAR RB 23 16 16 0 0 142 116 ... 0 0 0 2 1387 15 287 1005 4 469

1 rows × 22 columns
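One thing to keep in mind is that `astype('int32')` truncates rather than rounds, so any column with real decimals loses information. The sketch below uses made-up toy values (the `toy` frame and the `whole_cols` helper are assumptions for illustration) to show the truncation and one possible way to convert only the columns that are safe to convert:

```python
import pandas as pd

# Toy data; the frame and column selection are for illustration only
toy = pd.DataFrame({
    'Player': ['A', 'B'],
    'Games': [16.0, 15.0],
    'FantasyPoints': [469.2, 415.68],
})

# astype drops the fractional part instead of rounding
truncated = toy['FantasyPoints'].astype('int32')
print(truncated.tolist())  # [469, 415]

# Find the float columns whose values are all whole numbers
float_cols = toy.select_dtypes(include='float64').columns
whole_cols = [c for c in float_cols if (toy[c] % 1 == 0).all()]

# Convert only those columns; FantasyPoints keeps its decimals
toy = toy.astype({c: 'int32' for c in whole_cols})
print(toy.dtypes)
```

Whether to keep the decimals is a judgment call. For a rough comparison of players, whole-number fantasy points are usually precise enough.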

We want to focus on running backs, so let's select just those rows, keep the columns we care about, and store the result in a new pandas DataFrame.

rb_cols = ['Player', 'Team', 'Age', 'Games', 'GS', 'RushingAtt', 'Tgt', 'Rec',
           'RushingYds', 'ReceivingYds', 'RushingTD', 'ReceivingTD', 'FantasyPoints']
rb_df = df.loc[df['Position'] == 'RB', rb_cols].copy()  # copy so new columns can be added safely
print(f'Total number of Running Backs: {len(rb_df)}')
Total number of Running Backs: 153
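To make the selection step concrete, here is a minimal sketch of the same pattern on a made-up three-row frame (the `toy`, `is_rb` and `rb_toy` names and the values are assumptions for illustration). A boolean mask picks the rows, a list picks the columns, and an explicit `.copy()` gives us an independent frame that we can safely extend:

```python
import pandas as pd

# Made-up rows; only the column names mirror the real data
toy = pd.DataFrame({
    'Player': ['A', 'B', 'C'],
    'Position': ['RB', 'QB', 'RB'],
    'RushingAtt': [287, 176, 303],
    'Tgt': [142, 0, 24],
})

# Boolean mask: True for every row whose position is RB
is_rb = toy['Position'] == 'RB'

# Rows via the mask, columns via the list, then an independent copy
rb_toy = toy.loc[is_rb, ['Player', 'RushingAtt', 'Tgt']].copy()

# Adding a column to the copy does not touch the original frame
rb_toy['Usage'] = rb_toy['RushingAtt'] + rb_toy['Tgt']
print(rb_toy)
```

Without the `.copy()`, pandas may emit a `SettingWithCopyWarning` when new columns are assigned to the filtered frame.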

After cleaning and filtering the data, we can begin defining new metrics that will help us compare the running backs.

# Combine rushing and receiving stats into overall metrics and rank by fantasy points
rb_df['Usage'] = rb_df['RushingAtt'] + rb_df['Tgt']
rb_df['Yds'] = rb_df['RushingYds'] + rb_df['ReceivingYds']
rb_df['TD'] = rb_df['RushingTD'] + rb_df['ReceivingTD']
rb_df['Rank'] = rb_df['FantasyPoints'].rank(ascending=False).astype('int32')
rb_df.sort_values('Rank', inplace=True)
rb_df.head(3)
Player Team Age Games GS RushingAtt Tgt Rec RushingYds ReceivingYds RushingTD ReceivingTD FantasyPoints Usage Yds TD Rank
0 Christian McCaffrey CAR 23 16 16 287 142 116 1387 1005 15 4 469 429 2392 19 1
3 Aaron Jones GNB 25 16 16 236 68 49 1084 474 16 3 314 304 1558 19 2
4 Ezekiel Elliott DAL 24 16 16 301 71 54 1357 420 12 2 311 372 1777 14 3
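A small caveat on the ranking step: by default, `rank` gives tied players the average of their positions, which can produce values like 2.5 that are then truncated by the integer cast. The sketch below uses made-up point totals (the `points` series is an assumption for illustration) to compare the default with two alternative tie-breaking methods:

```python
import pandas as pd

# Made-up fantasy point totals with a tie between B and C
points = pd.Series([300, 250, 250, 100], index=['A', 'B', 'C', 'D'])

# Default method='average': tied players share the mean of their positions
avg_rank = points.rank(ascending=False)
print(avg_rank.tolist())  # [1.0, 2.5, 2.5, 4.0]

# method='min': tied players all get the best position in the tie
min_rank = points.rank(ascending=False, method='min').astype('int32')
print(min_rank.tolist())  # [1, 2, 2, 4]

# method='first': ties are broken by order of appearance, so ranks are unique
first_rank = points.rank(ascending=False, method='first').astype('int32')
print(first_rank.tolist())  # [1, 2, 3, 4]
```

Which method fits best depends on how you want ties displayed. `method='min'` matches the way standings are usually reported.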

Time for some interactive charts!

# Interactive scatter plot of usage vs. fantasy points, with player details on hover
chart = alt.Chart(rb_df, title='Running Back Usage vs. Production - 2019 NFL Season').mark_point().encode(
    alt.X('Usage', title='Usage (Rushing Attempts + Targets)'),
    alt.Y('FantasyPoints', title='Fantasy Points Scored'),
    tooltip = ['Player', 'Rank', 'Age', 'Yds', 'TD', 'Usage', 'FantasyPoints']
).interactive()

chart = chart.configure_title(
    fontSize=18,
).configure_axis(
    titleFontSize = 14,
    titleFontWeight = 500
)

chart.properties(
    width=700
)

There's an obvious positive correlation between these two quantities. Let's put a number on that relationship with the Pearson correlation coefficient, which is what the pandas corr method computes by default.

rb_df['FantasyPoints'].corr(rb_df['Usage'])
0.9610105031989995
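To see what that number measures, here is a small sketch that computes the Pearson coefficient by hand and checks it against pandas. The `usage` and `points` values are made up for illustration and are not taken from the dataset:

```python
import numpy as np
import pandas as pd

# Made-up usage and fantasy point values for five players
usage = pd.Series([420, 310, 365, 120, 45])
points = pd.Series([450, 320, 300, 98, 30])

# Deviations of each value from its column mean
x_dev = usage - usage.mean()
y_dev = points - points.mean()

# Pearson r: sum of co-deviations divided by the product of the deviation norms
r_manual = (x_dev * y_dev).sum() / np.sqrt((x_dev ** 2).sum() * (y_dev ** 2).sum())

# pandas uses the Pearson method by default
r_pandas = points.corr(usage)

print(np.isclose(r_manual, r_pandas))  # True
```

A value close to 1 means that players with more usage tend to score more points in a roughly linear way. It does not tell us by itself that extra usage causes the extra points.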