Exploratory Data Analysis of Home Team Advantage in the NBA 2004–2020

James Duan
10 min read · Mar 8, 2021


Topic: Home court advantage has long been a topic of discussion in the NBA. In fact, past betting statistics have shown that the home team wins around 60% of the time. Overall, teams tend to score more, shoot higher percentages, get more assists and rebounds, commit fewer fouls, and have fewer turnovers when playing at home. However, with the COVID-19 pandemic, the NBA Bubble has essentially eliminated home court advantage. Without the pressure of an audience, it seems that the playing field has been evened. The bubble has seen many role players outperform and many historically bad away teams shine. The LA Lakers have always been a solid playoff team that struggled on the road (43.4% win rate). This season, however, without the home team advantages of a regular NBA season, the Lakers are winning a whopping 72.2% of their ‘away’ games. Many of the external factors influencing home team advantage (e.g., crowd size, referee bias, jet lag) have already been analyzed. For this project, I will be examining the in-game statistics to visualize the contribution of each statistic to higher home team win percentages.

Dataset: I used a dataset documenting NBA games from 2004 to 2020. It contains statistics for each game, including points (PTS), assists (AST), rebounds (REB), field goal percentage (FG%), and 3-point field goal percentage (FG3%). These statistics are separated by home team and away team. My analysis of home court advantage focused on these five statistics. Stats like the number of fouls and turnovers were not considered because they have already been attributed to factors such as referee bias, audience pressure, and away team jet lag. The data initially contained 99 rows with missing data, which were removed. Additionally, an outlier game in which a team scored only 36 points was removed (the game was cancelled due to condensation on the court). Finally, additional columns were added to the dataset to assist in further analysis: point differential, assist differential, rebound differential, FG% differential, and FG3% differential, as well as the percentage difference for each of these statistics. Link to dataset can be found here. Link to GitHub code can be found here.
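As a rough illustration of the cleaning and feature steps described above, here is a minimal sketch in plain Python. The column names (`pts_home`, `fg3_away`, and so on) are hypothetical, not the dataset's actual headers, and the percentage difference is assumed to be measured relative to the away team's value, which is consistent with a zero away FG3% producing an infinite difference later in the analysis.

```python
import math

# Hypothetical stat prefixes: "fg" stands for FG% and "fg3" for FG3%.
# The real dataset's column names may differ.
STATS = ["pts", "ast", "reb", "fg", "fg3"]


def relative_difference_pct(home, away):
    """Percentage difference of the home value relative to the away value.

    Assumed definition: 100 * (home - away) / away.
    """
    if away == 0:
        # Dividing by zero: infinite if the home value is positive, undefined otherwise.
        return math.inf if home > 0 else math.nan
    return 100.0 * (home - away) / away


def add_differentials(games):
    """Drop rows with missing values, then add differential columns to each row."""
    cleaned = []
    for game in games:
        needed = [game.get(f"{s}_{side}") for s in STATS for side in ("home", "away")]
        if any(value is None for value in needed):
            continue  # skip rows with missing data
        row = dict(game)
        for s in STATS:
            home, away = row[f"{s}_home"], row[f"{s}_away"]
            row[f"{s}_diff"] = home - away
            row[f"{s}_rel_diff_pct"] = relative_difference_pct(home, away)
        cleaned.append(row)
    return cleaned


# Toy rows for illustration only (not real games).
sample = [
    {"pts_home": 108, "pts_away": 100, "ast_home": 24, "ast_away": 20,
     "reb_home": 45, "reb_away": 45, "fg_home": 0.48, "fg_away": 0.45,
     "fg3_home": 0.35, "fg3_away": 0.0},
    {"pts_home": None, "pts_away": 99, "ast_home": 21, "ast_away": 19,
     "reb_home": 40, "reb_away": 42, "fg_home": 0.44, "fg_away": 0.46,
     "fg3_home": 0.33, "fg3_away": 0.36},
]
cleaned_sample = add_differentials(sample)
print(cleaned_sample[0]["pts_diff"], cleaned_sample[0]["ast_rel_diff_pct"])
```

The second toy row is dropped because one value is missing, and the first row's FG3% relative difference is infinite because the away team's FG3% is zero.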

Guiding Questions: How are these statistics influencing the outcome of the games? How do the differences between home and away team statistics influence home court advantage? Which of these statistics contributes the most to home court advantage?

Key Assumptions: For simplicity of analysis, we are assuming that other variables, such as fouls and turnovers, do not contribute to home court advantage. Moreover, we are assuming that home win rate is significantly higher than away win rate because external factors cause the home team to have higher values in AST, REB, FG%, and FG3%. Therefore, these 4 key variables constitute ‘home court advantage.’

Summary Statistics

Looking at summary statistics, it seems that the home team has, on average, higher PTS, AST, REB, FG%, and FG3%. Looking at the average win rate of the home team, the data set from 2004–2020 shows that the home team won 59.35% of the time.

Comparing Variables to Points Scored

First, I plotted AST, REB, FG%, and FG3% against points scored for both the home and away teams to better understand how these variables correlate to overall scoring. FG% had the highest correlation with scoring, as it had a correlation coefficient (r) of 0.67, while rebounds had the least correlation to points with an r of 0.16. Assists and FG3% both had moderately positive correlation to points scored, with r values of 0.58 and 0.41 respectively. From this scatter plot, it seems that FG% has the most influence on how many points a team is scoring, which makes sense because a higher shooting percentage means that a team is going to score at a higher rate. However, the scatter plot of FG% seems to show heteroscedasticity, as the variation of error seems to increase as FG% rises. This is likely because high scoring games are less common with lower FG% when compared to higher FG%. Thus, I would hesitate to use FG% as a predictor of points scored. Nevertheless, the correlation is there, and it does signify some relationship between these variables and scoring.
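For readers who want to reproduce an r value like the ones quoted above, the Pearson correlation coefficient can be computed as in the sketch below. The function name and the toy numbers are illustrative assumptions; they are not taken from the dataset.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-deviations and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one sequence is constant")
    return cov / math.sqrt(var_x * var_y)


# Toy values for illustration only: FG% and points for four made-up games.
toy_fg_pct = [0.40, 0.45, 0.50, 0.55]
toy_points = [90, 98, 104, 115]
toy_r = pearson_r(toy_fg_pct, toy_points)
print(round(toy_r, 2))
```

The same function works unchanged on home-minus-away differentials, since it only needs two equal-length lists of numbers.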

Comparing Difference in Variables to Difference in Points Scored

The correlation between these variables and PTS highlights their influence on scoring, but it does not tell us how they impact the outcome of the game. Thus, I proceeded to plot the difference between the home and away team in each of these variables against the difference between the points scored by the home and away team. The following scatter plot illustrates how the difference in a statistic (i.e., AST, REB, FG%, FG3%) is correlated with the difference in points scored. When comparing point differentials with FG% differentials, there is a strong positive correlation (r = 0.77). Moreover, by comparing FG% differentials instead of FG%, we have eliminated the problem of heteroscedasticity. Interestingly, while REB was only weakly correlated with points scored, REB differential has a moderately positive relationship with the difference in points scored (r = 0.45). Both AST difference and FG3% difference saw some increase in correlation too.

Average Win Percentage by Difference in Assists

Now, I will examine the win percentage of home and away teams based on the difference in assists between the two teams. This shows how important assists are to wins, and we will be comparing the results to the base home win rate of 59.35%. In games where the home and away teams had the same number of assists, the home team wins only 54.19% of the time. This makes sense because, intuitively, more assists lead directly to more points. As such, if home and away teams are putting up the same number of assists, one aspect of home court advantage is neutralized. Presumably, other factors (such as REB and FG%) account for the remaining 4.19 percentage points of home win percentage above the expected win percentage of every team in the league (50%). The logic of this comparison is that if AST constituted all 9.35 percentage points of what I will call ‘home court advantage,’ then games where AST for home and away teams are equal should result in a 50% win rate.
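The grouping behind a chart like this can be sketched as follows: bucket games by their differential and compute the share of home wins in each bucket. The keys `ast_diff` and `home_win` are hypothetical names, and the rows are toy data rather than real games.

```python
from collections import defaultdict


def home_win_pct_by_diff(games, diff_key):
    """Return {differential value: (home win %, number of games)}."""
    wins = defaultdict(int)
    totals = defaultdict(int)
    for game in games:
        d = game[diff_key]
        totals[d] += 1
        if game["home_win"]:
            wins[d] += 1
    return {d: (100.0 * wins[d] / totals[d], totals[d]) for d in sorted(totals)}


# Toy rows for illustration only.
toy_games = [
    {"ast_diff": 0, "home_win": True},
    {"ast_diff": 0, "home_win": True},
    {"ast_diff": 0, "home_win": True},
    {"ast_diff": 0, "home_win": False},
    {"ast_diff": 3, "home_win": True},
    {"ast_diff": 3, "home_win": True},
    {"ast_diff": -2, "home_win": True},
    {"ast_diff": -2, "home_win": False},
]
by_ast_diff = home_win_pct_by_diff(toy_games, "ast_diff")
for diff, (pct, n) in by_ast_diff.items():
    print(f"AST diff {diff:+d}: home wins {pct:.1f}% of {n} games")
```

Returning the game count next to each percentage makes it easy to see which buckets rest on very few games.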

Average Win Percentage by Difference in Rebounds

When examining the win percentage of the home team by the difference in rebounds, we see an interesting development. When the home team and away team have the same number of rebounds, the average home team win percentage is 60%, which is actually 0.65 percentage points higher than the overall home team win percentage. That is to say, when we isolate the contribution of the other variables to home court advantage (AST, FG%, FG3%) by examining games where the home and away teams have the same number of rebounds, the home team win percentage actually increases. This suggests that rebounds are not a very impactful component of home court advantage. In fact, the home team won more than 50% of the time even when they had between 1 and 5 fewer rebounds than the away team.
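Ranges such as "1 to 5 fewer rebounds" imply that differentials were grouped into bins before win percentages were computed. The exact bin edges used for the charts are not stated, so the sketch below assumes bins of width 5 with a separate bin for a differential of exactly zero; `bin_label` is a hypothetical helper.

```python
def bin_label(diff, width=5):
    """Map an integer differential to a label like '1 to 5', '0', or '-5 to -1'.

    Assumes bins of equal width on each side of zero, with zero as its own bin.
    """
    if diff == 0:
        return "0"
    magnitude = abs(diff)
    low = ((magnitude - 1) // width) * width + 1
    high = low + width - 1
    if diff > 0:
        return f"{low} to {high}"
    return f"-{high} to -{low}"


# A home team with 3 fewer rebounds than the away team has a differential of -3.
print(bin_label(-3))
print(bin_label(7))
```

Each game's label can then serve as the grouping key when computing home win percentage per bin.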

Average Win Percentage by Differences in FG% and FG3%

Looking at FG% differential, we can see that when FG% is the same for both teams, the home team wins 57.84% of the time, which is 1.51 percentage points lower than the average win percentage of home teams. Similar to AST differentials, it seems that FG% is a significant component of home court advantage. That is to say, without the higher FG% that comes with home court advantage, the home team is going to lose 1.51 percentage points more of its games. While this may not be a completely accurate predictor, it does make sense intuitively. The home team practices more often at their home court, giving them an FG% boost that helps them outscore the away team more often than not.

A brief glance at the bar charts reveals that the situation for FG3% differential is similar to that of REB differential. For games where the FG3% was the same for both teams, the home team win percentage was 62.25%. Additionally, even when the home team shot between 1% and 5% worse from three than the away team, the home team still won 55.31% of the time. Again, this suggests that FG3% is not an important variable for home court advantage.

This conclusion, however, might be slightly misleading. That is because 3 point field goals were not as impactful on game outcomes during the 2000s and early 2010s, which make up a large portion of our data set. This trend is illustrated in the following line plot:

There is a downward trend in win rate when teams are shooting worse than their opponents from the 3 point line. This indicates that FG3% might be more important for home court advantage in more recent NBA seasons. As such, the following bar graph only includes data from the 2013 season and beyond to adjust for this factor:

As you can see, when adjusted for the more recent importance of 3 point shooting, the win percentage when the home and away teams shoot the same FG3% is 55.73%, which is 3.62 percentage points below the average home win percentage. Compared to the 1.51 point decrease in win rate when FG% is the same, it seems that FG3% has become a more important contributor to home court advantage in recent NBA seasons.
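The comparisons in the last few sections all follow the same arithmetic: subtract the home win percentage in equal-stat games from a baseline home win percentage. The sketch below collects the figures quoted in this article and recomputes the gaps; the variable and function names are illustrative. Note that the 2013-onward FG3% figure is compared here both to the all-seasons baseline, as in the text above, and to the 2013–2020 baseline of 57.60% reported in the Conclusions.

```python
# Home win percentages quoted in this article.
BASELINE_ALL_SEASONS = 59.35  # 2004-2020
BASELINE_2013_ONWARD = 57.60  # 2013-2020, reported in the Conclusions

# Home win % in games where the stat was equal for both teams.
equal_stat_home_win_pct = {
    "AST": 54.19,
    "REB": 60.00,
    "FG%": 57.84,
    "FG3% (all seasons)": 62.25,
    "FG3% (2013 onward)": 55.73,
}


def gap(baseline, observed):
    """Percentage points below (positive) or above (negative) the baseline."""
    return round(baseline - observed, 2)


gaps_vs_all_seasons = {
    stat: gap(BASELINE_ALL_SEASONS, pct)
    for stat, pct in equal_stat_home_win_pct.items()
}
fg3_gap_vs_recent_baseline = gap(
    BASELINE_2013_ONWARD, equal_stat_home_win_pct["FG3% (2013 onward)"]
)

for stat, points in gaps_vs_all_seasons.items():
    print(f"{stat}: {points:+.2f} points vs. all-seasons baseline")
print(f"FG3% (2013 onward): {fg3_gap_vs_recent_baseline:+.2f} points vs. 2013-2020 baseline")
```

A positive gap means the home team wins less often when that stat is equal, which is the signal this analysis reads as a contribution to home court advantage. Against the like-for-like 2013–2020 baseline the FG3% gap is smaller, though still larger than the FG% gap.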

Drawbacks and Limitations of Prior Analysis

One major limitation of the analysis thus far is the limited number of data points where a statistic's differential is zero. For AST and REB, there are only around 1000 data points each where the two teams' values are the same. This is worse for FG%, where there are only 185 data points where the two teams shoot the same percentage. FG3% can also be misleading because there are many games where one team's FG3% is zero, resulting in an infinite percentage difference between the FG3% of the two teams. Lastly, I have only considered 4 variables when in reality there might be many more which account for home court advantage.

Conclusions

Based on our prior analysis, we can conclude that REB contributes little to home court advantage. Under the assumption that REB has no impact on home court advantage, we can hold REB equal between home and away teams to see the composition of home court advantage based on the other variables. In order to get more data points, however, I used games with a REB percentage difference within +/- 3%. This keeps the REB difference small while providing more data points for a more reliable estimate. Additionally, to account for games where the away team had an FG3% of zero (which resulted in an infinite percentage difference in FG3% between home and away), any FG3% percentage difference larger than 100% was adjusted down to 100%. Moreover, only data from 2013–2020 was considered to adjust for the increased importance of FG3%. The average home win rate from 2013–2020 was 57.60%, and the home team win rate for games where REB was similar in these seasons was very close at 57.75%. The resulting percentage differences in AST, FG%, and FG3% were graphed.
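The three filters described above (recent seasons only, similar rebounding, capped FG3% difference) can be sketched as follows. The keys `season`, `reb_rel_diff_pct`, and `fg3_rel_diff_pct` are hypothetical column names, and the +/- 3% rule is assumed to apply to the percentage difference in rebounds.

```python
def select_similar_rebound_games(games, first_season=2013,
                                 reb_tolerance_pct=3.0, fg3_cap_pct=100.0):
    """Keep recent games with near-equal rebounding and cap extreme FG3% differences."""
    selected = []
    for game in games:
        if game["season"] < first_season:
            continue  # drop seasons before the 3-point era cutoff
        if abs(game["reb_rel_diff_pct"]) > reb_tolerance_pct:
            continue  # rebounding gap too large to treat as equal
        row = dict(game)
        # An infinite or very large difference is adjusted down to the cap.
        row["fg3_rel_diff_pct"] = min(row["fg3_rel_diff_pct"], fg3_cap_pct)
        selected.append(row)
    return selected


# Toy rows for illustration only.
toy_games = [
    {"season": 2012, "reb_rel_diff_pct": 0.0, "fg3_rel_diff_pct": 10.0},
    {"season": 2015, "reb_rel_diff_pct": 8.0, "fg3_rel_diff_pct": 5.0},
    {"season": 2015, "reb_rel_diff_pct": -2.0, "fg3_rel_diff_pct": float("inf")},
    {"season": 2018, "reb_rel_diff_pct": 3.0, "fg3_rel_diff_pct": -20.0},
]
similar_reb_games = select_similar_rebound_games(toy_games)
print(len(similar_reb_games), similar_reb_games[0]["fg3_rel_diff_pct"])
```

Only the upper side is capped here, matching the description that differences larger than 100% were adjusted down; negative differences are left unchanged.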

Based on the chart, AST seems to have the largest contribution to home court advantage, followed closely by FG3% and then FG%. I can calculate the weighted contribution of each stat to home court advantage by dividing its percentage difference by the total of the three percentage differences. The results are as follows: AST weighted contribution: 44.45%, FG% weighted contribution: 14.36%, FG3% weighted contribution: 41.19%. This is consistent with our previous analysis, where, of the 4 variables, the average home win rate was the lowest when AST difference was zero. Based on the analysis, we can conclude that AST and FG3% are the biggest factors for home court advantage, followed by FG%, then REB. This makes sense for the modern NBA, where three pointers have become a decisive factor for winning a game. However, I would have expected FG% to be second. Intuitively, the home team should have an FG% advantage because they are shooting on a court that they are familiar with. One caveat of this conclusion is that research has shown that home team statisticians, who are responsible for recording game statistics, tend to be more lenient when crediting AST to the home team. In this sense, AST might be a more artificial variable when it comes to home court advantage.
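The weighted contribution calculation described above is a simple normalization: each stat's percentage difference divided by the sum of all of them. The sketch below uses made-up input values, since the underlying percentage differences are only shown in the chart; only the final shares (44.45%, 14.36%, 41.19%) come from this article.

```python
def weighted_contributions(pct_differences):
    """Convert per-stat percentage differences into shares of the total (in %)."""
    total = sum(pct_differences.values())
    if total == 0:
        raise ValueError("total percentage difference is zero; shares are undefined")
    return {stat: 100.0 * value / total for stat, value in pct_differences.items()}


# Hypothetical inputs for illustration only; not the values behind the chart.
example_shares = weighted_contributions({"AST": 4.0, "FG%": 1.0, "FG3%": 3.0})
print(example_shares)

# Sanity check on the shares reported in the article: they should sum to 100%.
reported_shares = {"AST": 44.45, "FG%": 14.36, "FG3%": 41.19}
reported_total = round(sum(reported_shares.values()), 2)
print(reported_total)
```

Because the shares are relative, they say how the three stats compare with each other, not how large the home court effect is in absolute terms.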
