Predicting whether or not a player performs exceptionally in an EPL Football Match

Shail Mirpuri
Aug 12, 2020
Updated: Sep 9, 2020

Source: Liverpool Echo

Overview

As a fantasy football fanatic and data enthusiast, I have always pondered what impacts a player’s performance in the best football league in the world: the English Premier League (EPL). Do a player’s recent performances or favor with the public dictate whether he performs well in a given match? Is an individual player more likely to perform exceptionally if he is in a higher-quality team? Does home advantage have a significant impact on a player’s performance? Using a Kaggle dataset consisting of match-level data on the Fantasy Premier League (FPL) performances of players during the 19/20 EPL season, I have attempted to answer these questions by developing a logistic regression model in Google’s BigQuery.

In order to make a fair comparison between these different variables, we need to exclude FPL match data for any player who played less than 45 minutes in a given match. A player is unlikely to perform exceptionally in under 45 minutes, since he will not have enough time to make an impact, and keeping these appearances would distort the averages of our key metrics.
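The 45-minute filter can be sketched in a few lines of Python. This is only an illustration on invented rows: the field names (`player`, `minutes`, `bonus`) are assumptions and not the actual column names of the Kaggle dataset, and the original filtering was done in BigQuery rather than Python.

```python
# Toy per-match rows; field names and values are invented for illustration.
rows = [
    {"player": "A", "minutes": 90, "bonus": 3},
    {"player": "B", "minutes": 30, "bonus": 0},
    {"player": "C", "minutes": 45, "bonus": 0},
    {"player": "D", "minutes": 0, "bonus": 0},
]

MIN_MINUTES = 45  # appearances shorter than this are dropped


def filter_by_minutes(rows, min_minutes=MIN_MINUTES):
    """Keep only appearances of at least `min_minutes` minutes."""
    return [r for r in rows if r["minutes"] >= min_minutes]


kept = filter_by_minutes(rows)
print([r["player"] for r in kept])
```

Note that a player with exactly 45 minutes is kept, since only appearances of less than 45 minutes are excluded.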

How do we measure if a player has performed ‘exceptionally’?

First, we need to define what it means to have an exceptional performance. In FPL, bonus points are given to the top 3 players in each game. Using this metric, I have made the assumption that if a player has received bonus points, he is classified as an ‘exceptional performer’ for that given match. With this as our basis, I have processed the data such that if a player is given bonus points, he gets a ‘1’ in the ‘Performed Exceptionally’ column (and a ‘0’ if not).
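As a minimal sketch, the labelling rule is a single comparison. The function name and the toy bonus values are hypothetical, chosen only to illustrate the rule described above.

```python
def label_exceptional(bonus_points):
    """Return 1 if the player earned any bonus points in the match, else 0."""
    return 1 if bonus_points > 0 else 0


# Invented bonus points for three appearances.
bonus_by_player = {"A": 3, "B": 0, "C": 1}
labels = {player: label_exceptional(b) for player, b in bonus_by_player.items()}
print(labels)
```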

We shall test the assumption above by comparing key metrics between those who performed exceptionally in a match and those who did not. The key metrics we will be analysing to validate our assumption are those that are often indicative of an exceptional performance in football. For instance, we would expect a player who performs exceptionally in a match to be more likely to score than other players. This analysis will therefore allow us to observe whether Bonus Points are an accurate reflection of whether a player performs exceptionally or not. Since the two groups differ in size, we compare the average of each metric rather than the totals.
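The group comparison amounts to averaging each metric separately for the two label values. The sketch below assumes hypothetical field names and uses invented numbers, so the output says nothing about the real dataset.

```python
from collections import defaultdict
from statistics import mean

# Invented rows; `performed_exceptionally` is the 0/1 label described above.
rows = [
    {"performed_exceptionally": 1, "goals_scored": 2, "assists": 0},
    {"performed_exceptionally": 1, "goals_scored": 0, "assists": 1},
    {"performed_exceptionally": 0, "goals_scored": 0, "assists": 0},
    {"performed_exceptionally": 0, "goals_scored": 1, "assists": 0},
]


def group_means(rows, label_key, metrics):
    """Average each metric separately for every value of `label_key`."""
    grouped = defaultdict(list)
    for r in rows:
        grouped[r[label_key]].append(r)
    return {
        label: {m: mean(r[m] for r in members) for m in metrics}
        for label, members in grouped.items()
    }


averages = group_means(rows, "performed_exceptionally", ["goals_scored", "assists"])
print(averages)
```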

Figure 1: Bar Graph Comparing Key Performance Metrics between the two groups

Figure 1 suggests that Bonus Points on FPL are a good indicator of a player’s performance in a given match. It illustrates that those who are grouped into the ‘Performed Exceptionally’ category have significantly higher average goals scored, assists, clean sheets and saves. In contrast, this group has lower average yellow cards and goals conceded. Based upon prior knowledge, this is what we would expect, which indicates that FPL Bonus Points are indeed a good measure of exceptional performance.

Next, we shall investigate other factors that do not explicitly impact whether or not an individual performs exceptionally. One factor in our dataset worth investigating is the price/value of the player as set in FPL. The price of a player is largely based on his reputation and his performances in the previous season. Although the price is set at the start of the season, it can change incrementally, within limits, based upon his recent performances. Intuitively, we would expect that higher-priced players are more likely to perform exceptionally since they are likely to be of better quality.

Another factor worth investigating is the Average Transfer Balance. The transfer balance is the total number of transfers in minus transfers out for a particular gameweek. This may potentially reflect the public’s opinion on a player as well as the player’s recent performances. Finally, we will also compare the Influence-Creativity-Threat (ICT) Index between the two groups. This index is measured by the FPL website itself and takes into account how many chances a player creates, his influence on the game and his threat in front of goal.

Table 1: Value, ICT and Transfer Comparison Between the Two Groups

Table 1 shows that performances considered to be ‘exceptional’ on average have a greater FPL price/value, ICT index and Transfer Balance for that corresponding week. While intuitively this is expected, it is interesting to observe that there is only a slight disparity in the average FPL price/value between the two groups. On the other hand, the Average ICT Index and Transfer Balance are more than double in the ‘performed exceptionally’ group. This suggests that Average ICT Index and Transfer Balance are stronger predictors of an exceptional performance than FPL price/value.

Another important factor that may impact whether or not a player has performed exceptionally well is the quality of the team they play for. We can investigate this by looking at the breakdown of the ‘Performed Exceptionally’ group by the different teams the performances occurred in. In order to judge the quality of each team objectively, we shall look at the position they finished in this season’s (19/20) EPL table and plot this against the percentage of total performances from each team that were classified as exceptional.
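The per-team percentage described above can be sketched as a grouped share. The team names and counts below are placeholders rather than values from the dataset, and the field names are assumptions.

```python
from collections import Counter

# Invented rows for two placeholder teams.
rows = [
    {"team": "Team X", "performed_exceptionally": 1},
    {"team": "Team X", "performed_exceptionally": 1},
    {"team": "Team X", "performed_exceptionally": 0},
    {"team": "Team X", "performed_exceptionally": 0},
    {"team": "Team Y", "performed_exceptionally": 1},
    {"team": "Team Y", "performed_exceptionally": 0},
    {"team": "Team Y", "performed_exceptionally": 0},
    {"team": "Team Y", "performed_exceptionally": 0},
]


def exceptional_share(rows, group_key):
    """Percentage of performances labelled exceptional within each group."""
    totals = Counter()
    exceptional = Counter()
    for r in rows:
        totals[r[group_key]] += 1
        exceptional[r[group_key]] += r["performed_exceptionally"]
    return {g: 100.0 * exceptional[g] / totals[g] for g in totals}


share = exceptional_share(rows, "team")
print(share)
```

Because the grouping key is a parameter, the same function could be pointed at a home/away flag instead of the team.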

Figure 2: Bar Graph of Percentage of Total Performances Classified as Exceptional vs. Final Position in the EPL

From Figure 2, we can see that as the quality of the team decreases so does the percentage of total performances classified as exceptional for that team. It is important to note that this relationship is not strictly decreasing. For instance, Wolves, who finished 7th in the EPL, have a greater percentage of total exceptional performances than Manchester United, who finished 3rd. Despite this, there is an overall downward trend, which suggests that performances are more likely to be classified as exceptional if they occur in a higher-quality team. From Figure 2 we can also see that Liverpool, who finished 1st in the EPL, have a significantly greater percentage of total performances classified as exceptional than the rest of the pack. This is testament to their dominance in the 19/20 EPL season, in which they won the league by a rampant 18 points.

Finally, we will also investigate the variable of home advantage. We shall attempt to find this out by calculating the percentage of total home/away performances classified as exceptional.

Table 2: Home Advantage and Exceptional Performances

According to Table 2, there was a slightly greater percentage of total performances classified as exceptional when there was a home advantage. However, this small percentage difference may imply that its relative impact on performance is smaller than that of the other factors discussed previously.

In order to confirm our interpretations as well as compare the relative importance of each factor, we shall develop a logistic regression model using BigQuery Machine Learning. Since we have 8493 rows in our processed dataset (after excluding appearances of less than 45 minutes), we will split it into 80% for training the model and 20% for evaluating the model. This gives us a random sample of 6794 performances to train our model on and 1699 performances to perform an evaluation on. In addition to this, when creating our model we will balance class labels using weights that are inversely proportional to the frequency of each group. This is done to ensure that the model doesn’t simply learn to predict the majority group (those who don’t perform exceptionally).
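To make the split sizes and the class weighting concrete, here is a small Python sketch. It is not how BigQuery ML performs these steps internally: the shuffle-and-cut split and the weighting formula n / (k × class count) are common conventions that we assume here purely for illustration.

```python
import random
from collections import Counter


def train_test_split(rows, train_fraction=0.8, seed=0):
    """Shuffle a copy of the rows and cut it into train and test parts."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def balanced_class_weights(labels):
    """Weights inversely proportional to class frequency: n / (k * count).

    This particular formula is an assumed convention, not taken from the article.
    """
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * counts[c]) for c in counts}


# 8493 placeholder rows reproduce the split sizes quoted above.
train, test = train_test_split(list(range(8493)))
print(len(train), len(test))

# Invented label mix: 8 non-exceptional performances, 2 exceptional.
class_weights = balanced_class_weights([0] * 8 + [1] * 2)
print(class_weights)
```

With these weights, each class contributes the same total weight to the loss, so the rarer ‘exceptional’ class is not drowned out by the majority.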

Figure 3: Performance Metrics for Our Trained Model

Figure 3 illustrates the key performance metrics for our model based upon the training data. From this we can observe that the key metric of Area Under the Curve (AUC) is closer to 1 than to 0.5, which suggests that our model separates the two groups considerably better than random guessing. In addition to this, we also want to maximize the F1 score at our chosen threshold. It seems that the best threshold to balance precision and recall is 0.6630. Now, we shall use this threshold in the evaluation of our model by applying it to our test dataset in order to check that the model has not overfitted to the training data.
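Choosing a threshold by F1 score can be sketched as a simple sweep over candidate values. The labels, probabilities and candidate thresholds below are invented, and we assume a performance is predicted as exceptional when its probability is at least the threshold.

```python
def f1_at_threshold(y_true, y_prob, threshold):
    """F1 score when probabilities >= threshold are predicted as class 1."""
    preds = [1 if p >= threshold else 0 for p in y_prob]
    tp = sum(1 for t, p in zip(y_true, preds) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, preds) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, preds) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # precision and recall are both zero (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_threshold(y_true, y_prob, candidates):
    """Return the candidate threshold with the highest F1 score."""
    return max(candidates, key=lambda t: f1_at_threshold(y_true, y_prob, t))


# Invented labels and predicted probabilities.
y_true = [0, 0, 1, 0, 1, 1]
y_prob = [0.10, 0.40, 0.45, 0.60, 0.70, 0.90]
candidates = [0.3, 0.5, 0.65]

chosen = best_threshold(y_true, y_prob, candidates)
print(chosen, f1_at_threshold(y_true, y_prob, chosen))
```

Because F1 is the harmonic mean of precision and recall, the chosen threshold is the one that best balances the two, which is why we want the score to be as high as possible.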

Table 3: Key Performance Metrics for the Testing Dataset

In the table above, we can see that the performance metrics are similar to those obtained on the training data. The AUC and F1 only differ slightly between the two datasets. This suggests that our model has not overfitted and is a good fit for judging whether a player's performance is classified as exceptional.

Using our model, we can predict whether or not a player performed exceptionally in a match based upon their weekly ICT index, price, transfer balance, team and home advantage. To demonstrate the predictive capabilities of the model, I will use it to predict 4 case study performances throughout the season:

Bruno Fernandes vs Everton (01/03/2020)
Declan Rice vs Tottenham (24/06/2020)
Ismaïla Sarr vs Liverpool (01/03/2020)
Michail Antonio vs Norwich (11/07/2020)

I have chosen these specific performances as I believe that all of them represent unique scenarios. I am interested in seeing the model’s prediction and assessment of each scenario. The first case study (Bruno Fernandes) is used to represent a new signing in the EPL. Next, we will observe how the model assesses the performance of a player (Declan Rice) in a losing cause (0-2). After this, we shall move on to the assessment of a shock performance in a shock result, in which Ismaïla Sarr and his Watford team beat table-toppers Liverpool (3-0). Finally, we will observe the model’s prediction for players who have good recent form through the case of Michail Antonio prior to his 4 goals against Norwich.
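For a logistic regression of this kind, a single prediction reduces to a weighted sum passed through a sigmoid and compared against the threshold. The sketch below uses made-up weights, an invented intercept and a placeholder team, with the team one-hot encoded as a 0/1 feature. None of these numbers come from the trained BigQuery model, and any feature scaling the real model may apply is ignored.

```python
import math


def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(features, weights, intercept):
    """Logistic regression: sigmoid of intercept + weighted feature sum."""
    z = intercept + sum(weights.get(name, 0.0) * value
                        for name, value in features.items())
    return sigmoid(z)


# Hypothetical weights; the team is one-hot encoded as "team=<name>".
weights = {
    "ict_index": 0.3,
    "value": 0.01,
    "transfer_balance": 0.0,
    "was_home": -0.05,
    "team=Team X": -0.5,
}
intercept = -2.0

# One invented performance by a Team X player at home.
features = {
    "ict_index": 10.0,
    "value": 55.0,
    "transfer_balance": 1000.0,
    "was_home": 1.0,
    "team=Team X": 1.0,
}

THRESHOLD = 0.6630  # the threshold reported for the trained model
prob = predict_proba(features, weights, intercept)
prediction = 1 if prob >= THRESHOLD else 0
print(round(prob, 4), prediction)
```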

Table 4: Using the model to predict performances

Table 4 shows the predictive capabilities of our logistic regression model. All 4 case-study performances were classified by FPL as ‘exceptional’, and our model agreed in 3 of them. The exception was Declan Rice’s performance vs Tottenham, which the model classified as non-exceptional based upon the other variables. This may suggest that our model is not well suited to scenarios where a player plays well but his team loses. It is also important to note that it is unrealistic to expect our model to predict every single case with complete accuracy.

Finally, we can query the weights of each variable in the model in order to observe which variables are the strongest and weakest.
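Once the weights are retrieved, ranking them by absolute size gives the strongest and weakest variables. The feature names and values below are placeholders, not the weights of our model.

```python
def rank_by_magnitude(weights):
    """Sort (feature, weight) pairs from largest to smallest absolute weight."""
    return sorted(weights.items(), key=lambda kv: abs(kv[1]), reverse=True)


# Placeholder weights for four hypothetical features.
weights = {"feature_a": 0.9, "feature_b": -1.4, "feature_c": 0.02, "feature_d": -0.3}

ranked = rank_by_magnitude(weights)
strongest = ranked[:2]
weakest = ranked[-2:]
print(strongest)
print(weakest)
```

Ranking by absolute value treats a strongly negative weight as just as influential as a strongly positive one. Comparing raw weights across variables is only meaningful if the inputs are on comparable scales, which we assume here.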

Table 5: The Top 5 Most Influential Variables

Table 6: The Weakest 5 Variables

Table 5 shows the importance of the ICT index as a predictor of a good individual performance; this suggests the more chances, influence and threat a player carries in a match, the more likely he is to play exceptionally. Apart from the ICT index, the rest of the top 5 variables are based upon the team a player plays for. Since Aston Villa, Norwich and Bournemouth were all in relegation battles this season, it is no surprise that their players’ performances were less likely to be classified as exceptional. On the other hand, it is a shock to find 2nd place Manchester City among the most strongly negative weights. This, however, may provide insight that their team as a whole and their individual players underperformed throughout the season.

In Table 6, we can see that the FPL-set value or price of a player had very little bearing on whether or not the player performed exceptionally in a match. Since the FPL price is based upon reputation and the player’s performance in the previous season, it can be said that in the EPL a player’s reputation and longer-term form have very little impact on whether or not he performs exceptionally in a given match. This may stand testament to the competitive nature of the EPL, where any player can perform and upset any team on a given day. In addition to this, a small negative weight is given to home advantage, which suggests that this may actually be a myth. This means that players are almost equally likely to perform exceptionally whether or not they are playing at home. Finally, transfer balance also has a relatively weak weighting. Since the transfer balance is affected by the general public’s opinion of the player and expectation of his future performances, this suggests that EPL players are robust to public opinion and their performances may be independent of recent form or pressure.

Key Insights

Overall, we have performed exploratory data analysis (EDA) on the match-level data for FPL player performances in the 19/20 EPL season. Using our key findings from the EDA, we have developed a BigQuery Machine Learning model to confirm our initial interpretations. From this model, we have observed the importance of the ICT index as a predictor of whether or not a player performs exceptionally within a game. Additionally, the team of a player seems to play a significant role in whether or not he performs exceptionally in a match, particularly if this team is underperforming or in a relegation scrap. We have also found that reputation (through FPL price/value), recent form and home advantage play a very small role in a player’s performance. This may reinforce the idea that the English Premier League is indeed the most competitive and unpredictable football league in the world; a league where anyone can perform well on a given day regardless of venue, reputation or recent form. Future exploration into the relative weights of variables that affect a player’s performance in a given match for other football leagues around the world can provide further insight into this.

References

Dataset for EPL Stats 19/20

How the FPL Bonus Points System works
