Is the Brazilian Championship getting more unfair?

An exploratory analysis using the Gini Coefficient

All data and the complete analysis can be found in the repository brasileirao-gini. There you will find all the instructions to execute the analysis step by step.

Introduction

One of the good things about the digital age is that WhatsApp lets us hold virtual round tables with our friends no matter how far apart we are. In one of these chats, we were discussing the great work of the Flamengo coach in the current season (i.e., how he has changed the team for the better).

However, a relevant topic came up during the debate: the hypothesis that, as more money flows into the championship and its larger clubs, the Brasileirão is getting increasingly unfair, with a small set of teams winning many games and the small teams being reduced to mere punching bags within the league.

In other words, this means that the championship is always contested by the same clubs with economic power, and because of this disparity, competitiveness could be decreasing over time.

This raised the question within our group: Is the Brasileirão getting more unfair over time?

And it is this question that I will try to answer at the end of this post.

How to check if there is structural inequality in the Brazilian Championship?

As a first attempt at answering this question, I will use the Gini Coefficient, a measure of statistical dispersion that was originally created to measure the distribution of income and wealth and is widely used in economics as an important indicator for monitoring these issues.

The Gini Coefficient is used as a relative measure for benchmarking and monitoring inequality and poverty across various countries and serves as a basis for analysis and development of public policies.

Since this post is not intended to speak in depth about this indicator, I suggest reading the original work by Corrado Gini called Variabilità e Mutabilità or research on how this indicator has been used as a parameter for social welfare analyses in Brazil.

That is, I will simply measure how unevenly points are distributed within each edition of the Brazilian Championship (which I will refer to from now on as Brasileirão) to check whether there is a latent structural inequality in the championship over the years.
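For reference, one common way to write the Gini Coefficient for a season with n teams, where x_i is the final number of points of team i (the notation is chosen here for illustration and is not taken from the original analysis):

```latex
G = \frac{\sum_{i=1}^{n}\sum_{j=1}^{n} \lvert x_i - x_j \rvert}{2 n^2 \bar{x}},
\qquad
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i
```

G is 0 when every team finishes with the same number of points, and it grows toward 1 as the points concentrate in fewer teams.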

Some preliminary considerations on limitations of measuring this inequality

As mentioned earlier, the Gini Coefficient lets us measure economic inequality through income and/or wealth. Here I will use a simple analogy to apply it to the Brasileirão data.

If this index can measure inequality among countries in terms of wealth or income, then in football we can treat the points a team earns in a season as its income. The regular Gini Coefficient can then be applied both within a single edition of the league and across editions, to monitor this inequality over time.

For this, I will use the database of all Brasileirão results from 2003 to 2018 extracted from Wikipedia. Here I must make two considerations:

These are some important limitations that must be considered, given that the number of points at stake has changed and an adjustment is necessary.
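One possible adjustment, sketched here as an assumption rather than the adjustment actually used in the analysis, is to express each team's total as a share of the points that were available to it. The helper names `max_points` and `share_of_points` are hypothetical, and the sketch assumes a double round-robin format with 3 points per win.

```python
def max_points(n_teams, points_per_win=3):
    """Points available to one team in a double round-robin season (assumed format)."""
    games = 2 * (n_teams - 1)   # each team plays every other team home and away
    return games * points_per_win


def share_of_points(points, n_teams):
    """Fraction of the available points that a team actually won."""
    return points / max_points(n_teams)


print(max_points(20))                      # points at stake in the 20-team format
print(round(share_of_points(72, 20), 3))   # 72 points in a 20-team season
print(round(share_of_points(72, 24), 3))   # same total in a hypothetical 24-team season
```

Worth keeping in mind: multiplying every team's points by the same constant leaves the Gini Coefficient unchanged, so this rescaling mainly helps when comparing raw totals across formats.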

It is worth noting that the Gini Coefficient and the analysis itself have several limitations that must be understood, such as:

  • Not taking the “Tradition Factor” into account regarding the productivity of clubs over time, i.e., a large and old club carries a larger financial/economic/institutional structure than newer clubs, and this can be seen in the distribution of champions in the round-robin era (In economics, this would be similar to the productivity transition effect of an installed production base over the years);
  • Economic differences in the regions where the clubs are based;
  • The Gini Coefficient looks only at the final result without taking into account cyclical factors that can influence these results, such as management, the team’s economic moment, and other events like the Olympics and World Cup;
  • The concept of the nature of point generation (which in economics would be income) can have very different dynamics, given that the index does not capture whether the points were generated through 3 draws (1 point multiplied by 3) or by one victory and two defeats (3 points);
  • Analyzing the generation of points itself over time can be complicated and would not represent a plausible comparison. For example: The 2009 Flamengo champion would, at best, be a mere fifth-place finisher in 2014 based on total points;
  • The index itself, by dealing only with the final result, does not show the transitivity of these points throughout the championship. Let me explain: If a team in the final rounds of the championship no longer has any chance of qualifying for a major competition (Copa Libertadores), of being relegated to lower divisions, or has already won the title, it may happen that these teams enter with less disposition to win the games. Financial incentives between teams (known in Brazil as “mala branca”) can also occur. This Investopedia article talks a bit about this effect;

Some other limitations of the Gini Coefficient can be found in the work of Tsai, in the Working Paper by Osberg, or on the HSRC website.

To calculate the Gini Coefficient, I used Olivia Guest’s code for simplicity, but any software can be used, since the data is available in the repository.

Since the code carries a lot of boilerplate from things I’ve done in the past, and my focus is the analysis itself, I’m not going to comment on all of it.
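For readers who want to see what such a calculation looks like, here is a minimal pure-Python sketch. It is not the code used in the analysis; the function name `gini` and the toy leagues are illustrative assumptions.

```python
def gini(points):
    """Gini coefficient of a list of non-negative numbers (0 = perfect equality)."""
    values = sorted(points)
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        raise ValueError("need at least one value and a positive total")
    # Sorted-rank form: sum of (2i - n - 1) * x_i over ranks i = 1..n,
    # divided by n * sum(x).
    weighted = sum((2 * i - n - 1) * x for i, x in enumerate(values, start=1))
    return weighted / (n * total)


equal_league = [50, 50, 50, 50]    # made-up: every team ends level
lopsided_league = [0, 0, 0, 1]     # made-up: one team takes every point
print(gini(equal_league))
print(gini(lopsided_league))
```

With n teams the largest possible value is (n - 1) / n, which is why the lopsided four-team example does not reach 1.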

Let’s take an initial look just to see if the data was loaded correctly.
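A quick sanity check of this kind could look like the sketch below. The CSV layout (`year,team,points`) and the function name `load_points` are assumptions made for illustration, not the repository's actual format; the four sample rows use only figures quoted in this post.

```python
import csv
import io
from collections import defaultdict

# Hypothetical layout: one row per team per season.
SAMPLE = """year,team,points
2017,Corinthians,72
2017,Atlético Goianiense,36
2018,Palmeiras,80
2018,Paraná,23
"""


def load_points(handle):
    """Return {year: [(team, points), ...]} from a CSV file-like object."""
    seasons = defaultdict(list)
    for row in csv.DictReader(handle):
        seasons[int(row["year"])].append((row["team"], int(row["points"])))
    return dict(seasons)


seasons = load_points(io.StringIO(SAMPLE))
for year, rows in sorted(seasons.items()):
    totals = [p for _, p in rows]
    print(year, len(rows), "teams loaded, points range", min(totals), "-", max(totals))
```

On the full data, the number of teams per season and the points range are the two things worth checking against the published final tables.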

Everything seems OK with the data, so I will perform the calculation of the Gini Coefficient using all editions of the Brasileirão.

Inequality ranking among all editions of the Brasileirão using the Gini Coefficient

Using the data that was loaded and the functions for Gini calculation, I obtained the following table:

Gini Coefficient Calculated with the respective champions of each edition

Considering the Gini Coefficient as the main ordering metric, we can see that the Brasileirão editions of 2017, 2005 (both with Corinthians as champion), and 2009 (Flamengo as champion) were those with the most equality within the round-robin era.

On the other hand, the editions of 2018 (Palmeiras), 2014 (Cruzeiro), and 2012 (Fluminense) were the most unequal regarding the final distribution of points. A curious fact is that if we take the 5 most unequal seasons, we will see Palmeiras and Fluminense with 2 titles each (respectively 2018, 2016 and 2012, 2010).

This suggests that Corinthians and Flamengo tend to win seasons with a more balanced final distribution of points, while Palmeiras and Fluminense tend to win more unequal ones. With so few seasons this is only an initial hypothesis, to be tested with more data.

Let’s look at the two extremes, which are the 2017 (most equal) and 2018 (most unequal) seasons.

Brasileirão 2017

Brasileirão 2018

Looking at the distribution of points here, we can see that the point difference between the champions was 8 points (80-72). Considering the distance from the champion to the fifth-placed team in both championships: in 2017 we have a distance of 15 points (72-57), while in 2018 we have 17 points (80-63), which is very similar.

However, knowing that the champion always has a bit of margin considering only these extremes, if we consider only the distance between the runner-up and the fifth-placed team, in 2017 we have only 6 points (63-57), while in 2018 this distance goes to 9 points (72-63).

Performing the same exercise between the champion and the worst team in the championship, in 2017 we have a difference of 36 points (72-36), while in 2018 we reach the impressive mark of 57 points difference (80-23). This shows that even among the worst teams throughout these two championships, we have a significant difference of 13 points (36 (Atlético Goianiense/2017) vs 23 (Paraná/2018)). Let’s stay tuned to this information, as I will come back here later.
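The comparisons above can be reproduced with a few subtractions. The sketch below uses only the point totals quoted in the text; the dictionary keys and the function name `gaps` are illustrative.

```python
# Final points quoted in the text for selected positions.
seasons = {
    2017: {"champion": 72, "runner_up": 63, "fifth": 57, "last": 36},
    2018: {"champion": 80, "runner_up": 72, "fifth": 63, "last": 23},
}


def gaps(s):
    """Point differences between the positions discussed in the text."""
    return {
        "champion_to_fifth": s["champion"] - s["fifth"],
        "runner_up_to_fifth": s["runner_up"] - s["fifth"],
        "champion_to_last": s["champion"] - s["last"],
    }


for year, s in seasons.items():
    print(year, gaps(s))
```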

Now I will generate a graph of how this Gini Coefficient has behaved over time.

Gini Coefficient in the Brasileirão over time

In a first analysis, we can notice some curiosities:

  • It seems we indeed have a trend (not so clear) of growing inequality, but with very distinct valleys and peaks. Considering that the 20-team format has only 12 complete seasons, this must be taken with caution;
  • Something surprising is that the valleys usually happen in odd years and the peaks in even years. This might be explained by some external effect, such as the Olympics and World Cup that occur in even years. It is a very weak hypothesis, but it is still a hypothesis;
  • It seems that after 2011 there was a more consistent rise and maintenance of this increase in inequality, with the Gini Coefficient returning below 0.115 only in 2017, meaning a window of 6 years above the 0.115 level;
  • And speaking of 2017, this season seems to have been a total break in relation to this inequality: the second, third, and fourth-place finishers were separated by only 1 point, and 3 teams finished with 43 points, two of which were relegated to the second division.

However, one thing I considered was that maybe there is some effect I call “Last season in Serie A” or “I’m already relegated anyway, there’s nothing else to do” regarding this (in)equality, given that 4 teams always leave/enter per year (i.e., 20% of the teams are replaced every year). To remove this potential effect, I will consider a moving average based on the last 3 championships. That is, there will always be a combination of a) two unequal years and one equal year and b) two equal years with one unequal year.
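A trailing 3-edition average of this kind can be sketched as follows. The function name `trailing_mean` and the alternating values are made up for illustration; this is not the code from the analysis.

```python
def trailing_mean(values, window=3):
    """Mean of each value together with the (window - 1) values before it.

    The first window - 1 positions have no full window and are returned as None.
    """
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            chunk = values[i + 1 - window : i + 1]
            out.append(sum(chunk) / window)
    return out


# Made-up Gini values that alternate between valleys and peaks.
series = [0.10, 0.12, 0.10, 0.12, 0.10]
print(trailing_mean(series))
```

Because every window mixes peaks and valleys, the smoothed series swings much less than the raw one, which is the effect the moving average is meant to have here.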

Gini Coefficient when considering a 3-edition window to compute the average

Performing this rough smoothing, we can see the effect of this increase in inequality more clearly, almost linearly from 2009 to 2016, being broken only by the infamous 2017 season, and with the 2018 season suffering a clear effect from this adjustment.

Previously, I spoke about the disparity in points among the top finishers, among the bottom finishers, and between the champion and the worst team in the league. In these extreme cases, the champion always wins with a small cushion, while the gaps separating the teams behind it vary widely in size.

To check in a slightly more robust way that this inequality is increasing, I will remove the outliers from each championship. In short: I will remove the champion and the worst team of the season from the analysis.

(Author’s Note: I know there are numerous approaches on how to do this correctly. For the sake of simplicity and to reinforce my point, I am removing them because in these extreme situations a) the champion ends with a slight advantage and b) the worst team of the season is really much worse. The idea is to see the equality of the other teams in the championship. Remember that I want to see the (in)equality of the league in general, without the effect of the super-champions and the shameful punching bags of the season.)

That said, I will remove these outliers and recalculate the Gini Coefficient.
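As a rough sketch of this step, assuming each season is just a list of final point totals: drop the highest and lowest values, then compute the Gini Coefficient on what remains. The function names and the five-team table are made up for illustration.

```python
def gini(points):
    """Gini coefficient via the mean absolute difference between all pairs."""
    n = len(points)
    mean = sum(points) / n
    pair_diffs = sum(abs(a - b) for a in points for b in points)
    return pair_diffs / (2 * n * n * mean)


def trimmed(points):
    """Drop the single highest and single lowest totals (champion and last place)."""
    ordered = sorted(points)
    return ordered[1:-1]


# Made-up five-team table: a runaway champion, a hopeless last place,
# and three teams level in between.
table = [80, 50, 50, 50, 23]
print(round(gini(table), 3))   # inequality with the two extremes included
print(gini(trimmed(table)))    # inequality among the remaining teams
```

In this toy table all of the inequality comes from the two extremes, so the trimmed value drops to zero; in a real season the trimmed value shows how much inequality remains among the other teams.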

Gini Index with the champion and the last-place finisher of the championship removed

Now we have some significant changes in our panel, which are:

  • If before we had the 2017, 2005, and 2009 seasons as the most equal, now we remove the 2009 season won by Flamengo and include 2007 won by São Paulo;
  • On the side of the most unequal seasons, previously we had the order of inequality as the 2018, 2014, and 2012 seasons. Now we have a new order regarding the 2014 (Cruzeiro), 2012 (Fluminense), and 2018 (Palmeiras) editions.

Let’s check how the tables of the most extreme editions, 2007 and 2014, looked without the champion and the worst team.

Brasileirão 2007

Brasileirão 2014

Looking at the two final tables in which we removed the champion and the worst team, we can immediately notice that while in the 2014 season we have two gaps of 7 points (4th and 5th, and 7th and 8th), in the 2007 championship the largest difference was 4 points (15th and 16th). Doing a small exercise of imagination, we could say that Paraná, which had 41 points in 2007, would be in no danger of going to the second division if it had the same number of points in the 2014 championship. As I stated earlier, this analysis of transposing a team over time is not very valid; however, I mentioned this only to emphasize the importance of the distribution of points rather than the absolute number of points in the final result.
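Spotting gaps like these does not have to be done by eye. The sketch below finds the widest gap between neighbouring positions in a final table; the function name `largest_gap` and the sample table are made up for illustration.

```python
def largest_gap(points):
    """Return (gap, upper_position, lower_position) for the widest gap
    between neighbouring places in the final table (positions are 1-based)."""
    ordered = sorted(points, reverse=True)
    best = (0, None, None)
    for pos in range(1, len(ordered)):
        gap = ordered[pos - 1] - ordered[pos]
        if gap > best[0]:
            best = (gap, pos, pos + 1)
    return best


# Made-up table with a 7-point hole between 2nd and 3rd place.
table = [70, 68, 61, 60, 58]
print(largest_gap(table))
```

If several gaps tie for the widest, this version reports only the first one from the top of the table.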

Let’s generate the graph just to check if the trend of increasing inequality remains or not.

Gini Index over time, removing the champion and the worst team of the season

Gini Index over time, removing the champion and the worst team of the season, considering the average of the last 3 editions

Looking at the Gini Coefficient with the champion and the worst team removed, we can see that the trend of increasing inequality within the league is still there, whether we use the raw yearly values or a moving average with a 3-season window.
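One hedged way to put a number on a trend like this, instead of reading it off the chart, is the least-squares slope of the Gini values against the season index. This is a sketch of that idea, not something done in the original analysis; `ols_slope` and the values below are made up.

```python
def ols_slope(xs, ys):
    """Least-squares slope of ys against xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


# Made-up values, only to show the call: season index vs Gini.
season_index = [1, 2, 3, 4, 5]
ginis = [0.10, 0.11, 0.12, 0.13, 0.14]
print(ols_slope(season_index, ginis))
```

A positive slope would be consistent with growing inequality, though with so few seasons it should be read with the same caution as the charts.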

Facts

Throughout the data presented in this analysis, I have reached the following facts:

  • We have a growing trend in inequality regarding the number of points among teams within the Brazilian championship;
  • This growth starts more substantially in 2010;
  • The most recent seasons (2018 and 2017) are respectively the seasons with the highest and lowest inequality;
  • This trend of increasing inequality occurs even when removing the champion and the worst team of the season, given that the champion in extreme cases has a small advantage over the second-placed team, and the worst team ends with a score virtually impossible to reverse on a last-round occasion;
  • In seasons where there is greater inequality, there are gaps of points between blocks of teams. In the case we saw, these gaps were 7 points (2 wins + 1 draw);
  • In more equal seasons, these gaps between blocks of teams are probably smaller or in some cases nonexistent. In the case observed, there was only one gap of 4 points (1 win + 1 draw); apart from it, no two adjacent positions in the table were separated by more than 1 win.

Conclusion and considerations for the future

If we have to answer our main question, which was “Is the Brasileirão getting more unfair over time?” the correct answer would be:

“Yes. Using the Gini Coefficient as a metric to measure whether there is structural inequality has shown that there are indeed latent elements of this inequality.”

The more attentive reader may notice that one thing I took great care with here was not to make statements regarding competitiveness, statements regarding the financial conditions of the clubs, increases in prize money, or absence of incentives for the lower-placed teams, etc., given that these aspects are difficult to measure and there is very little reliable data available for analysis. But empirically, perhaps we can make some statements in these directions.

Regarding future analyses, there are many hypotheses that can be tested, such as a potential fundamental problem of competitiveness due to the fact that many clubs do not represent an elite (i.e., many weak teams in the main league). There are also questions about increasing the number of relegated clubs, and hypotheses that treat financial disparity as one of the potential causes for the existence of a few super-teams. Another hypothesis gaining traction concerns the financial quotas for television broadcasting rights, which make up a large part of these clubs’ revenue.

There are many hypotheses under discussion and many aspects that could be the reason for the increase in this inequality. A large number of aspects can be studied, and at the end of this post, there are some links for references to other leagues.

Inequality in the Premier League - Çınar Baymul

An Analysis Of Parity Levels In Soccer - Harvard Sports

Which Sports League has the Most Parity? - Harvard Sports

Major League Soccer and the Effect of Egalitarianism - Harvard Sports

The Gini Coefficient as a Measure of League Competitiveness and Title Uncertainty - Australia Sports Betting

Mourão, P. R., & Teixeira, J. S. (2015). Gini playing soccer. Applied Economics, 47(49), 5229-5246

How “fair” are European soccer leagues? Gini index applied to points distribution of 5 soccer leagues between 2000 and 2015 - r/soccer

Footballomics: Estimating League Disparity Performance with a Point-Rank Gini Index - Christoforos Nikolaou