
An Analysis of Car and Driver Impact on Formula 1 Success - Part 1


One of the most hotly debated topics in Formula 1 is who deserves more of the plaudits for success: the car or the driver. Opinions span the entire spectrum, from those who believe that the best car would win even with the worst driver, to those who think that the best driver can single-handedly drag a middling car to greatness.

As a Formula 1 fan, I have always been intrigued by this question. The goal of this analysis is to take a quantitative look at this age-old question and to lay a foundation for further empirical research into the topic. It is adapted from a project that I worked on for a college Statistics for Data Science class.

In this project, I looked at the last 30 years of Formula 1 data, spanning 1994 to 2023. I considered various statistics that I thought would be insightful, and I eliminated metrics that were redundant or offered little room for exploration. Once I had settled on the metrics that would be most useful, I narrowed down the scope of the data. For each season from 1994 to 2023, I collected data for the winner of the Drivers’ Championship, the runner-up, and the best-placed teammate of each. In seasons where the runner-up was the winner’s teammate, I used the third-place finisher and their teammate instead. Because Formula 1 drives are so transient, drivers lower down the order are often replaced mid-season, which could skew the data. Hence, I limited myself to drivers who finished at the top of the championship, since they have more stability in team and driver selection throughout the season.
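The selection rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Standing` record, the `select_drivers` helper, and the driver and team names are all hypothetical and do not come from the original dataset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Standing:
    position: int  # final championship position (1 = champion)
    driver: str    # hypothetical driver label
    team: str      # hypothetical team label


def select_drivers(standings):
    """Return (champion, champion's best teammate, rival, rival's best teammate)."""
    ordered = sorted(standings, key=lambda s: s.position)
    champion, runner_up = ordered[0], ordered[1]

    # If the runner-up drives for the champion's team, use the
    # third-place finisher as the rival instead.
    rival = ordered[2] if runner_up.team == champion.team else runner_up

    def best_teammate(lead):
        # Highest-placed other driver on the same team, or None if there is none.
        return next(
            (s for s in ordered if s.team == lead.team and s.driver != lead.driver),
            None,
        )

    return champion, best_teammate(champion), rival, best_teammate(rival)


# Made-up season in which the runner-up is the champion's teammate.
example = [
    Standing(1, "Driver A", "Team X"),
    Standing(2, "Driver B", "Team X"),
    Standing(3, "Driver C", "Team Y"),
    Standing(4, "Driver D", "Team Z"),
    Standing(5, "Driver E", "Team Y"),
]
champion, champion_mate, rival, rival_mate = select_drivers(example)
```

In this made-up season the rival is the third-place driver, and the rival's teammate is picked from further down the order.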

Preliminary Variable Analysis

Before analyzing the data in depth, I felt it was important to look at the individual variables and identify some trends, so that I could connect the results of the later analysis to the domain knowledge I already had. The variables that I included in my data were:

1.     Position: This is the place that the driver secured in the final Formula 1 Drivers’ Championship in the season they were competing in. It follows a ranking scale, with lower numbers being more prestigious, and 1 indicating that the driver won the Drivers’ Championship in that season.

2.     Team: This metric was something that I created to make the data easier to analyze and sort through. Given that there were always only two teams per season in my data, I assigned 1 to the team of the winner of that season’s Championship. The runner up’s team would be assigned 2. The second drivers for both teams were assigned either 1 or 2 based on which team they were racing for in that season.

3.     Points: This is the total number of points a driver scored over the course of the season, awarded for finishing races in points-paying positions. Points are the metric by which the Drivers’ Championship is decided, and so I made them a key part of my analysis. Because Formula 1 has used several scoring systems over the years, I also had to introduce another variable to account for the changes in system.

4.     Wins: This shows how many wins each driver had in that specific season.

5.     Win Percentage: This statistic is the percentage of that season’s races that the driver won. It is calculated by dividing the number of races won by the total number of races that year and multiplying by 100.

6.     Poles: This shows how many pole positions each driver had in that specific season. A pole position indicates that a driver will start at the front of the grid for the race.

7.     Pole Percentage: This statistic is the percentage of that season’s pole positions that the driver secured. It is calculated by dividing the number of pole positions by the total number of races that year and multiplying by 100.

8.     Point System: I introduced this variable to explain some of the drastic fluctuations in points scoring. Over the past thirty years of Formula 1, there have been a few minor changes to the scoring system and one major change that had a large impact on points scored. This variable corrects for the sudden change in scores between periods.
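As a quick worked example of the two percentage variables, here is a minimal Python sketch. The `percentage` helper and the season figures are hypothetical and used only for illustration; the sketch assumes that "total races" means the number of races held in that season.

```python
def percentage(count, races):
    """Share of a season's races, expressed as a percentage."""
    if races <= 0:
        raise ValueError("a season must have at least one race")
    return count / races * 100


# Hypothetical season: 18 races, 9 wins and 6 pole positions.
win_pct = percentage(9, 18)
pole_pct = percentage(6, 18)

print(f"Win Percentage: {win_pct:.1f}")    # 50.0
print(f"Pole Percentage: {pole_pct:.1f}")  # 33.3
```

Expressing wins and poles as percentages makes seasons of different lengths comparable, which raw counts are not.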

 

After finalizing the variables, I plotted some graphs of them to identify macro trends within the data.





This graph shows the distribution of points scored by all drivers in the data set. There are a few takeaways. Firstly, the distribution is roughly bell-shaped but heavily left-skewed. Secondly, there is a big spike in the points scored by drivers between 2009 and 2010, which is when the new points scoring rules came into effect. Points scored have been on an upward curve ever since the new rules were introduced, under which the value of a win rose from 10 points to 25 points and the value of every other points-paying position was also increased. Finally, the seasons run under the scoring systems that awarded 10 points for a win (Systems 1 and 2) are clustered closer together than the seasons that awarded 25 points for a win (Systems 3 and 4), which suggests a higher level of competition under Systems 1 and 2.

 


The next graph shows the points totals of only the Championship winner in each season. Most of the trends observed in the earlier graph also hold here, but there is one very interesting pattern. Under scoring system 2, which was in effect from 2003 to 2009, the points scored by Championship winners decreased almost every season. Under every other scoring system, the number of points required to win a championship followed a general upward trend. This makes the 2003-2009 seasons stand out, as there is no obvious reason why the points totals kept decreasing.


 

The next graph shows the pole positions and wins secured by the championship winner, season by season. I thought this would be an interesting area to explore, as Formula 1 teams sometimes sacrifice qualifying pace to set their car up for the race. For the most part, the champion’s pole positions and race wins followed roughly the same path, yet in only three seasons in the data did the championship winner have exactly the same number of pole positions and wins.


                                        


This graph is similar to the graph of pole positions and wins above, except that it shows the data in percentage form. One thing I noticed was that there were quite a few seasons in which the championship winner reached 50 percent of neither the pole positions nor the wins. This happened more often in the earlier seasons covered by the data, which is likely because cars were not as reliable then as they are in more recent Formula 1 seasons.
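A check like the one above, for seasons in which the champion stayed below 50 percent on both measures, could be sketched as follows. The season labels and figures are made up for illustration and are not values from the data set; "neither 50 percent" is read here as strictly below half on both wins and poles.

```python
# Hypothetical champion rows: races in the season, wins, and pole positions.
seasons = [
    {"season": "Season A", "races": 16, "wins": 9, "poles": 6},
    {"season": "Season B", "races": 17, "wins": 5, "poles": 4},
    {"season": "Season C", "races": 20, "wins": 7, "poles": 11},
]


def below_half_on_both(row):
    """True if the champion took less than 50% of both wins and poles."""
    # Comparing 2 * count against races avoids floating-point rounding.
    return 2 * row["wins"] < row["races"] and 2 * row["poles"] < row["races"]


flagged = [row["season"] for row in seasons if below_half_on_both(row)]
print(flagged)  # ['Season B']
```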


                                     


The last thing I looked at was the year-over-year championship position of the winner’s teammate and the runner-up’s teammate. For the most part, the winner’s teammate outperformed the runner-up’s teammate. Looking at the highest position available to each, the winner’s teammate finished second in the championship 9 times, whereas the runner-up’s teammate finished third 6 times.

In Part 2 of this project, I will set out 3 research questions and use a range of statistical analyses to answer them.
