One of the most hotly debated topics in Formula 1 is who deserves more of the credit for success: the car or the driver. Opinions span the entire spectrum, from those who believe that the best car would win even with the worst driver to those who think that the best driver can single-handedly drag a middling car to greatness.
As a Formula 1 fan, I have always been intrigued by this question. The goal of this analysis is to take a quantitative look at this age-old question and to lay a foundation for further empirical research into the topic. It is adapted from a project that I worked on for a college Statistics for Data Science class.
In this project, I looked at the last 30 years of Formula 1 data, spanning 1994 to 2023. I considered various statistics that I thought would be insightful, and I eliminated metrics that were redundant or offered little room for exploration. Once I had settled on the metrics that I thought would be most useful, I narrowed down the scope of the data. For each season from 1994 to 2023, I collected data for the winner of the Drivers’ Championship, the runner-up, and the best teammate of each. In seasons where the runner-up was the champion’s teammate, I used the third-place finisher and their teammate instead. Because Formula 1 drives are often short-lived, drivers lower down the order are frequently replaced mid-season, which could skew the data. Hence, I limited myself to the top finishers in the championship, whose teams and seats tend to be more stable throughout the season.
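As a rough illustration, the selection rule described above could be sketched in Python as follows. The `Standing` record, the `select_drivers` function, and the toy standings are all hypothetical and are not the dataset or code used in the project. The sketch assumes that "best teammate" means the highest-placed other driver from the same team.

```python
from typing import NamedTuple, Optional


class Standing(NamedTuple):
    position: int  # final championship position (1 = champion)
    driver: str
    team: str


def select_drivers(standings: list[Standing]) -> dict[str, Optional[Standing]]:
    """Pick the champion, a rival from another team, and each one's best teammate."""
    ordered = sorted(standings, key=lambda s: s.position)
    champion, runner_up = ordered[0], ordered[1]

    # If the runner-up drove for the champion's team, fall back to third place.
    rival = ordered[2] if runner_up.team == champion.team else runner_up

    def best_teammate(lead: Standing) -> Optional[Standing]:
        # Highest-placed other driver from the same team, if there is one.
        mates = [s for s in ordered if s.team == lead.team and s.driver != lead.driver]
        return mates[0] if mates else None

    return {
        "champion": champion,
        "champion_teammate": best_teammate(champion),
        "rival": rival,
        "rival_teammate": best_teammate(rival),
    }


# Made-up season in which the runner-up is the champion's teammate.
toy_season = [
    Standing(1, "Driver A", "Team X"),
    Standing(2, "Driver B", "Team X"),
    Standing(3, "Driver C", "Team Y"),
    Standing(4, "Driver E", "Team Z"),
    Standing(5, "Driver D", "Team Y"),
]
picked = select_drivers(toy_season)
for role, row in picked.items():
    print(role, row.driver if row else None)
```

In this toy season the rival is the third-place driver, because second place went to the champion's own teammate.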
Preliminary Variable Analysis
Before starting an in-depth analysis, I felt it was important to look at the individual variables first and try to identify trends in them, so that I could more easily connect the results of the analysis with the domain-specific knowledge that I already had. The variables that I included in my data were:
1. Position: The driver’s final place in the Formula 1 Drivers’ Championship for the season in question. It is a rank, so lower numbers are better, and 1 indicates that the driver won the Drivers’ Championship that season.
2. Team: A variable that I created to make the data easier to sort and analyze. Given that there were always exactly two teams per season in my data, I assigned 1 to the team of that season’s champion and 2 to the runner-up’s team. The second driver of each team was assigned 1 or 2 according to the team they raced for that season.
3. Points: The number of championship points a driver scored over the course of the season, awarded for finishing in the points-paying positions. Points decide the Drivers’ Championship, so I made them a key part of my analysis. Because Formula 1 has used several scoring systems over the years, I also had to introduce another variable to account for the changes in system.
4. Wins: The number of races each driver won in that specific season.
5. Win Percentage: The percentage of that season’s races that the driver won. It is calculated by dividing the number of races won by the total number of races held that year and multiplying by 100.
6. Poles: The number of pole positions each driver took in that specific season. A pole position means that the driver starts at the front of the grid for the race.
7. Pole Percentage: The percentage of that season’s races for which the driver took pole position. It is calculated by dividing the number of pole positions by the total number of races held that year and multiplying by 100.
8. Point System: A variable that I introduced to explain some of the drastic fluctuations in points scoring. Over the past thirty years of Formula 1, there have been a few minor changes to the scoring system and one major change that had a large impact on points scored. This variable corrects for the sudden shifts in scores between periods.
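The two percentage variables in the list above share the same formula, so a minimal Python sketch could use a single helper. The function name `percentage_of_races` and the example numbers are made up for illustration and are not taken from the dataset.

```python
def percentage_of_races(count: int, races: int) -> float:
    """Return `count` as a percentage of the races held in a season."""
    if races <= 0:
        raise ValueError("a season must contain at least one race")
    return 100.0 * count / races


# Hypothetical season: 9 wins and 6 poles, with the race totals chosen
# only to keep the arithmetic easy to check by hand.
win_percentage = percentage_of_races(9, 18)   # 9 / 18 * 100
pole_percentage = percentage_of_races(6, 16)  # 6 / 16 * 100
print(win_percentage, pole_percentage)
```

Dividing by the number of races makes seasons of different lengths comparable, which matters here because the calendar length varies over the 1994 to 2023 period.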
After finalizing the variables, I plotted several graphs of them to identify macro trends within the data.
The next graph that I looked at showed the points totals of only the Championship winner in each season. While most of the trends observed in the earlier graph also held here, this graph contains one very interesting pattern. Under scoring system 2, which was in effect from 2003 to 2009, the points scored by Championship winners decreased almost every season, in sharp contrast to the rest of the data. Under every other scoring system, the number of points required to win a championship followed a general upward trend. This makes the 2003-2009 seasons stand out even more, as there is no obvious reason why the points totals kept decreasing.
The graph that I looked at next showed the pole positions and wins secured by the championship winner, season by season. I thought that this would be an interesting area to explore, as Formula 1 teams sometimes sacrifice qualifying pace to set up their car for the race. This graph also showed something very interesting. Although the champion’s pole positions and race wins followed roughly the same path for the most part, there were only three seasons in the data where the championship winner had exactly the same number of pole positions and wins.
This graph is similar to the graph of pole positions and wins above, except that it shows the pole position and win data in percentage form. One thing that I noticed from this graph was that in quite a few seasons the championship winner took fewer than 50 percent of both the pole positions and the race wins. This happened more often in the earlier seasons covered by the data, which is likely because cars were not as reliable then as they are in more recent Formula 1 seasons.
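The two observations from the pole and win graphs (seasons with equal poles and wins, and seasons where the champion stayed under 50 percent on both measures) could be checked with a short Python sketch like the one below. The rows are made-up placeholder seasons, not real results, and the field and function names are hypothetical.

```python
# Placeholder champion rows: season label, races held, poles, wins.
# These numbers are invented purely to exercise the two checks.
champions = [
    {"season": "S1", "races": 16, "poles": 6, "wins": 6},
    {"season": "S2", "races": 20, "poles": 12, "wins": 10},
    {"season": "S3", "races": 18, "poles": 4, "wins": 7},
]


def equal_poles_and_wins(rows):
    """Seasons where the champion had exactly as many poles as wins."""
    return [r["season"] for r in rows if r["poles"] == r["wins"]]


def below_half_on_both(rows):
    """Seasons where the champion took under 50% of both poles and wins."""
    result = []
    for r in rows:
        pole_pct = 100.0 * r["poles"] / r["races"]
        win_pct = 100.0 * r["wins"] / r["races"]
        if pole_pct < 50.0 and win_pct < 50.0:
            result.append(r["season"])
    return result


equal_seasons = equal_poles_and_wins(champions)
low_share_seasons = below_half_on_both(champions)
print(equal_seasons, low_share_seasons)
```

With real data, the first list would be expected to contain the three seasons mentioned above, assuming the rows are entered the same way as in the original dataset.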