Abstract
Sabermetrics is the application of statistical analysis to baseball records, especially to evaluate and compare the performance of individual players. In recent years, sabermetrics has grown in popularity because of its advanced equations and simple player comparisons. Our study uses only sabermetrics to predict runs scored and runs allowed: batting sabermetrics predict runs scored, and pitching sabermetrics predict runs allowed. Three statistical methods were used for model building: Multiple Linear Regression, Random Forests, and K-Means Clustering. Each method was run in R and produced two final models, one for runs scored and one for runs allowed, which gave predictions of runs scored and runs allowed for each season. To turn these predictions into a win/loss record, we multiplied them by historical correlation coefficients for wins and losses, found by dividing wins by runs scored and losses by runs allowed using data from 1960-2021. This gave the predicted win and loss totals for each season. Some predicted records did not sum to the number of regular-season games played, so they were scaled proportionally to match. The linear regression models were the most accurate in terms of average games off the actual record in each season, but over the ten-year span the random forest model was more accurate in the total record. Using the year-over-year correlation of each sabermetric and the statistically significant variables from the models, we determined the best free agent signings for the Atlanta Braves while keeping payroll and commitment in mind.
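The conversion from predicted runs to a win/loss record described above can be sketched as follows. This is a minimal Python illustration, not the thesis's R code. The function names, the 162-game default, and all numbers are hypothetical, and pooling the historical seasons into a single ratio is an assumption; the study may have computed its coefficients differently.

```python
def historical_ratio(outcomes, runs):
    """Pooled ratio of outcomes (wins or losses) to runs over past seasons.

    Assumption: seasons are pooled into one ratio; averaging per-season
    ratios would be another reasonable reading of the abstract.
    """
    return sum(outcomes) / sum(runs)


def runs_to_record(pred_runs_scored, pred_runs_allowed,
                   wins_per_run, losses_per_run, games=162):
    """Convert predicted runs into a win/loss record that sums to `games`."""
    raw_wins = pred_runs_scored * wins_per_run
    raw_losses = pred_runs_allowed * losses_per_run
    # Proportional scaling so that wins + losses equals the games played.
    scale = games / (raw_wins + raw_losses)
    return raw_wins * scale, raw_losses * scale


# Hypothetical historical data (not taken from the study).
wins_per_run = historical_ratio([90, 72], [800, 640])
losses_per_run = historical_ratio([72, 90], [640, 800])

# Hypothetical model predictions for one team-season.
wins, losses = runs_to_record(800, 700, wins_per_run, losses_per_run)
print(f"Predicted record: {wins:.1f}-{losses:.1f}")
```

Scaling both totals by the same factor preserves the predicted ratio of wins to losses while forcing the record to match the length of the season.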
With the models completed and the year-over-year correlations found, we began the roster-building section. Linear regression was used to predict each player's value in terms of average annual value (AAV). With a monetary value assigned to each player, we could build a roster that fit positional needs and payroll constraints. After many comparison plots, we selected seven free agents that the Braves should sign and compared them to the team's actual additions. The Braves signed two of our predicted signings, as well as two other players that our comparisons had placed in the top group. The total salary of our additions was $2 million above the actual money spent, indicating that our budgeting process was accurate.
Advisor
Pasteur, Drew
Department
Statistical and Data Sciences
Recommended Citation
Springer, Colin, "Major League Baseball Roster Building: An Empirical Analysis Of Pre-Season Win Predictions" (2022). Senior Independent Study Theses. Paper 9756.
https://openworks.wooster.edu/independentstudy/9756
Disciplines
Data Science | Statistical Models
Publication Date
2022
Degree Granted
Bachelor of Arts
Document Type
Senior Independent Study Thesis
© Copyright 2022 Colin Springer