Abstract

Sabermetrics is the application of statistical analysis to baseball records, especially to evaluate and compare the performance of individual players. In recent years, sabermetrics has grown in popularity because it combines advanced equations with simple player comparisons. Our study uses only sabermetrics to predict runs scored and runs allowed: batting sabermetrics predict runs scored, and pitching sabermetrics predict runs allowed. The historical relationships between wins and runs scored and between losses and runs allowed are then used to turn each pair of run predictions into a win/loss record. Three statistical methods were used for model building: Multiple Linear Regression, Random Forests, and K-Means Clustering. Using the year-over-year correlation of each sabermetric and the statistically significant variables from the models, we determined the best free agent signings for the Atlanta Braves while keeping payroll and commitment in mind. Every model was run in R, with two final models for each method (one for runs scored, one for runs allowed). Each model gave us a runs scored and a runs allowed prediction for each season. The predictions were multiplied by a historical correlation coefficient for wins and losses, found by dividing wins by runs scored and losses by runs allowed using data from 1960-2021. Multiplying our runs scored and runs allowed predictions by these coefficients gave us the predicted win and loss totals for that season. Some predicted totals did not add up to the number of regular-season games played, so they were scaled proportionally to match. The linear regression models were the most accurate season by season, measured by the average number of games by which the predicted record missed the actual record, but over the ten-year span the random forest model came closer to the actual total record.
With the models completed and the season correlations found, we began the roster-building section. Linear regression was run to predict each player's value in terms of average annual value (AAV). With a monetary value assigned to each player, we could build a roster that fit positional needs and payroll constraints. After many comparison plots, we determined the seven free agents that the Braves should sign and compared them to the team's actual additions. The Braves signed two of our predicted signings, along with two other players that our comparisons had placed in the top group. The total salary of our additions was $2 million above the actual money spent, which suggests that our budgeting process was accurate.
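
The runs-to-record conversion described in the abstract can be sketched in a few lines. The thesis itself was carried out in R; the Python below is only an illustrative sketch. The function names, the 162-game default, and every number in the usage example are assumptions made for illustration and are not taken from the study. It also assumes the historical coefficients are computed from totals pooled across seasons, which the abstract does not specify.

```python
def historical_coefficient(outcomes, runs):
    """Ratio of total outcomes (wins or losses) to total runs.

    Assumption: totals are pooled across all seasons before dividing.
    """
    return sum(outcomes) / sum(runs)


def predict_record(pred_runs_scored, pred_runs_allowed,
                   wins_per_run, losses_per_run, games=162):
    """Convert run predictions into a win/loss record.

    Wins come from predicted runs scored, losses from predicted runs
    allowed. If the two do not add up to the number of games played,
    both are scaled by the same factor so that they do.
    """
    raw_wins = pred_runs_scored * wins_per_run
    raw_losses = pred_runs_allowed * losses_per_run
    scale = games / (raw_wins + raw_losses)
    return raw_wins * scale, raw_losses * scale


# Hypothetical inputs, for illustration only.
hist_wins = [90, 81, 72]
hist_runs_scored = [800, 740, 660]
hist_losses = [72, 81, 90]
hist_runs_allowed = [650, 730, 810]

wins_per_run = historical_coefficient(hist_wins, hist_runs_scored)
losses_per_run = historical_coefficient(hist_losses, hist_runs_allowed)

wins, losses = predict_record(780, 690, wins_per_run, losses_per_run)
print(f"Predicted record: {wins:.1f}-{losses:.1f}")
```

Scaling wins and losses by the same factor keeps their ratio unchanged, so the adjustment changes only the total number of games and not the predicted balance between wins and losses.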

Advisor

Pasteur, Drew

Department

Statistical and Data Sciences

Disciplines

Data Science | Statistical Models

Publication Date

2022

Degree Granted

Bachelor of Arts

Document Type

Senior Independent Study Thesis

© Copyright 2022 Colin Springer