Abstract

Flight delays are a common problem in the aviation industry and air travel, with negative impacts on both aviation businesses and passengers. Flight delay analysis and prediction can therefore support data-driven decision-making for aviation-related businesses and passengers. In this study, a dataset of 119,631 flights operated by major airlines in the U.S. in 2023 is used to determine the main factors that influence flight departure delays, based on interactive visualizations in Tableau. A smaller subset of the data, consisting of flights that departed from Texas during the summer, is used to train a random forest machine learning algorithm, producing a classification tree-based model in RStudio. The results show that destination state, taxi-out time, airtime, day of the week, scheduled departure time, and distance between origin and destination airports are important factors in determining whether a flight's departure is delayed. The random forest model, in which the predicted probabilities are weighted, is tested on the out-of-bag sample and a separate test set. The findings demonstrate that this approach achieves an overall prediction accuracy of 65%.

Advisor

Kelvey, Robert

Department

Statistical and Data Sciences

Disciplines

Data Science | Probability | Statistical Models | Statistical Theory | Theory and Algorithms | Transportation and Mobility Management

Keywords

Flight delays, Aviation industry, Random forest, Data-driven decision-making, Interactive dashboards

Publication Date

2024

Degree Granted

Bachelor of Arts

Document Type

Senior Independent Study Thesis


© Copyright 2024 Dharmarak Dawrat