Abstract

This project adapts existing machine translation models to translate code-mixed English-Spanish text (Spanglish) into monolingual Spanish. First, it summarizes the cultural and linguistic factors that make Spanglish a form of oral and written communication distinct from its English and Spanish constituents, with particular reference to the "radically bilingual" adjectival structures in Susana Chávez-Silverman's memoir Killer Crónicas. Second, it provides background on the machine learning concepts needed to analyze the Helsinki and mBART50 machine translation models, highlighting the architectural and training paradigms that enable them to perform machine translation. Third, it outlines the strategies and GPT concepts used to generate a synthetic parallel corpus. This corpus is then used to fine-tune Helsinki and mBART50 to translate Spanglish adjectival phrases from the selected literature. Finally, it analyzes the results of the fine-tuned models and contextualizes their performance from both a linguistic and a machine intelligence standpoint.

Advisor

Nord, Alex

Second Advisor

Balam, Osmer

Department

Computer Science; Spanish

Disciplines

Artificial Intelligence and Robotics | Computer Sciences | Spanish and Portuguese Language and Literature | Spanish Linguistics

Keywords

Machine translation, Spanglish, machine learning, Helsinki, mBART50, synthetic parallel corpus, fine-tuning, adjectival phrases, linguistics, code-mixing

Publication Date

2025

Degree Granted

Bachelor of Arts

Document Type

Senior Independent Study Thesis

© Copyright 2025 Walker A. Johnson