This study analyzes waveforms of gendered, emotional human speech from the RAVDESS emotional speech dataset in order to extract acoustic features. After preprocessing with min-max normalization, we perform qualitative and quantitative signal analysis on the samples and extract maximum amplitude, average amplitude, and summed amplitude features. We find that the summed amplitude feature is the most informationally rich and compelling of the extracted features, and we use it in the classification phase. We apply two clustering algorithms to classify the speech samples: k-means and agglomerative clustering. The resulting clusters show similarities between some gendered emotion samples, but they do not align with either gender or emotion type. We then build model prototypes by including more samples and by further qualitative analysis of a variant of the summed amplitude feature. These prototypes reveal similar volume shifts within each emotion variant, as well as a marked increase in volume specifically for happy male speech and sad female speech. Finally, we propose a framework for an automated speech classification algorithm.
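As a rough illustration of the preprocessing and feature-extraction steps named above, the sketch below rescales a waveform with min-max normalization and then computes the three amplitude features. It is not the thesis's code. It assumes that normalization maps each waveform linearly onto [-1, 1] and that the amplitude features are taken over absolute sample values; the function names `min_max_normalize` and `amplitude_features` are hypothetical, and the thesis may define these quantities differently.

```python
from typing import Dict, List, Sequence


def min_max_normalize(signal: Sequence[float], lo: float = -1.0, hi: float = 1.0) -> List[float]:
    """Linearly rescale samples so the minimum maps to `lo` and the maximum to `hi`.

    Assumption: the target range [-1, 1] is chosen for illustration only.
    """
    s_min, s_max = min(signal), max(signal)
    if s_max == s_min:
        # A constant signal has no range to stretch; map every sample to the midpoint.
        mid = (lo + hi) / 2
        return [mid for _ in signal]
    scale = (hi - lo) / (s_max - s_min)
    return [lo + (x - s_min) * scale for x in signal]


def amplitude_features(signal: Sequence[float]) -> Dict[str, float]:
    """Maximum, average, and summed amplitude, computed on absolute sample values (assumption)."""
    magnitudes = [abs(x) for x in signal]
    total = sum(magnitudes)
    return {
        "max_amplitude": max(magnitudes),
        "average_amplitude": total / len(magnitudes),
        "summed_amplitude": total,
    }


if __name__ == "__main__":
    raw = [0.0, 2.0, 4.0, -4.0, -2.0]  # toy samples, not RAVDESS data
    normalized = min_max_normalize(raw)
    print(normalized)  # [0.0, 0.5, 1.0, -1.0, -0.5]
    print(amplitude_features(normalized))
```

Unlike the average, the summed feature grows with clip length, so it captures both loudness and duration. In a pipeline like this, the per-sample feature values would then be passed to a clustering step such as k-means or agglomerative clustering.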
Pfeffer, David McCulloch, "Sending Mixed Signals: Feature Extraction for Gendered Emotional Speech Classification and Modeling" (2020). Senior Independent Study Theses. Paper 9073.
signal analysis, machine learning, classification, clustering, k-means, emotional speech, gendered speech, feature extraction, normalization
Bachelor of Arts
Senior Independent Study Thesis
© Copyright 2020 David McCulloch Pfeffer