Nuclear magnetic resonance (NMR) is used in organic chemistry to identify unknown organic compounds. The data obtained from an NMR spectrometer are typically displayed as a spectrum, which is then analyzed by an analytical chemist. Analyzing a spectrum, especially that of a large and complex molecule, is a long and tedious process. In this project, Python is used to implement hierarchical clustering on NMR data obtained from a spectrometer at the College of Wooster to explore its application in NMR analysis. MATLAB is used to build a decision tree from the same data, and its accuracy is compared to that of the hierarchical clustering. The decision tree is also examined for insight into how to better automate the analysis process. Once feature extraction has been performed, these clustering and classification methods are used to identify the major functional groups within a compound from its spectral data. Once the functional groups are identified, the compounds are either clustered via hierarchical clustering or classified with a decision tree. This process provides insight into how to identify unknown organic molecules faster and more accurately, a much-needed improvement in experimental organic chemistry research. Decision trees were found to be a much more accurate machine learning method for classifying organic compounds based on the functional groups present.
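
As a rough illustration of the clustering step described in the abstract, the sketch below groups compounds by which functional groups were detected. It is not the thesis code: the compound labels, the four functional-group columns, the Hamming distance, and the average-linkage rule are all assumptions chosen for illustration, and the feature values are made up rather than taken from the Wooster data. In practice a library routine such as SciPy's `scipy.cluster.hierarchy.linkage` would likely replace the hand-written loop.

```python
from itertools import combinations

# Hypothetical binary features: 1 means the functional group was detected.
# Columns (illustrative only): [hydroxyl, carbonyl, aromatic ring, alkene]
compounds = {
    "A": (1, 0, 0, 0),
    "B": (1, 0, 0, 1),
    "C": (0, 1, 1, 0),
    "D": (0, 1, 1, 1),
    "E": (0, 0, 1, 0),
}


def hamming(u, v):
    """Count the positions at which two feature vectors differ."""
    return sum(a != b for a, b in zip(u, v))


def average_linkage(c1, c2, data):
    """Mean pairwise Hamming distance between the members of two clusters."""
    total = sum(hamming(data[a], data[b]) for a in c1 for b in c2)
    return total / (len(c1) * len(c2))


def agglomerate(data, n_clusters):
    """Merge the two closest clusters until only n_clusters remain."""
    clusters = [frozenset([name]) for name in sorted(data)]
    while len(clusters) > n_clusters:
        # Find the pair of clusters with the smallest average distance.
        c1, c2 = min(
            combinations(clusters, 2),
            key=lambda pair: average_linkage(pair[0], pair[1], data),
        )
        clusters = [c for c in clusters if c not in (c1, c2)]
        clusters.append(c1 | c2)
    return sorted(sorted(c) for c in clusters)


result = agglomerate(compounds, 2)
print(result)
```

With these made-up features, the hydroxyl-bearing compounds end up in one cluster and the aromatic compounds in the other. A decision tree, by contrast, is trained on labeled examples, which is one reason the two methods can differ in accuracy on the same data.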


Visa, Sofia


Computer Science


Analytical Chemistry | Artificial Intelligence and Robotics | Computer Sciences | Organic Chemistry


Machine learning, organic chemistry, NMR, nuclear magnetic resonance, hierarchical clustering, decision trees

Publication Date


Degree Granted

Bachelor of Arts

Document Type

Senior Independent Study Thesis Exemplar



© Copyright 2021 Nicole Maia Powell