Evaluation of Statistical Methods for Classification of Laser-Induced Breakdown Spectroscopy (LIBS) Data






When NASA’s Curiosity rover lands in August 2012, it will use a laser-induced breakdown spectroscopy (LIBS) instrument to collect data in an effort to understand the chemical composition and geological classification of the rocks on Mars. This is part of a larger endeavor to determine information about the planet’s habitability. LIBS is a method used to determine the elemental composition of a given sample. For each rock sample analyzed by the instrument, a LIBS spectrum consisting of over 6,000 different channels is obtained. In order to prepare for the return of LIBS data from the rover, this project aims to evaluate the accuracy of statistical methods, such as discriminant analysis, support vector machines, and clustering algorithms, for categorizing the rock samples into groups with similar chemical compositions based on their LIBS spectra alone. Accurate classification is critical for rapid identification of similar unknown samples, for novelty detection, and for the selection of a training set of data for use in the estimation of chemical compositions. Similar studies have been performed; however, they generally do not follow statistical best practices and therefore report overly optimistic results. The data used in this project is from the “century set”, a suite of 100 igneous rock samples. These 100 samples are the only ones currently available for this project that have both LIBS spectra and known chemical compositions. Having the known chemical compositions allowed the century set samples to be divided into groups with geological similarities based on their Total Alkali-Silica (TAS) classes, and provided a way to evaluate the predictive accuracy of the classification algorithms using K-fold cross-validation. The results show that the small sample size and the uneven distribution of samples across TAS classes make classification into many groups difficult, contradicting many of the outcomes reported in the literature.
However, some of the methods explored in this thesis do show promise based on their performance in simpler classification tasks, so the results should be reevaluated once more data is obtained. LIBS data is scarce, so this thesis also briefly explores the results from one method of simulating a LIBS spectrum based on the sample’s chemical composition. Simulated data could be used to examine the effects of sample size on the accuracies of the various classification algorithms.
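
The evaluation idea described above, K-fold cross-validation on a small sample with unevenly sized classes, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the data are synthetic (100 samples in four classes of sizes 50, 30, 15, and 5, with 50 channels instead of more than 6,000), the classifier is a simple nearest-centroid rule standing in for the discriminant analysis and support vector machine methods named above, and the function names (`stratified_folds`, `fit_centroids`, `predict`, `cross_val_accuracy`, `make_synthetic`) are hypothetical. It is not the procedure used in the thesis.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k, seed=0):
    """Assign sample indices to k folds so every class is spread across folds."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    folds = [[] for _ in range(k)]
    offset = 0  # keeps overall fold sizes balanced across classes
    for label in sorted(by_class):
        idx = by_class[label]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[(offset + j) % k].append(i)
        offset += len(idx)
    return folds


def fit_centroids(X, y):
    """Compute the mean spectrum of each class from the training data only."""
    sums, counts = {}, defaultdict(int)
    for row, label in zip(X, y):
        if label not in sums:
            sums[label] = [0.0] * len(row)
        acc = sums[label]
        for c, value in enumerate(row):
            acc[c] += value
        counts[label] += 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(centroids, row):
    """Return the class whose centroid is closest in squared Euclidean distance."""
    def sqdist(center):
        return sum((a - b) ** 2 for a, b in zip(row, center))
    return min(centroids, key=lambda label: sqdist(centroids[label]))


def cross_val_accuracy(X, y, k=5, seed=0):
    """Fraction of samples classified correctly when each is held out once."""
    correct = 0
    for test_idx in stratified_folds(y, k, seed):
        held_out = set(test_idx)
        train_X = [X[i] for i in range(len(X)) if i not in held_out]
        train_y = [y[i] for i in range(len(y)) if i not in held_out]
        centroids = fit_centroids(train_X, train_y)  # fitted without test samples
        correct += sum(predict(centroids, X[i]) == y[i] for i in test_idx)
    return correct / len(X)


def make_synthetic(sizes, n_channels=50, noise=1.0, seed=0):
    """Synthetic 'spectra': one random mean vector per class plus Gaussian noise."""
    rng = random.Random(seed)
    X, y = [], []
    for label, n in enumerate(sizes):
        mean = [rng.gauss(0, 1) for _ in range(n_channels)]
        for _ in range(n):
            X.append([m + rng.gauss(0, noise) for m in mean])
            y.append(label)
    return X, y


# Uneven class sizes, loosely mimicking a 100-sample set with rare classes.
X, y = make_synthetic([50, 30, 15, 5])
folds = stratified_folds(y, k=5)
accuracy = cross_val_accuracy(X, y, k=5)
print(f"5-fold cross-validated accuracy: {accuracy:.2f}")
```

Stratifying the folds matters most for the rarest class: with only five samples, an unstratified split could leave a fold with no examples of that class to test on, or a training set with none to learn from. Fitting the centroids inside the loop, on the training folds only, is what keeps the accuracy estimate from being optimistically biased.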



statistical classification, clustering, spectroscopy, LIBS, machine learning, discriminant analysis, support vector machine, statistics