Defense of a Master's Thesis: Eric Auel
genben: A Framework for Benchmarking Genomic Data Analysis Methods on Scalable Systems
With an ever-increasing number of human DNA sequencing efforts being conducted, the amount of genetic variation data available for research has grown substantially over the past few decades. This data gives scientists the ability to study various traits of humans and other species. Several data analysis methods can be applied to genetic variation data, such as allele counting and principal component analysis (PCA). Software libraries such as scikit-allel make these data sets easy to explore, since they provide many functions that operate directly on genetic variation data. However, trade-offs often exist when working with unique data sets and when performing analysis in various hardware environments. Additionally, many parameters can be tuned when storing genetic variation data, such as compression ratios, compression algorithms, and block sizes. Being able to quantify the performance impact of tuning these parameters can be extremely useful for software developers, data scientists, and researchers. The algorithms applied to this data may also be improved in the future, so comparing system resource usage before and after such modifications could help quantify the overall improvement that a new algorithm delivers. This thesis presents genben, a flexible framework for benchmarking the operations involved in analyzing genetic variation data. It also provides several benchmark experiments that demonstrate the ability to test different algorithm implementations, different configuration parameters, and different hardware configurations on high-performance computing systems.
Thursday, July 25 at 3:00pm
Min H. Kao Electrical Engineering and Computer Science, 434
1520 Middle Drive, Knoxville, TN 37996