Using Data Provenance to Support Reproducibility in R

dc.contributor: Markley, Michelle
dc.contributor: McCauley, James
dc.contributor.advisor: Lerner, Barbara
dc.contributor.author: Fabrega, Sean
dc.date.accessioned: 2023-06-09T13:28:03Z
dc.date.available: 2023-06-09T13:28:03Z
dc.date.gradyear: 2023 [en_US]
dc.date.issued: 2023-06-09
dc.description.abstract: The use of computers for data processing and analysis has dramatically transformed the approaches and capabilities of scientific research. Today, researchers are able to process and draw conclusions from large volumes of data in relatively little time, expanding the breadth and efficiency of their work. Despite this shift, verifying results through multiple studies and experiments will always remain important. A 2019 National Academies report recommended more research and development to ensure published scientific results are computationally reproducible, meaning the same results can be derived from the original data and analysis methods. Often, computational reproducibility requires information about the computing environment – such as the operating system, language, and package versions where the results were produced – as well as the data and script. This is because software can behave differently when components of the computing environment change. Therefore, an approach to reproducible research involves collecting all of the information about the scripts, data, and computing environment, also known as data provenance. In the R language, the rdtLite package facilitates the collection of data provenance for a given script execution. This thesis will focus on developing methods that use data provenance as a blueprint for reconstructing a computing environment and conducting experiments that apply this tool to identify situations in which changes to the environment resulted in changes in script behavior. [en_US]
dc.description.sponsorship: Computer Science [en_US]
dc.identifier.uri: http://hdl.handle.net/10166/6427
dc.language.iso: en_US [en_US]
dc.rights.restricted: public [en_US]
dc.subject: data provenance [en_US]
dc.subject: data science [en_US]
dc.subject: R programming language [en_US]
dc.subject: reproducibility [en_US]
dc.title: Using Data Provenance to Support Reproducibility in R [en_US]
dc.type: Thesis
mhc.degree: Undergraduate [en_US]
mhc.institution: Mount Holyoke College

Files

Original bundle
Name: Thesis_Draft_Archive_Version.pdf
Size: 4.4 MB
Format: Adobe Portable Document Format
Description: Main article