A Case Study in Reproducible Model Building: Simulating Groundwater Flow in the Wood River Valley Aquifer System, Idaho
The goal of reproducible model building is to tie processing instructions to data analysis so that the model can be recreated, better understood, and easily modified to incorporate new field measurements and (or) explore alternative system and boundary conceptualizations. Reproducibility requires archiving and documenting all raw data and source code used to pre- and post-process the model, an undertaking made easier by advances in open-source software, open file formats, and cloud computing. A highly reproducible model of groundwater flow in the Wood River Valley (WRV) aquifer system was built using a software development methodology. The collection of raw data, source code, and processing instructions used to build and analyze the model was placed in an R package. An R package enforces a set of formal format standards, which allows for easy, transparent, and cross-platform distribution of its content. R facilitates reproducible research largely through the package vignette, a document that combines narrative text with data analysis source code. The code is run when the vignette is built, and all data analysis output (such as figures and tables) is generated at build time and inserted into the final document. The R package created for the WRV groundwater-flow model includes multiple vignettes that explain and run all processing steps except the parameter estimation process, which was not made programmatically reproducible. MODFLOW-USG, the numerical groundwater model used in this case study, is executed from a vignette, and model output is returned for exploratory analyses.