Reusable R for automation, small area estimation and legacy systems
Running a complex model once is easy: pull up your statistical program of choice, plug in the data and the model, and off you go. The problem comes when you need to run that model on different data hundreds or thousands of times. To scale, and to save analysts from spending all their time running models over and over again, you need automation. You need a well-designed and tested environment. You need well-engineered R. You also need to sell it to analysts. We wanted to use the tools of software engineering and reusable research to make statisticians and epidemiologists more efficient, but statisticians and epidemiologists are not computer scientists, and much of this world is new to them. So we had to develop not only for good software practice but also so that others could use our tools, even when those tools come with a very different focus from what they are used to.
Taking batch small area estimation with generalized additive models as our example, we will talk about the project, the tools we used, and how to integrate R into a legacy SAS environment with a minimum of pain, so that new users gain the strengths of R without being exposed to its complexity.