Wrapping Your R Tools to Analyze National-Scale Cancer Genomics Data in the Cloud
The Cancer Genomics Cloud (CGC), built by Seven Bridges and funded by the National Cancer Institute, hosts The Cancer Genome Atlas (TCGA), one of the world's largest collections of cancer genomics data. The CGC provides computational resources and optimized, portable bioinformatics tools so that researchers can analyze these data at any scale, immediately, collaboratively, and reproducibly. The Seven Bridges platform is available on both Amazon Web Services (AWS) and Google Cloud. With Docker and the Common Workflow Language (CWL) open standard, it has never been easier to wrap a tool written in any programming language for the cloud and run it on petabytes of data.

The open-source R/Bioconductor package 'sevenbridges' provides full API support for Seven Bridges platforms, including the CGC, with flexible operations on projects, tasks, files, billing, apps, and more. Users can build fully automated workflows within R that carry out an end-to-end analysis in the cloud, from raw data to report. Most importantly, 'sevenbridges' also provides an interface for describing your tools in R and exporting them to portable CWL in JSON or YAML format. These descriptions are easy to share with collaborators and can be executed in different environments, locally or in the cloud, with full reproducibility. Combined with the package's API client functionality, this lets users create a CWL tool in R and run it on the Cancer Genomics Cloud to analyze large volumes of cancer data at scale.
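To make the idea of a portable CWL tool description concrete, here is a minimal sketch, written in Python rather than R, of the kind of JSON document such an export contains. It assumes CWL v1.0 syntax. The tool id, Docker image tag, script path, and input/output names are all hypothetical, and this is not the 'sevenbridges' API itself: the package builds comparable descriptions from R objects.

```python
import json


def make_r_tool(tool_id, docker_image, script_path):
    """Return a minimal CWL v1.0 CommandLineTool (as a dict) that runs an
    R script with Rscript inside a Docker container.

    Hypothetical example: assumes the script already exists at
    `script_path` inside `docker_image`."""
    return {
        "cwlVersion": "v1.0",
        "class": "CommandLineTool",
        "id": tool_id,
        # The Docker image pins R and any packages the script needs,
        # which is what makes the tool portable across environments.
        "requirements": [
            {"class": "DockerRequirement", "dockerPull": docker_image}
        ],
        # Command executed in the container: Rscript <script> <inputs...>
        "baseCommand": ["Rscript", script_path],
        "inputs": {
            # Hypothetical input: a counts file passed as the first argument
            "counts": {
                "type": "File",
                "inputBinding": {"position": 1},
            }
        },
        "outputs": {
            # Hypothetical output: any HTML report the script writes
            "report": {
                "type": "File",
                "outputBinding": {"glob": "*.html"},
            }
        },
    }


# Usage: serialize the description to JSON for sharing or execution.
tool = make_r_tool("normalize-counts", "rocker/r-ver:4.3.1", "/opt/normalize.R")
print(json.dumps(tool, indent=2))
```

The same structure can be written as YAML. Because the Docker image and command line are part of the description, a collaborator or a cloud platform can run the tool without reconstructing the original R environment.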