Download this episode
Despite the hype around "big data", a more immediate problem facing many scientific analyses is that large-scale databases must be assembled from a collection of small independent and heterogeneous fragments -- the outputs of many and isolated scientific studies conducted around the globe. Together with 92 other co-authors, we recently published the Biomass And Allometry Database (BAAD) as a data paper in the journal Ecology, combining data from 176 different scientific studies into a single unified database. BAAD is unique in that the workflow -- from raw fragments to homogenised database -- is entirely open and reproducible. In this talk I introduce BAAD and illustrate solutions (using R) for some of the challenges of working with and distributing lots and lots of #otherpeople's data.
Available formats for this video:
Actual format may change based on video formats available and browser capability.
Comments have been closed since this content was published more than 30 days ago, but if you'd like to send us feedback you can Contact Us.