Size of Datasets for Analytics and Implications for R


Description

With so much hype about "big data" and the industry pushing distributed computing over traditional single-machine tools, one wonders about the future of R. In this talk I will argue that most data analysts and data scientists do not actually work with big data the majority of the time, so using immature "big data" tools is in fact counterproductive. I will show that, contrary to widespread belief, the growth in the size of datasets used for analytics has actually been outpaced over the last 10 years by the growth in memory (RAM), making single-machine tools ever more attractive. Furthermore, base R and several widely used R packages have undergone significant performance improvements (I will present benchmarks to quantify this), making R the ideal tool for data analysis even on relatively large datasets. In particular, R has access (via CRAN packages) to excellent high-performance machine learning libraries (benchmarks will be presented), and high-performance and parallel computing facilities have been part of the R ecosystem for many years. Nevertheless, the R community should of course continue pushing the boundaries and extending R with new and ever more performant features.

Day: 1


The Discussion

  • Szilard

    Slides here: https://speakerdeck.com/szilard/size-of-datasets-for-analytics-and-implications-for-r-user-conference-stanford-university-june-2016