Efficient tabular data ingestion and manipulation with MonetDBLite


Description

We present "MonetDBLite", a new R package containing an embedded version of MonetDB. MonetDB is a free and open-source relational database focused on analytical applications. MonetDBLite answers complex queries quickly and makes data available to, and transfers it to and from, R at unprecedented speeds.

MonetDBLite greatly simplifies database installation, setup, and maintenance. It is installed like any other R package, and the database runs entirely inside the R process. The crucial advantage of this design is that data transfers between the database and R are very fast. Another advantage is MonetDBLite's fast startup with existing data sets: MonetDBLite stores tables as files on disk and can reload them regardless of their size. This lets R scripts start processing data very quickly instead of loading it from, e.g., a CSV file every time. MonetDBLite builds on our previous work on mapping database operations into R (now achieved through dplyr in the MonetDB.R package) as well as previous work on ad-hoc user-defined functions for MonetDB with R.

The talk will introduce the package, demonstrate its installation, and showcase a real-world statistical data analysis on the Home Mortgage Disclosure Act (HMDA) dataset. We show how MonetDBLite compares with its (partial) namesake SQLite and with other relational databases. We will demonstrate that, for statistical analysis workloads, MonetDBLite easily outperforms these previous systems, effectively allowing analysis of larger datasets on desktop hardware. MonetDBLite has been submitted to CRAN and will hopefully be accepted by the time of useR! 2016.
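The workflow described above (install like any R package, run the database inside the R process, persist tables on disk and reopen them later) might look roughly like the following. This is a minimal sketch, not material from the talk: it assumes the MonetDBLite package is installed and exposes the standard DBI interface, and the directory name and the use of R's built-in `mtcars` data set are illustrative choices.

```r
# Assumption: MonetDBLite is installed, e.g. via install.packages("MonetDBLite")
# if it is available from your package repository.
library(DBI)

# Hypothetical directory where the embedded database keeps its table files.
dbdir <- file.path(tempdir(), "monetdblite_demo")

# Start the embedded database inside the current R process.
con <- dbConnect(MonetDBLite::MonetDBLite(), dbdir)

# Ingest a data frame as a table, then run an aggregate query in SQL.
dbWriteTable(con, "mtcars", mtcars)
avg_by_cyl <- dbGetQuery(
  con,
  "SELECT cyl, AVG(mpg) AS avg_mpg FROM mtcars GROUP BY cyl ORDER BY cyl"
)

# Shut the embedded database down; the table remains on disk in dbdir.
dbDisconnect(con, shutdown = TRUE)

# Later: reopen the same directory and find the table
# without re-importing the original data.
con <- dbConnect(MonetDBLite::MonetDBLite(), dbdir)
tables <- dbListTables(con)
dbDisconnect(con, shutdown = TRUE)
```

Because the second connection points at the same directory, the table written in the first step should be visible again without reloading it from a CSV file or data frame.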

Day: 1
