

Are you ready for the exploding world of big data? Do you know the difference between Hive and Pig? Do you know why MapReduce is being taught in many universities rather than SQL? If not, pay attention: this talk will help you get started in understanding this new world. While the Hadoop toolkit (which includes HDFS, MapReduce, Hive, Pig, and Sqoop) is sometimes used as an alternative to relational database systems such as SQL Server, customers more frequently use it as a complementary tool. It may serve as an ETL tool, as a way to perform an initial analysis of a freshly acquired data set to determine whether it is worth loading into the data warehouse, or as a way to process massive data sets that are too big to even contemplate loading into all but the very largest data warehouses. In addition to covering the basics of the various parts of the Hadoop stack, this talk will discuss the strengths and weaknesses of the Hadoop approach compared to that of relational database systems and explore how the two technologies can be used productively in conjunction with one another.

The Discussion

    Interesting talk on the paradigm shift in dealing with big data.

    My partial time annotations (mm:ss):

    3:15 some big data stats

    4:10 amount of data will increase by a factor of 35 to 40 by 2020

    4:50 the data deluge; G20 interest in big data

    5:04 why the sudden explosion of interest in big data?

    6:50 data is no longer thrown away + trend to analyse social network sentiment data

    7:30 cost of data storage is down

    7:55 managing "big data": parallel DBs vs. NoSQL systems

    8:45 Bing statistics

    9:00 NoSQL discussion

    9:57 why NoSQL (Not only SQL)?

    11:40 NoSQL is driven by developers

    12:20 reducing time to insight explains the interest in NoSQL

    13:15 NoSQL vs. SQL approach = agile vs. not

    13:25 NoSQL approach

    14:05 two types of NoSQL systems:

    1. key/value: MongoDB, CouchDB, Cassandra, Azure Tables

    2. (15:25) Hadoop = distributed execution framework & file system

    16:25 two universes of data: structured and unstructured

    17:05 paradigm shift from SQL

    18:40 what is Hadoop?

    Is this video going to be put up on C9? 