Big Data, Big Deal
- Date: February 16, 2012 from 11:05AM to 12:20PM
- Day 1
- 2266
- Speakers: Gert Drapers
The recording for this session is not yet available
Are you ready for the exploding world of big data? Do you know the difference between Hive and Pig? Do you know why MapReduce is being taught in many universities rather than SQL? If not, pay attention, because this talk will help you get started in understanding this new world. While the Hadoop toolkit (which includes HDFS, MapReduce, Hive, Pig, and Sqoop) is sometimes used as an alternative to relational database systems such as SQL Server, customers more frequently use it as a complementary tool. It may serve as an ETL tool, as a way to perform an initial analysis of a freshly acquired data set to determine whether it is worth loading into the data warehouse, or as a way to process massive data sets that are too big to even contemplate loading into all but the very largest data warehouses. In addition to covering the basics of the various parts of the Hadoop stack, this talk will discuss the strengths and weaknesses of the Hadoop approach compared to that of relational database systems and explore how the two technologies can be used productively in conjunction with one another.
Follow the Discussion
Interesting talk on the paradigm shift in dealing with big data.
My partial time annotations in mmss (MinutesSeconds) format are:
315 some big data stats
410 amount of data will increase by a factor of 35 to 40 by 2020
450 the data deluge, G20 interest in big data
504 why the sudden explosion of interest in big data?
650 data is no longer thrown away + trend to analyse social network sentiment data
730 cost of data storage is down
755 managing "big data": parallel DB vs NoSQL system
845 Bing statistics
900 NoSQL discussion
957 why NoSQL (Not only SQL)?
1140 NoSQL is driven by developers
1220 Reducing time to insight explains interest in NoSQL
1315 NoSQL vs. SQL approach = agile vs. not
1325 NoSQL approach
1405 2 types of NoSQL systems:
1. key/value: MongoDB, CouchDB, Cassandra, Azure Tables
2. (1525) Hadoop = distributed execution framework & file system
1625 Two universes of data: structured and not
1705 paradigm shift from SQL
1840 what is Hadoop?
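For anyone who wants to jump to these marks in a video player, the mmss stamps above can be converted to seconds. This is only a minimal sketch: the function name `mmss_to_seconds` is hypothetical, and it assumes every stamp is a string of digits whose last two digits are the seconds, as in the list above.

```python
def mmss_to_seconds(stamp: str) -> int:
    """Convert an mmss stamp such as '315' or '1525' to a number of seconds.

    Assumption: the last two digits are seconds and everything before
    them is minutes, so '315' means 3 min 15 s.
    """
    stamp = stamp.strip()
    if not stamp.isdigit() or len(stamp) < 3:
        raise ValueError(f"not an mmss stamp: {stamp!r}")
    minutes = int(stamp[:-2])
    seconds = int(stamp[-2:])
    if seconds >= 60:
        raise ValueError(f"seconds out of range in stamp: {stamp!r}")
    return minutes * 60 + seconds


if __name__ == "__main__":
    # A few stamps taken from the annotations above.
    for stamp in ("315", "1525", "1840"):
        print(stamp, "->", mmss_to_seconds(stamp), "seconds")
```

For example, the stamp 1525 (Hadoop) corresponds to 15 minutes 25 seconds, or 925 seconds into the talk.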
Is this video going to be put up on C9?