Back to Profile: sokhaty


  • Spark Performance Tuning - Part 2

    If I recall correctly, DataFrames were introduced in Spark 1.3 as a "preview", but they are definitely there in "production" mode in 1.6.

    Also, Spark is not all that great at handling Hive tables with hundreds or thousands of partitions, though it presumably got better with the Spark 2.1 release.

  • A tour of F# with Phillip Carter

    @Naveenkumar: It's not an intro for a beginner, that's for sure. But otherwise, it's not bad at all.

  • A tour of F# with Phillip Carter

    The second link to the docs is broken.

  • Big Data Partner Program

    @rustd: Thanks!

  • Big Data Partner Program

    Where are the links to additional resources promised in the video?
    Also, the video needs to be trimmed. The last 5 minutes are just black empty frames (the episode itself ends at 13:something).

  • Introducing U-SQL - A new language for Massive Data processing

    Nice intro! A few questions:

    * how soon are you planning to add support for aggregate/windowing UDFs? The online docs talk only about row-wise UDOs (user-defined operators). Since the choice of windowing functions is currently limited, you should at least allow developers to write their own.

    * are there plans to provide a "local" mode for U-SQL dev and test scenarios (similar to local mode for Hadoop, dev simulators for Netezza and so on)? Perhaps you can add it as part of the Azure SDK with a local emulator.

    * what's the feedback channel for U-SQL?

  • PolyBase in SQL Server 2016

    Besides predicates, what can be pushed down into Hadoop as a YARN job? Joins, single-row functions? How about aggregates and window functions? I suppose if you are generating MR jobs for submission, it can be pretty much anything, if your query planner is sophisticated enough.

  • Tuesdays with Corey: Nano Coolness with Jeffrey Snover

    Cool, and you guys even figured out how to order French press at Starbucks :)

  • Temporal in SQL Server 2016

    @Davele: If I'm reading your post correctly, except for Ex1 Q2, all other cases should be expressible and work correctly in terms of "now" (irrespective of the temporal tables).

    For query Ex1 Q2, the temporal feature would be a prerequisite to get the answer.

    Temporal would be really helpful if in Ex Q1 you were asking what the difference is between what was known to the system "as of", say, a month back about the state of the manufacturing pipeline on July 1st, versus what is known now about the state of the pipeline on July 1st.

    Temporal support is useful when one has to test a forecasting model to see what kind of decision the model would make if it had only the knowledge about the world as it existed then, vs. the decision that would be made now with all the updated and revised facts.
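
    To make the "as of" comparison concrete, here is a minimal sketch in plain Python rather than T-SQL. The `Version` record, the `known_as_of` helper, and all dates and unit counts are made up for illustration; only the half-open validity interval (recorded at or before the point in time, superseded strictly after it) is meant to mirror how a system-versioned table answers an "as of" query.

```python
from datetime import date, datetime
from typing import NamedTuple, Optional


class Version(NamedTuple):
    """One recorded version of a fact (hypothetical schema)."""
    pipeline_date: date   # business date the fact describes
    units: int            # hypothetical measure of pipeline state
    valid_from: datetime  # when the system recorded this version
    valid_to: datetime    # when this version was superseded


FOREVER = datetime.max

# Made-up history: the figure for July 1st was revised on July 15th.
history = [
    Version(date(2015, 7, 1), 100, datetime(2015, 6, 1), datetime(2015, 7, 15)),
    Version(date(2015, 7, 1), 120, datetime(2015, 7, 15), FOREVER),
]


def known_as_of(rows, pipeline_date: date, as_of: datetime) -> Optional[int]:
    """Return what the system believed at `as_of` about `pipeline_date`."""
    for row in rows:
        if (row.pipeline_date == pipeline_date
                and row.valid_from <= as_of < row.valid_to):
            return row.units
    return None


# What was known a while back vs. what is known after the revision.
known_then = known_as_of(history, date(2015, 7, 1), datetime(2015, 6, 20))
known_now = known_as_of(history, date(2015, 7, 1), datetime(2015, 8, 1))
revision = known_now - known_then
```

    A forecasting backtest would feed the model only `known_then`-style values, so that it sees the world as it was recorded at the time rather than the revised figures.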

  • HDInsight on Linux

    Cisco uses Red Hat to run their UCS-based "cluster in a box" big data appliances. As a smaller s/w vendor, we develop everything on CentOS with HDP. For what it's worth, support for CentOS would make our life easier if our next client decides to go with managed Hadoop in the cloud.

    Ubuntu definitely looks like an easier port than Windows HDP, but it's still an extra platform to target.

  • Cloud Cover 151: Azure Machine Learning with Parmita Mehta

    Looks pretty cool. I'm a bit surprised that there are no code samples in F#. Is it possible to add 3rd-party or custom-built components to the tool panel? For example, to add a database connector or a custom missing value proxy?

  • State of .NET (Keynote)

    I wonder if 100% managed-code database drivers, similar to Type 4 JDBC drivers, will ever be implemented in .NET. That would be a big relief: no dependency on the availability and presence of a native client from a DB vendor, and cross-platform support.