vvldb

Niner since 2006


  • Stephan T. Lavavej: Digging into C++ Technical Report 1 (TR1)


    I really don't know where to start, and this will be long, rough and rant-like. But I am only trying to help people see the next hop, and to point to some very simple facts about the C++ object model vs GC.

    I am certainly not fazed by any talk of GC, and especially in the context of .NET.

    The committee should rapidly back off the idea of introducing one by default.

    I have worked with VMs for over a decade and can see why they will never be appropriate for a number of tasks that are becoming so important that they will irreversibly differentiate the value and reference paradigms. Within a couple of years this will also define the 'what now', i.e. the ending of the managed 'upgrade-mentality suicide' story.

    I am aiming at the next phase widely accepted as the demise of Murphy's hypothesis.

    Before going on with sarcasm and one tiny detail, I would first like to congratulate the VC++ team for coming out of the trenches after a decade of starvation and Microsoft's reinvention-in-Java, and investment in .NET (primarily marketing and poor language design ie. C# generics, .NET collections, LINQ abomination of 'thinking in sets', aka Bruce-Event-Lee De-lphins and plenty more to bother with listing here).

    On a more technical note, avalanche destructors, interlocked ref-counting and many other 'all for GC'-style criticisms go against the nature of all modern hardware, and against the constructs of a language that caters for multiple paradigms and has the built-in expressive power to bypass any deficiency presented. All of them are easily catered for by language constructs and compiler extensions (and you'll see concepts play a huge role here). That's the next phase, and you had better follow the Boost people for the next step in this evolution rather than blogs about .NET.
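
    As a rough illustration of the ref-counting point (a sketch only, not a claim about any particular library design): with std::unique_ptr, ownership is handed over by move, so no interlocked reference count exists at all, and destruction happens at a known point. The Buffer type and the consume and inspect functions are hypothetical names made up for this example.

```cpp
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical payload type, used only for illustration.
struct Buffer {
    std::vector<int> samples;
};

// Takes sole ownership of the buffer. Moving a unique_ptr copies a raw
// pointer; there is no shared counter to increment or decrement atomically.
std::size_t consume(std::unique_ptr<Buffer> buffer) {
    return buffer->samples.size();
}  // the Buffer is destroyed here, deterministically

// Borrows the buffer without taking ownership; const forbids mutation.
std::size_t inspect(const Buffer& buffer) {
    return buffer.samples.size();
}
```

    A caller would build the buffer with std::make_unique<Buffer>(), lend it out through inspect(*b), and finally give it away with consume(std::move(b)), after which b is null.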

    Once .NET people (especially those who converted to productivity candy, just like many of us did back in 1996 with Java and 2000 with .NET) finally get disappointed and realise that the advance is not going to happen in VMs, but in environments that can be catered for without breaking the threading, memory, ownership, non-ownership and many other models, we can go back and discuss:

    Why will GC eventually be slaughtered?


    Meanwhile, the best .NET libraries out there are consistently reverting to native code to make up for all the mess GC and runtime environments are very fond of (not to mention first-language education damage in managed-land).

    Another comment was on proving type-safety, you know, the proof. Please look at MSR and other projects that do better code analysis than any bytecode tool is capable of today. Actual samples are right there, today, and that's for C Driver Code. No hmm?

    Okay, please don't look puzzled (I know it is hard for a die-hard managed mentality), just boot your WPF app or pump a large dataset into your WinForm or ASP.NET app and you'll see the catastrophe that will soon scale as badly as JavaScript, without any threading notion at all.

    It will probably hurt to realise this, but Google is blasting away with this concept and has pretty bad C++ programmers judging by their blogs. But they try hard where it matters. It is just a fact you cannot avoid, and when you see it, and fight trying to achieve the same with .NET for over 5 years, with memory barriers, with the help of IOCP, with the help of everything, and fail, then you wonder:

    Isn’t it about time to move on and make a better environment?

    One without runtime overhead?

    That is the genius of Bjarne, and the C, .NET and Java guys had better learn fast, as the next barrier, clean-up and rewrite are quite near.

    I've been waiting for .NET and Java hype to materialise for next generation computing for a decade now. And it is constantly disappointing with abstractions that leak and diverge from reality.

    Point of no return has been passed though, aka memory latency kills.

    I mean, is it so hard for people to realise that all those VMs are written in pretty average C++, and that by induction it satisfies all the models you are currently working with (including simple/managed/.NET, OO, functional, parallel, you name it)?

    Why is it so hard to get this I wonder?

    Any comment against C++ or Boost or TR1 is a suicide for your next version of VM, and I have learned to immediately find all of them suspect, even if they are backed up by some surface-level research. The same fact applies to GC protagonists.

    Please see that the C++ model is what is underneath you, and it is advancing at a pace you will not be able to ignore if you care about your work, which the VM guys consistently show they don't as they complete their work and boast about how easy it all was.

    If history teaches anything, nothing too easy was ever good enough.

    And for anti-interlocked and anti-ref-counter guys, no one is forcing you to do anything like it, and you can easily bypass it and blast any GC or VM:

    Use const on everything!
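
    A minimal sketch of why const helps here, under the assumption that the shared data is never written after the threads start: read-only data can be handed to any number of threads by const reference, with no lock, no interlocked counter and no GC involved. The function names sum_slice and parallel_sum are made up for this example.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Sum one slice of a read-only vector. The const reference means this code
// cannot mutate the shared data, so concurrent readers need no lock.
long long sum_slice(const std::vector<int>& data, std::size_t begin, std::size_t end) {
    return std::accumulate(data.begin() + begin, data.begin() + end, 0LL);
}

// Split the vector into `workers` slices and sum them in parallel.
// Each thread writes only to its own slot in `partial`.
long long parallel_sum(const std::vector<int>& data, std::size_t workers) {
    if (workers == 0) workers = 1;
    std::vector<long long> partial(workers, 0);
    std::vector<std::thread> threads;
    const std::size_t chunk = (data.size() + workers - 1) / workers;
    for (std::size_t w = 0; w < workers; ++w) {
        const std::size_t begin = std::min(w * chunk, data.size());
        const std::size_t end = std::min(begin + chunk, data.size());
        threads.emplace_back([&data, &partial, w, begin, end] {
            partial[w] = sum_slice(data, begin, end);
        });
    }
    for (auto& t : threads) t.join();  // join before reading the partial sums
    return std::accumulate(partial.begin(), partial.end(), 0LL);
}
```

    The only writes are to per-thread slots, and the join calls make them visible before the final accumulate, so the shared input itself never needs synchronisation.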

    Something modern VM and language 'heroes happen there' folks couldn't accept was the ultimate solution.

    Go back and read what Bjarne and Sutter are doing. It will help you beat all the .NET, Java, AOP, declarative, Erlang and Haskell people in a blink once you dive in.

    You'll be capable of building generations of frameworks (if you have to, but that's not the goal), not just use a single one that is so inefficient in expressing ultimate machine abstraction:


    And it is evolving too, in parallel to C++, just like your hardware guys. PFX or PLINQ or similar will not help you here.

    And none of this is relevant to languages per se, as many follow similar syntax and even translatable semantic in at least one direction. The point is the managed world cannot satisfy some basic models and is starting to break down rapidly.

  • Conversation with scientist, engineer and database legend Jim Gray

    Will be a nonsense post from me as usual but perhaps something useful for my record (I keep this list close-by) of 'Top 10, pointless, time-wasting failures' while developing software..

    It took a while, can only imagine how busy everyone over there is, but reading all the Qs and replies made my day and presents an interesting flow of reasoning perhaps. And at least I can show off now, and point to this site as evidence when some 'silly' argument develops in the pointillistic-culture-friendly company all my colleagues work in :)

    Thanks very much for your time Dr. Gray and for those little hints that keep the mind hungry and, how to put it, just sweet enough to question everything, try something different all the time and hopefully invoke change and more.

    On parallelism, a lot of new and old MSFT VS guys seem to be occupied enough with it for us to anticipate that this is where the next edge will be for quite some time now (apart from other VM work I see on this site but have no time to view videos of or read about). It might seem a no-brainer to many (they say ignorance is bliss, but competitiveness on millisecond timescales isn't ;)), and to write code (or to have an intuitive-enough platform) for such an environment is surely a couple of orders of magnitude harder than just looking to avoid deadlocks or selecting a locking granularity that might be optimal for some application. It's like, err, having a terrifyingly humble computing legend around: not many people about can be and remain that way ;) Thus quite likely not many people will program parallel (including myself) for a long time either.

    So what's left.. personally, I don't believe much in generalisations or in sticking to a single environment for that matter. The reason probably being that our ideas change all the time, or that everyone is psd off with everyone else; it is only natural to be moody and try 'your own thing' TM. What strikes me in this day and age, or wisening process I wish/hope, is to make an advance no matter what the method, utilise it and make the money before someone else does the same. This in turn ideally invokes change for the greater good, like health, like eliminating poverty and more (enter the notes on Ballmer and Bill, which were all an enlightening and beautiful read at the very least).

    We perhaps don't need an environment initially, just constructs to show the results, and the rest (any generalisations, if there is a need for them) should appear from it. The good old 'responsibility pushed to the user'. What I am aiming at (after some layman-type rumblings with myself) is the thought that for parallelism to really work, that is, to benefit from it through timelier and more accurate data, all the software, platforms etc. in the chain, and all interaction between them, should ideally be designed with it in mind. Thus I find it no surprise to see very few commercial bits available, esp. where money is milliseconds, and even when some are found (like, was it, the Cambridge STL bits, I believe), they are useless because they are integrated with a system unable to benefit from them.

    Enter queues. But also enter the fact that no generalisation can be found for multi-writer, multi-reader (thread is the wrong term, parallel suits better I guess) scenarios. That, plus the logic to assemble it all in such a fashion that the 'serial' workhorse (historical term: Viper extension) gets its logic (our apps) done on just the dataset that is relevant, i.e. the latest 'snapshot', is nowhere around.
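
    For reference, the simplest lock-based form of a multi-writer, multi-reader queue might look like the sketch below. It is an assumption-laden baseline (one mutex, one condition variable, C++17 for std::optional), not the hardware-level abstraction wished for above, and the class name MwmrQueue is made up for this example.

```cpp
#include <condition_variable>
#include <deque>
#include <mutex>
#include <optional>
#include <utility>

// Hypothetical unbounded queue that any number of writers and readers may share.
template <typename T>
class MwmrQueue {
public:
    // Called by writers: append an item and wake one waiting reader.
    void push(T value) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            items_.push_back(std::move(value));
        }
        ready_.notify_one();
    }

    // Called by readers: block until an item arrives. Returns std::nullopt
    // once the queue has been closed and fully drained.
    std::optional<T> pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return closed_ || !items_.empty(); });
        if (items_.empty()) return std::nullopt;
        T value = std::move(items_.front());
        items_.pop_front();
        return value;
    }

    // Signal that no more items will be pushed and wake all waiting readers.
    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        ready_.notify_all();
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::deque<T> items_;
    bool closed_ = false;
};
```

    Every push and pop serialises on the single mutex, which is exactly the kind of contention that makes a general-purpose answer hard to find once latency matters.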

    I like to draw parallels to real life (i.e. no machines exist) all the time; information that is old is no longer useful for a good number of new ideologies or applications. It is about time someone dealt with it, yet it is almost a revelation in 2006, say Windows Mobile 2005 devices picking up email. As an example, at this time I am watching Intel's stock break out after a terrifying blow lasting almost 2 years now. What happened there is anyone's guess, but my snapshot is only interested in whatever the arbitration logic decides is relevant at the point in time it chews the input, not necessarily the past (i.e. almost like queue filtering, with some context data to help keep the search/sort operations from parallel input down, or 'temporalised'). Sure, the workhorse (inner loop) can utilise further 'tool or software helped' parallel processing if it is built that way.
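
    One possible reading of that snapshot idea, sketched under assumptions: the arbitration logic is reduced to "the highest sequence number per key wins", so stale updates are dropped before the workhorse ever sees them. The Tick record, its fields and the latest_snapshot function are all hypothetical names for this example.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical update record; `sequence` is assumed to grow with recency.
struct Tick {
    std::string symbol;
    std::uint64_t sequence;
    double price;
};

// Collapse a batch of updates, possibly arriving out of order from parallel
// inputs, into a snapshot that keeps only the newest tick per symbol.
std::unordered_map<std::string, Tick> latest_snapshot(const std::vector<Tick>& batch) {
    std::unordered_map<std::string, Tick> snapshot;
    for (const Tick& tick : batch) {
        auto it = snapshot.find(tick.symbol);
        if (it == snapshot.end()) {
            snapshot.emplace(tick.symbol, tick);   // first sighting of this symbol
        } else if (tick.sequence > it->second.sequence) {
            it->second = tick;                     // newer update replaces the old one
        }                                          // otherwise the tick is stale: drop it
    }
    return snapshot;
}
```

    The serial inner loop then works on the returned map only, never on the history that produced it.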

    The rest, in my mind, has little benefit from parallel execution whether we like it or not, and sure, algorithms and software tools can identify such things, but will it work all the time, i.e. are these bits inner loops, and what headache will it give to the software developer? Your comment on one-trick ponies seems to be on similar lines (unless I am terribly confused, which isn't an issue here :))

    Much the same as what the hardware guys like DEC kicked off, with those bits sold to, was it, Compaq and then Intel etc., i.e. the Alpha architecture or at least ideas from it. Thus, I believe all we need is some good hardware abstractions in the form of MWMR queues, extension points for our state/temporal/filtering logic, and (hate the term btw) 'pushed' and 'high-res versioned' data to help us build those. Sounds easy in this quick nonsense write-up of mine, but it is probably hard enough to even begin with for most. Form-type, history and other 'audit-friendly' apps can wait; they had priority for far too long and they caused no revolution apart from WWW, blogs and Google, which is not small but no Holy Grail either :)

    In any case, I gave up on big data storage (and compression of such) idea as reliability bits came into play (off goes the Morse&Isaac,Snodgrass etc out of my short lifespan:). Life gives such incredible reasons to not be obsessed with detail or clutter, yet all programmers fall for and fall in love with it.

    Many thanks for your reply and time.

    'Optimisation is the root of all good' convert

  • Conversation with scientist, engineer and database legend Jim Gray

    I better be quick before someone beats me to it ;)

    Good heavens, never thought I'd be reminded by the man himself to clean the dust off that timeless piece.. let me fix my face, the jaw has gone pretty low here ;)

    Dear Dr. Gray,

    Could never understand how deep all the transactional science can get (until I picked up that incredibly detailed work and lost myself pretty soon), all while it is so abstracted that we never see much of it in day-to-day work or in different, simplified models.. which is greatness no doubt. So now I have given myself the task to get that data structure implemented and utilised, as well as to look for some good testing of a few SQL Server batteries..

    While my lousy opinion is that AJAX is not going to fire away anywhere fast or be too successful (before it is replaced with another name and method at least;-), I believe I can see where the 'regret' hint is coming from, looking at how far ahead MS was back then (and how long it takes for things out of research to resurface in the commercial world).. The issue seems to be that VML and RDS, if that was the correct name, and far more were just too much for web pages back in those days. Broadband wasn't taken up as widely, machines were far slower and storage was still expensive.. Anyway, I just believe that HTML is still slow for highly interactive apps, and rendering engines just don't seem to scale with the number of visible or out-of-viewport elements.. enter Java hacks etc.. don't see devices coping with much of it either, but sure, things are getting better slowly.

    My favourite comment on the show was on the heat problem, as my own teacher always insisted it will have to be hit, and pretty soon (his estimates were something like c. 2010 back in 1997). He always said that's exactly when the algorithm guys (and researchers, as he led that department) will finally 'take over' and see great satisfaction and demand for the work they did/do.. Just thought of mentioning it in the context of something most of us mere mortals will never experience or see ;)

    To not bore anyone any more: I guess it is a common requirement today to process huge amounts of data (and I guess there is a need for it to be compacted too, but that is another topic). I want to stick to SQL Server (if for nothing else than for the many things said and shown here and the awesome interview).. Sure, and for performance, tools and more. And now I'm hitting SQL as hard as I can, transaction logs grow in 1GB increments in the space of a minute and I'm expecting the real-world scenario to push that far higher.. therefore anything bound to a single machine, a single point, is out of the question.. easy to say, hard to implement, especially as low-latency query is a major requirement for the project. I avoid DTC as much as I can and almost always get away with it (I think;), sure, all specific to a problem etc. So ok, I now get it has to scale out, it better do, and it has to be distributed because of the locking nature of loading large datasets efficiently..

    I was always thinking replication and versioning approaches were a way forward for such scenarios.. the cache approach I don't have much desire for, IMDB was dumped before ;), addressable space will keep growing etc.. hence looking for insight from, let's face it..

    Ok, gotta do this, someone point me where in the world could you publicly ask for advice from a Turing Award legend..

    Kudos, am not worth a reply, this site rocks, TM, LTD, etc

    All the best.