
Burton Smith: On General Purpose Super Computing and the History and Future of Parallelism

Download

Right click “Save as…”

  • Mid Quality WMV (Lo-band, Mobile)
  • MP3 (Audio only)
  • MP4 (iPhone, Android)
  • High Quality MP4 (iPad, PC, Xbox)
  • Mid Quality MP4 (Windows Phone, HTML5, iPhone)
  • WMV (WMV Video)
Burton Smith is a Technical Fellow at Microsoft who thinks about how our platform needs to be structured to support general-purpose computers that will soon have the processing power of clustered supercomputers, as we move closer to manycore everywhere (not too far off in the future...). Burton is a parallel computing expert and an industry thought leader in high-performance, massively parallel distributed (aka super) computing. Winner of the Seymour Cray Computer Engineering Award, Burton knows a thing or two about how to architect and implement software systems that can succeed in the age of manycore.

This is a long and great conversation, unedited of course. You'll want to make some time for this and listen carefully to what Burton says. This is a very important general introduction to parallelism and high-performance computing. As always, we can't talk about super computing without addressing programming language evolution in the context of manycore (you've seen this quite a bit on C9 over the years). We cover a lot of ground here, including Burton's insights into functional programming, transactions, compatibility, shared mutable state, operating systems, technical redundancy, and the role of Technical Fellows in the post-Bill era.

Enjoy this great introduction to parallelism and the future of our platform technologies and tools as we head into the age of manycore. This is the first in a series of several interviews covering parallel computing and Microsoft's Parallel Computing Platform technologies, specifically.

Low res file for the bandwidth-challenged.

Follow the Discussion

  • evildictaitor Devil's advocate
    I'm such a pedant, but "Low res file for the bandwidth challenged" means that the file was challenged, not that the people are bandwidth-challenged. You need a hyphen or the entire sentence means something else. (I sat here for a minute wondering why the hell someone would challenge the file)
  • Charles Welcome Change
    evildictaitor wrote:
    I'm such a pedant, but "Low res file for the bandwidth challenged" means that the file was challenged, not that the people are bandwidth-challenged. You need a hyphen or the entire sentence means something else. (I sat here for a minute wondering why the hell someone would challenge the file)


    Fixed. Now, watch the interview. Smiley
    C
  • You might want to run a spell checker over the accompanying text, hint "clustered" Wink

    Great video though!
  • I think most of us managed to get the meaning. Anyhow - excellent video! Smiley
  • Good to hear from a fellow HPC geek;)

    I'm not sure if silos have to be eliminated in quite the way that Burton mentions. Sometimes (in my experience, most of the time) one group needs a component before another group will be ready to deliver it, even if the latter has the "winning solution". Say, for example, the winner is in the middle of a reliability scrum, so they aren't going to be coding anything for the next 1-2 weeks. That will push any component of any meaningful size off for at least a month.

    In my view it isn't so much picking the winner up front, but getting a rough API design and letting whoever needs it first code it. Then everyone needs to know that the feature has been built, so when they run into a need for it they can "cut and paste" it into their app. Otherwise what happens is you pick a winner and the other projects get a stop sign placed on their Gantt charts at whatever point the project leaders can deliver that component.

    I would like to see a little bit of what I'd like to term "architect in a cloud", where an architect isn't tasked to a particular product group but is free to roam (Bill Gates has probably demonstrated that role best in the past). That way, if they have the winning idea but a group other than their current one has spare dev resources or an earlier dependence on the component, the architect can float over to that group and lead the project rather than wait for resources to free up in his department or steal resources from another department.

    Three arguments for it:

    1) It is easier to move one person than a dev team.
    2) Architects spending more time circulating will have a deeper grasp of the company and a better idea of who the "thought leaders" are in a bunch of different niches.
    3) Components that are depended on by multiple projects get started earlier.
  • evildictaitor Devil's advocate
    Just got round to watching it, and I'm impressed. Burton mentions a lot of very important things (like the fact that functional programs can be easily serialized) and goes into a lot of depth here about why functional geeks care about functional languages.

    Can I just help to clarify for some people - when Burton goes on about everything in functional languages being constant and making things out of these constants, a much better way of putting it is that programs are composed of functions which are constant with respect to their arguments, and this allows for more benefits than just compile-time (and just-in-time) parallelisation - it also has serious impacts on how you compute them, so it's even more important than he was suggesting.

    Anyway, this was an excellent video, and I hope to see more where this came from in the near future! Good work, Burton! (and Charles, yeah, shouldn't forget him Tongue Out)
  • Great video! Reassuring! He seems to really "get it" (IMO) so it's nice to know that someone like that is walking the halls and nudging things in the right direction.
  • SQL is a fine example of a functional language that most of us are used to already... but even it can become complex when you consider a database getting hit by many users simultaneously, with row/table locks and buffering modes. It doesn't really solve the race conditions that crop up in parallel computing.
  • evildictaitor Devil's advocate
    Dark_Halmut wrote:
    SQL is a fine example of a functional language that most of us are used to already... but even it can become complex when you consider a database getting hit by many users simultaneously, with row/table locks and buffering modes. It doesn't really solve the race conditions that crop up in parallel computing.


    The bit where the locks are is where the functional bit (SQL) is put down into state - i.e. where SQL stops being functional (storing things is a non-functional thing to do). Of course, if we didn't do that bit, it would never bother to store the data (which would suck, as databases go). If your database were immutable (i.e. you never wrote to it), SQL would be purely functional and would achieve massive parallelism.
  • Now the spelling is ok, the download link doesn't work. Perplexed

  • vesuvius Count Orlock
    So Excel spreadsheets/SQL are functional programming. Indisputable, but I'm ashamed to admit that I never thought of it that way.

    Burton has been around for about 2 years; how many other Technical Fellows have not been on Channel 9 (consenting, of course)? Please try and get as many as possible, irrespective of their field, for a session. The richness and depth of experience these 'Fellows' have means hearing from them is like receiving 'Manna from Heaven'.


    It's good to hear from Microsofties, and non-Microsofties like Gilad Bracha, for equilibrium. Bravo!
  • evildictaitor wrote:
    If your database were immutable (i.e. you never wrote to it), SQL would be purely functional and would achieve massive parallelism.

    And it would be nowhere near as useful as a database where you could change the data.
  • Charles Welcome Change
    klinkby wrote:
    Now the spelling is ok, the download link doesn't work.


    The link is not broken.
    C
  • It was a super pleasure to listen to Burton Smith on the thoughts and techniques and the way things have evolved in the parallel world.
    Especially about how we can make functional languages truly evolve by adding a sort of transaction mechanism (which has been done well in the data world).

    It was super great... and just put his name in your pocket, Charles, so that we can have more of him in the future... and maybe if you can find more architects, that would be super awesome.

    F#, and the talk about Don Syme, who built generics for .NET, is also nice.
    F# is a really cool language and I have started learning it; it's just awesome, and as a programmer you really get to learn something that develops you.

    Once again, thanks to Burton Smith for such an awesome talk.

    cheers Charles..

  • The interview is memorable because of Burton's ability to express complex issues in a simple way. Like others, I like the linking of Excel/SQL/deferred constants to functional programming.
  • So Charles, just why is David Cutler so reluctant to appear on C9? Are we so monstrous? Wink

    Also, would it be possible to include links to papers or people referred to in your interviews? I often find myself hitting pause and web-grepping for the resource in question; it would be great to have them up front.

  • gaurav.net wrote:
    
    Especially about how we can make functional languages truly evolve by adding a sort of transaction mechanism (which has been done well in the data world).


    You may be interested to hear that Haskell has had transactions since 2005. In fact, there's a Channel 9 video about them with Simon Peyton Jones and Tim Harris (though it's more general, and not specifically about their Haskell implementation).

    And yes, it does indeed work very well. For most cases you would try data parallelism first; if that doesn't work, you'd try task parallelism; if that doesn't work, you would try threads and message passing; and if that doesn't work either, you could try using shared memory, and then the transactions are really helpful. They round off the "portfolio" of strategies to use when doing parallel programming nicely, as there are in practice a few things where we really need a large shared data set that can be modified by thousands of agents at once without having to do coarse-grained locking, as that would cripple performance (consider a game, for example).
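The "data parallelism first" step of that portfolio can be sketched with a few lines of code. This is an editor's illustration in Python, not anything from the interview: the names `score` and `parallel_map` are made up, and note that CPython threads only overlap I/O-bound work (a real data-parallel runtime would use processes or native cores). The shape is what matters: a pure function mapped over a collection, with no shared mutable state to synchronise.

```python
from concurrent.futures import ThreadPoolExecutor

def score(x):
    # A pure function: no shared mutable state, so concurrent
    # calls are trivially safe.
    return x * x + 1

def parallel_map(fn, data, workers=4):
    # Data parallelism: apply the same pure function to every
    # element; the runtime is free to schedule chunks on any worker.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fn, data))

print(parallel_map(score, range(5)))  # → [1, 2, 5, 10, 17]
```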
  • staceyw Before C# there was darkness...
    Nice.  Thanks C and Burton.  Top video.
  • sylvan wrote:
    
    gaurav.net wrote:
    
    Especially about how we can make functional languages truly evolve by adding a sort of transaction mechanism (which has been done well in the data world).


    You may be interested to hear that Haskell has had transactions since 2005. In fact, there's a channel 9 video about them with Simon Peyton Jones and Tim Harris (though it's more general, and not specifically about their Haskell implementation).


    Just checking out that video of Peyton Jones and Tim Harris...
    Thx for the info...

    This is very interesting, and languages like F# are also popping up...
  • RichardRudek So what do you expect for nothin'... :P
    Excellent nothing. That was a brilliant video.

    Thank you Burton, and thank you Charles.


    PS: As for Dave Cutler: I certainly don't want to disrespect him in any way, but my mischievous side just can't resist. My take is that he spends a lot of his time in the primitive world of the operating system kernel, etc., and has thus adopted primitive beliefs about cameras/pictures stealing people's souls... Either that, or he can't trust himself to keep a secret... [A]


  • Charles Welcome Change

    Glad you liked this conversation. Burton's a very friendly and engaging person.

    Re: Cutler, he's just not interested in this type of thing (being "interviewed" - these are conversations, really, not interviews per se...). There's nothing more to it...

    I'll try and provide more links to content mentioned in conversations going forward. Thanks for the feedback.

    C

  • Hi Charles,

    Thanks to you and Burton Smith for the conversation. I would like to hear more about anything at microsoft having to do with isolation via message passing and capability security, similar to the E language:

    http://www.erights.org/

    I know we have CCR, but the syntax is clumsy, messaging does not have first class status, nor is there any isolation. What kind of similar things to E are happening at Microsoft, if any?

    For me, isolation via message passing sounds more interesting than transactional memory. It has proven useful for decades.

    Thanks,
    Frank
  • Frank Hileman wrote:

    For me, isolation via message passing sounds more interesting than transactional memory. It has proven useful for decades.


    Why not have both?
    In some cases message passing just doesn't work well at all. For example, you may want 10000 objects to be able to query the same data (but only one or two of those objects modify the data). Do you really want each to pass through a protocol with just a single thread "owning" that data? Sounds like a recipe for disaster w.r.t. performance to me. Transactions scale very well in situations like that (which is extremely common in practice), because each thread can do all the reading it wants without interfering with the execution of any other threads.

    I agree that message passing is ideal when suitable, but sometimes it just isn't. Also, while threads are sometimes excellent abstractions (even for things where you *don't* care about concurrency they may be the right model), sometimes they just suck. They basically suffer from the same problem that "goto" does: obfuscation of the structure of the program (you have to jump back and forth through messages in different threads - which ones, you may not know until you run the program - to understand what the program does).

    Now I freely confess that I hadn't seen E (I'll look into it now), so if it offers some cool new abstraction that solves all of this I'll recant my statements, but for now I'll say this: there is no one solution; we need multiple solutions to the problem of parallelism/concurrency. In my book the bare minimum is: nested data parallelism, task-based (purely functional) parallelism, threads with messages, and threads with shared state (here's where you need transactions).
  • staceyw Before C# there was darkness...
    Frank Hileman wrote:
    Hi Charles,

    Thanks to you and Burton Smith for the conversation. I would like to hear more about anything at microsoft having to do with isolation via message passing and capability security, similar to the E language:

    http://www.erights.org/

    I know we have CCR, but the syntax is clumsy, messaging does not have first class status, nor is there any isolation. What kind of similar things to E are happening at Microsoft, if any?

    For me, isolation via message passing sounds more interesting than transactional memory. It has proven useful for decades.

    Thanks,
    Frank


    I have come to the same conclusion. I looked at some of that E stuff. They have the right idea, but it looks very confusing. They make up a lot of unneeded terminology. It needs to be much simpler than that, and it can be. The abstraction has to be correct by construction and simple, and it has to be an explicit opt-out model (à la unsafe), methinks.
  • staceyw Before C# there was darkness...
    sylvan wrote:
    
    Frank Hileman wrote:

    For me, isolation via message passing sounds more interesting than transactional memory. It has proven useful for decades.


    Why not have both?
    In some cases message passing just doesn't work well at all. For example, you may want 10000 objects to be able to query the same data (but only one or two of those objects modify the data). Do you really want each to pass through a protocol with just a single thread "owning" that data? Sounds like a recipe for disaster w.r.t. performance to me. Transactions scale very well in situations like that (which is extremely common in practice), because each thread can do all the reading it wants without interfering with the execution of any other threads.
    ...


    I see what you're saying, but let's analyze that. Essentially, that is a classic reader-writer lock sample. You need the sync because you never know who the last actor was, a reader or a writer. So you still need some kind of sync primitive to protect the invariants.

    One might say: let's keep the message queue for writers, but let readers read properties directly. But here we have a couple of issues. You still need a lock to ensure atomic reads (i.e. non-cached and non-torn). Second, you have possible coordination issues the runtime could never know about and only your logic knows. For example, your object may intentionally not be popping the queue (sort of a blocking operation) until it completes work for the last task - maybe because order is important, or maybe it is waiting on various replies. There are all kinds of reasons order could be important, and you can't reliably short-circuit that in the general case. So only the object knows best. As far as perf goes, yes, message passing has some cost. But so do reader-writer locks and monitors. When you look at the "net" performance, I think it is probably a wash or better with message passing. Costs such as complexity and correctness proofs are much lower from the start. There are many other benefits you can't get with locks or even transactional memory: you can get ordering semantics, self-messaging, throttling, naturally async behavior, pipelining, natural composition, loose binding, and others.

    I think Juval Lowy nailed it here. Every class needs to be a service:
    http://channel9.msdn.com/ShowPost.aspx?PostID=349561#349561
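The "every class is a service" idea above - an object whose state is owned by a single thread, with all access going through its message queue - might be sketched roughly like this. This is an editor's illustration in Python (the `CounterService` name is invented; real frameworks such as CCR are far more elaborate); the point is that no lock guards the counter, because only the owner thread ever touches it, and FIFO queue order gives the ordering semantics staceyw mentions.

```python
import queue
import threading

class CounterService:
    """A toy actor: one thread owns the state; clients only send
    messages, so the counter itself needs no locks."""

    def __init__(self):
        self._inbox = queue.Queue()
        self._count = 0
        threading.Thread(target=self._run, daemon=True).start()

    def _run(self):
        # Single owner thread: the only code that ever touches _count.
        while True:
            op, reply = self._inbox.get()
            if op == "inc":
                self._count += 1
            elif op == "get":
                reply.put(self._count)
            elif op == "stop":
                return

    def inc(self):
        self._inbox.put(("inc", None))

    def get(self):
        # A reply queue per request stands in for first-class replies.
        reply = queue.Queue(maxsize=1)
        self._inbox.put(("get", reply))
        return reply.get()

    def stop(self):
        self._inbox.put(("stop", None))

svc = CounterService()
for _ in range(100):
    svc.inc()
print(svc.get())  # → 100: the get is queued after all 100 incs
svc.stop()
```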
  • staceyw wrote:
    I see what you're saying, but let's analyze that. Essentially, that is a classic reader-writer lock sample. You need the sync because you never know who the last actor was, a reader or a writer. So you still need some kind of sync primitive to protect the invariants.
    ...


    The problem is that the action you want to do may be "read value, do lots of complicated logic that takes a fairly long time, then either do nothing or update the value". If you're going to have a single owner of the data that handles this as a service (with an ad-hoc transactional protocol), you'll get contention. Let's say, for the sake of argument, that the computation that needs to be done before we know whether we need to update takes a full second, and you'll see how performance would be dreadful with even a few dozen threads accessing this data (even if actual updates are highly unlikely).

    You can't just separate reading/writing into two separate requests either, because you *need* transactional semantics. You need to know for sure that you read a value, then you decided to update, and the value won't have changed. With message passing that basically means "one client at a time, please" - i.e. horrible contention. You could do the equivalent of the usual locks/monitors business, but that's what we were trying to get away from in the first place (they don't compose, and they're a nightmare to get right under complicated - i.e. realistic - scenarios where the set of locks you need to take depends on computations you can only do once you've already taken a bunch of other locks)!

    With transactions, the operations on the objects would still be owned by the objects (because the object "knows best"), but the actual code would run on the caller's thread, meaning that all your clients can run at the same time; in 99.9999% of cases there is zero contention, everything scales almost linearly with the number of cores, and we're happy.

    I agree that totally isolated threads are better than shared-state threads for correctness (but if all you want is parallelism, then tasks are even better, and data parallelism is better still), but there are *many* real-world scenarios where shared state is crucial. Only having message passing is a non-starter for general-purpose programming. We NEED a practical and composable (i.e. not locks) way of synchronising access to shared mutable state. Transactions are the only technology I'm aware of that currently does this.


    EDIT: Oh, and consider the problem of granularity. Take the example of a game. You're running a thread for an AI character who wants to perform an action on the game world atomically. How do you solve that? Do you ask the World object to retrieve whatever it is the AI wants to access? In other words, is the World object's service going to act as basically one big lock for any atomic updates that AI characters want to do? Or do you put the implicit "lock" somewhere further down in the hierarchy (e.g. on individual objects in the world)? If so, how is this different from just having locks on the individual objects? How do you solve the problem of not knowing up front which objects you want to modify (because the set of objects you need depends on the values you find in the first couple of objects you examine)? Aren't we back in the old hornets' nest of locks and condition variables again?
    So basically we either have to let the top-level object provide a transactional interface (amounting to a big implicit lock), basically eliminating any chance of parallelism, or we go back to horribly impractical fine-grained locking with all the problems that poses. In this scenario message passing gives us no improvement, whereas transactional memory "just works" without any synchronisation burden placed on the programmer at all!
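The optimistic scheme being argued for here can be shown in miniature: each transaction reads a snapshot, runs its logic without blocking anyone, and a version check at commit time detects conflicts, which are resolved by retrying - so only transactions that actually overlap pay any cost. A toy Python sketch from the editor (the `TRef`/`atomically` names are invented; a real STM tracks whole read/write sets rather than a single variable):

```python
import threading

class TRef:
    """A toy transactional variable: optimistic reads, a version
    check at commit, retry on conflict (STM in miniature)."""

    def __init__(self, value):
        self.value = value
        self.version = 0
        self._lock = threading.Lock()  # held only during commit

    def read(self):
        # Readers never block: just take a snapshot of value+version.
        return self.value, self.version

    def try_commit(self, seen_version, new_value):
        with self._lock:
            if self.version != seen_version:
                return False  # someone else committed first: caller retries
            self.value = new_value
            self.version += 1
            return True

def atomically(ref, update):
    # Run `update` on a snapshot; only conflicting transactions pay
    # the cost of a retry - uncontended readers never wait.
    while True:
        value, version = ref.read()
        if ref.try_commit(version, update(value)):
            return

shared = TRef(0)
threads = [threading.Thread(target=lambda: [atomically(shared, lambda v: v + 1)
                                            for _ in range(1000)])
           for _ in range(4)]
for t in threads: t.start()
for t in threads: t.join()
print(shared.read()[0])  # → 4000: no increments lost despite no client-side locks
```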
  • staceyw Before C# there was darkness...
    sylvan wrote:
    staceyw wrote:
    ...

    The problem is that the action you want to do may be "read value, do lots of complicated logic that takes a fairly long time, then either do nothing or update the value". If you're going to have a single owner of the data that handles this as a service (with an ad-hoc transactional protocol), you'll get contention.
    ...


    I agree transactions are a very important abstraction here. And they need to work with DB transactions also, so there is some interesting work there. I am not sure STM is required to pull this off. New runtime/language abstractions could do this by wrapping access to classes with the proper behaviors. I would think we need optimistic concurrency transactions, so remote and local behavior could share the same abstraction.

    class MyClass
    {
       string Name;
       int Count;
    }

    MyClass mc = ...
    MyClass mc2 = ...

    tryTransaction(mc, mc2) // OCC trans started on mc, mc2; the objects inside the block are copies.
    {
       mc.Name = "joe";
       mc2.Count++;
    } // Commit. Runtime updates mc in a batch and uses a memory barrier to flush writes. Skipped if no writes.
    catch (Exception ex)
    {
       // Trans is rolled back.
    }

    mc.Count++;       // Error. Writes must be inside an OCC trans.
    int i = mc.Count; // Reads OK outside a trans.

    TMK, expensive memory barriers would still be needed, but still cheaper than contending on explicit locks. However, at least the runtime does this for us and could dynamically pick the fastest way (i.e. Interlocked operations, Thread.MemoryBarrier, etc.). They could push this down even farther in the stack by changing the CLR memory model somehow to address this.

  • staceyw wrote:
    I agree transactions are a very important abstraction here. And they need to work with DB transactions also, so there is some interesting work there. I am not sure STM is required to pull this off.
    ...



    Isn't this system just a less flexible version of STM? How is it different from STM, aside from not supporting "retry" and "orElse"? I assume you don't really want to specify the objects that you keep track of up front either (since that brings us back to the original problem where we don't really know which variables we need to update until we've started the transaction).
    Also, I don't know why reads need to be okay outside transactions, since you could easily just do a single-statement transaction for the read.


  • staceyw Before C# there was darkness...

    "Isn't this system just a less flexible version of STM? How is this different from STM, aside from not supporting "retry" and "orelse"?"

    TMK, STM is still an open research problem (at least from the MS perspective). IMO, STM proper is more an implementation detail; I am still at the abstraction level here. As Burton says, STM is the big hammer: it tries to solve the larger, more general problem for all memory access. Maybe it is used below the abstraction (if it ever gets done), or maybe not. But in concert with the compiler and runtime, I think they can pick out the core ideas and make this work now with a combination of language, compiler, and runtime support.

    The method above would also support orElse, and maybe condition variables at the top of the block.

    tryTransaction(class)
    {
      // stuff.
    }
    orElse
    {
      // other stuff.
    }
    orElse
    {
       retry 5; // retry 5 times or fail. TM keeps count down.
    }
    catch
    {
    }

    My experiment here is that we handle invariants at the *class* level, not the variable level as such. There is no change log; there is only the original copy of the class(es) at the start of the transaction, stored by the TM. So all invariants of the class are always true for readers, and you can't get "temporary" inconsistent state inside transactions, as shown in http://en.wikipedia.org/wiki/Memory_transactions under "Implementation issues". If a writer transaction fails, the original copy is put back atomically. A writer is always first a reader of the class(es) in the transaction.

    "I assume you don't really want to specify the objects that you keep track of up front either (since that brings us back to the original problem where we don't really know what variables we need to update until we've started the transaction)."

    As above, this copies the object upfront. And all changes are applied or none of them are.

    "Also, don't know why reads need to be okay outside transactions, since you could easily just do a single-statement transaction doing the read."

    You can do reads inside a trans. In fact, that would be the only way to ensure consistent reads of all the members of a class, as you get a "snapshot" of the class. However, single reads outside (i.e. Length, Count, etc.) could be allowed, as a transaction does impose overhead that may not be needed in some cases, such as just reading current state for display purposes (i.e. Progress). That said, it may be too easy to shoot yourself in the foot if this were allowed. Not sure on that one.

    Interesting topic. Hope they are having fun.
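    staceyw's class-level snapshot idea can be sketched in a few lines of C. This is illustrative only (the Account type, its invariant, and the tx_* names are all invented for the example), and it leaves out the locking a real TM would need to make the copy and the restore atomic:

```c
/* Sketch of a class-level snapshot TM: the TM keeps one copy of the
   whole object, taken at transaction start.  Writers mutate the live
   object; on failure the snapshot is put back in a single assignment. */

typedef struct {
    int balance;
    int pending;   /* class invariant: pending <= balance */
} Account;

typedef struct {
    Account *live;      /* the shared object */
    Account  snapshot;  /* original copy taken at tx_begin */
} Tx;

void tx_begin(Tx *t, Account *a) {
    t->live = a;
    t->snapshot = *a;            /* copy the whole object up front */
}

void tx_abort(Tx *t) {
    *t->live = t->snapshot;      /* put the original copy back */
}

/* Commit keeps all changes only if the class invariant still holds;
   otherwise every write is undone at once. */
int tx_commit(Tx *t) {
    if (t->live->pending > t->live->balance) {
        tx_abort(t);
        return 0;                /* all changes rolled back */
    }
    return 1;                    /* all changes kept */
}
```

    Because the snapshot is restored with one struct assignment, readers never observe a half-updated object, which is the "all invariants always true" property described above.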

  • staceyw wrote:
    

    As above, this copies the object upfront. And all changes are applied or none of them are.

    Well that would be the deal breaker then. It seems to me that many of these transactions would have a "potential" set of affected objects that is much larger than the actual set for any given transaction (as the transaction would be likely to have control flow in it). So the overhead of copying objects you didn't actually need, and copying objects back that you didn't actually write to (which is what I assume you're talking about, as keeping track of the objects you actually write to amounts to basically a transaction log, and then why not just use STM?) would be very high indeed.

    So again this is only a tiny improvement on locks really, since a transaction that commits would cause all other in-progress transactions to be aborted, even if they don't actually overlap (which is similar to how taking all the locks you may need causes contention even in cases where there are no actual conflicts). My intuition tells me that for "real" applications (i.e. the ones where locks and monitors simply aren't tractable) many transactions would suffer from this.

    So again, I think we need a solution where you can read and write to variables however you want, and have the conflicts resolved only when they actually occur, rather than doing some sort of conservative estimate. This is key IMO. Amdahl's law requires us to be very careful about not artificially limiting the amount of parallelism we can get by doing these kinds of conservative estimations (because if we do, then that will quickly become our bottleneck when the number of threads increases).

  • evildictaitor Devil's advocate
    sylvan wrote:
    
    staceyw wrote:
    

    As above, this copies the object upfront. And all changes are applied or none of them are.

    Well that would be the deal breaker then. It seems to me that many of these transactions would have a "potential" set of affected objects that is much larger than the actual set for any given transaction.

    Who says you copy it before it's used? Surely you'd only copy an object at the point where you know you're using it (because you need it now). You're copying it before it's used, but you're not copying things you don't need to copy.
  • evildictaitor wrote:
    
    sylvan wrote:
    
    staceyw wrote:
    

    As above, this copies the object upfront. And all changes are applied or none of them are.

    Well that would be the deal breaker then. It seems to me that many of these transactions would have a "potential" set of affected objects that is much larger than the actual set for any given transaction.

    Who says you copy it before it's used? Surely you'd only copy an object at the point where you know you're using it (because you need it now). You're copying it before it's used, but you're not copying things you don't need to copy.


    Well, staceyw said you copy it before it's used, for one. Doesn't really matter though, the main problem is that transactions would be rolled back when another transaction commits, just because their potential set of objects overlap (even if they don't actually interfere). Or if you do check for actual overlap before rolling back, then you lose isolation since a potentially overlapping transaction can survive the commit (you can read variable X before a transaction commits a change to X and Y, and then you read Y afterwards. But now the X and Y that you've read aren't consistent anymore, since X is from before the last change, and Y is from after), how would you solve that without a transaction log to check at the end of the transaction to verify that the state that was read is consistent?
    The way I understand the proposed alternative here is that you DO copy all the objects up front to get isolation (this copy has to happen under a lock, btw, which introduces additional overhead), and then at the end you copy them back (again, under a lock), and any transaction that happens to overlap with the potential set of touched variables needs to be aborted (to ensure consistency, as they may end up relying on the values written; we just don't know).
    I don't see how you can make it work any other way. If you allow other transactions to survive, you need a transaction log at the end so you can see which variables have changed by other transactions since you started yours (and thereby catch collisions).


    It just seems to me that this proposed mechanism introduces additional overhead and inhibits parallelism for no good reason. Where are the benefits compared to regular STM?

  • evildictaitor Devil's advocate
    sylvan wrote:
    
    Well, staceyw said you copy it before it's used, for one.

    Copying before doesn't mean doing redundant copies or copying all at once.

    sylvan wrote:
    
    Or if you do check for actual overlap before rolling back, then you lose isolation since a potentially overlapping transaction can survive the commit (you can read variable X before a transaction commits a change to X and Y, and then you read Y afterwards. But now the X and Y that you've read aren't consistent anymore, since X is from before the last change, and Y is from after), how would you solve that without a transaction log to check at the end of the transaction to verify that the state that was read is consistent?

    That's pretty easy. You can check at compile time what the read/write state is on various variables and associate it with the instruction pointer in the function, and whenever a transaction is committed elsewhere you check that the bitwise AND of the variables being written and the variables having been read comes out empty, using the following code:

    #include <string.h>

    extern int numThreads;
    extern struct thread** threads;

    struct transaction;
    void rollback(struct transaction *t);
    int get_ip(struct transaction *t);

    void doRollbacks(struct transaction* t, struct transaction** ts, int len){
      // this must be atomic!
      pause_all_threads();
      exclusive_lock_begin();

      int i;
      for(i = 0; i < len; i++){
        // collide if our write set overlaps their read set, or vice versa
        if( (*t->write_vars[get_ip(t)] & *ts[i]->read_vars[get_ip(ts[i])]) ||
            (*t->read_vars[get_ip(t)]  & *ts[i]->write_vars[get_ip(ts[i])]) )
          rollback(ts[i]);
      }

      exclusive_lock_end();
      resume_all_threads();
    }

    // on each thread:
    void rollback(struct transaction *t){
      int i;
      for(i = 0; i < t->write_cache_len; i++)
        // roll back all writes
        memcpy(t->write_cache[i]->ptr, t->write_cache[i]->buffer, t->write_cache[i]->len);

      // roll back the stack:
      int x = t->stackptr;
      int v = t->transaction_start;
      __asm {
        add ebp, x   ; roll back the eval stack
        jmp v        ; roll back the instruction pointer
      }
    }

    int get_ip(struct transaction *t){
      int r = 0;
      __asm {
        mov eax, ebp    ; walk up one stack frame
        mov eax, [eax]
        add eax, 4
        mov eax, [eax]  ; fetch the caller's return address
        mov r, eax
      }
      return r;
    }

    void exclusive_lock_begin(){
      __asm {
        pushf    ; save the flags register
        cli      ; disable interrupts
      }
    }
    void exclusive_lock_end(){
      __asm {
        popf     ; restore the flags (re-enables interrupts)
      }
    }

    void pause_all_threads(){
      int i;
      for(i = 0; i < numThreads; i++)
        if(threads[i] != get_current_thread())
          pause(threads[i]);
    }

    void resume_all_threads(){
      int i;
      for(i = 0; i < numThreads; i++)
        if(threads[i] != get_current_thread())
          resume(threads[i]);
    }

  • evildictaitor wrote:
    
    sylvan wrote:
    
    Well, staceyw said you copy it before it's used, for one.

    Copying before doesn't mean doing redundant copies or copying all at once.


    Well the wording was "up front", I believe, which I would take to mean "copy everything at once before the transaction starts".

    evildictaitor wrote:
    
    sylvan wrote:
    
    Or if you do check for actual overlap before rolling back, then you lose isolation since a potentially overlapping transaction can survive the commit (you can read variable X before a transaction commits a change to X and Y, and then you read Y afterwards. But now the X and Y that you've read aren't consistent anymore, since X is from before the last change, and Y is from after), how would you solve that without a transaction log to check at the end of the transaction to verify that the state that was read is consistent?

    That's pretty easy. You can check at compile time what the read/write state is on various variables and associate them with the instruction pointer in the function


    Can you though?
    What if I read a variable, or maybe it's an input parameter to the function, and in an if-statement checking this value for 0 or whatever, I may read/update a bunch of transactional variables, then after that I go on to do something else. How would you know, just from the instruction pointer, which variables have been read if the IP points to some code after the if statement? How could you easily know what path through the preceding code you've taken? Wouldn't that have to be a very conservative estimate (again, needlessly inhibiting parallelism)?

    In fact, if my IP is right at the final "return" statement of a transaction, then wouldn't this bit field be the same as my "potential set" of touched variables, even though the actual set of touched variables could be empty?

    If there's one thing I think we should avoid like the plague, it's introducing anything which may reduce the degree of parallelism we can see in a program. Amdahl's law scares me. We may not have hit it yet, but it's coming at us at 300mph and it's like a big brick wall on the horizon.

    Without actually logging what gets done, I don't see how we could easily check whether the transaction actually conflicts. Though I would be interested in seeing an approach to solving this issue (a transaction seeing inconsistent state) in a regular STM system using a similar approach (i.e. right after a commit, check any potentially overlapping transaction's current log, and if the read set in those overlaps the write set in the transaction you're committing, you can restart it). I think the implementation in GHC just relies on the commit of the inconsistent transaction to catch this itself, where the odd cases of the inconsistency leading to non-termination are handled by checking it on each GC or something like that. I suspect the reasons for doing it this way are performance related (because it certainly seems obvious that a committing transaction could kill any other transactions it has invalidated, so I can't see that they've all just missed it), which tells me that perhaps this approach may have terrible performance characteristics (e.g. due to that big global locking you're doing every time something commits - ouch!), as well as being too conservative.

  • evildictaitor Devil's advocate
    sylvan wrote:
    
    Can you though?
    What if I read a variable, or maybe it's an input parameter to the function, and in an if-statement checking this value for 0 or whatever, I may read/update a bunch of transactional variables, then after that I go on to do something else. How would you know, just from the instruction pointer, which variables have been read if the IP points to some code after the if statement? How could you easily know what path through the preceding code you've taken? Wouldn't that have to be a very conservative estimate (again, needlessly inhibiting parallelism)?

    The short answer is yes. Particularly in managed languages, but also in C++ and C to a lesser extent, this is not only feasible but currently done: on x64 architectures, C++ programs compiled with CRT checking implement try...catch...finally blocks as exactly this, so that when an exception is thrown the runtime can "roll back" the instruction pointer to the appropriate stage, and it uses a similar lookup table to determine which objects on the stack were created and thus need to be finalized.

    sylvan wrote:
    
    In fact, if my IP is right at the final "return" statement of a transaction, then wouldn't this bit field be the same as my "potential set" of touched variables, even though the actual set of touched variables could be empty?

    You seriously underestimate the power of modern compiler theory. You might only have one return statement, but the compiler will have a lot. You need to do truly appalling things to C or C++ before these kinds of optimisations are defeated (I mean breaking into asm or jumping through a void pointer kind of nasty code).

    sylvan wrote:
    
    If there's one thing I think we should avoid like the plague, it's introducing anything which may reduce the degree of parallelism we can see in a program. Amdahl's law scares me. We may not have hit it yet, but it's coming at us at 300mph and it's like a big brick wall on the horizon.

    I agree. Hopefully such advancements will allow us to get greater parallelisation with less programmer manual intervention.

    sylvan wrote:
    
    Without actually logging what gets done, I don't see how we could easily check whether the transaction actually conflicts. Though I would be interested in seeing an approach to solving this issue (a transaction seeing inconsistent state) in a regular STM system using a similar approach (i.e. right after a commit, check any potentially overlapping transaction's current log, and if the read set in those overlaps the write set in the transaction you're committing, you can restart it).

    We do similar things in databases - why should imperative code be any different? The only big concern is rolling back very large or very old transactions, and even those come at a cost that is feasible.

    sylvan wrote:
    
    ... which tells me that perhaps this may have terrible performance characteristics (e.g. due to that big global locking you're doing every time something commits - ouch!), as well as being too conservative.

    Determining whether or not a transaction collision occurs by definition needs a global lock on the transactions and a pause on the threads. Any other course of action could end up with a race condition on the rollback. As you can see, the time to commit is linear with respect to the number of threads and objects, with the cost of a rollback being linear (each) with respect to the number of mutated objects.

    Consequently the average case is a global lock over a function of complexity O(n log n).
  • evildictaitor wrote:
    
    sylvan wrote:
    
    Can you though?
    What if I read a variable, or maybe it's an input parameter to the function, and in an if-statement checking this value for 0 or whatever, I may read/update a bunch of transactional variables, then after that I go on to do something else. How would you know, just from the instruction pointer, which variables have been read if the IP points to some code after the if statement? How could you easily know what path through the preceding code you've taken? Wouldn't that have to be a very conservative estimate (again, needlessly inhibiting parallelism)?

    The short answer is yes. Particularly in managed languages, but also in C++ and C to a lesser extent, this is not only feasible but currently done: on x64 architectures, C++ programs compiled with CRT checking implement try...catch...finally blocks as exactly this, so that when an exception is thrown the runtime can "roll back" the instruction pointer to the appropriate stage, and it uses a similar lookup table to determine which objects on the stack were created and thus need to be finalized.


    How would this work exactly?
    Are you saying that each "path" through the code gets transformed into a "tree" shape by duplicating any code that happens after a branch, so that each option gets its own copy of the following statements?
    e.g.

    if (x>0)
    {
        foo();
    }
    bar();

    turns into:

    if (x>0)
    {
        foo();
        bar();
    }
    else
    {
        bar();
    }

    I could see how doing something like that could indeed give you a way of checking the instruction pointer for the exact set of objects used, but wouldn't code bloat be quite horrific?

    evildictaitor wrote:
    
    Determining whether or not a transaction collision occurs by definition needs a global lock on the transactions and a pause on the threads.



    Couldn't you just (write-)lock the data you've read/written when trying to commit (in some global ordering)? That way you wouldn't lock *all* transactions, only the ones who try to do anything to the data you've needed.
    You still need to "fix" any transactions that get into an infinite loop or something due to an inconsistent view of the world, but you could do a check on each thread switch, or GC, or something, and of course in their own commit (if they get that far).
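    The "lock only the data you've touched, in some global ordering" idea can be sketched as below. Everything here is hypothetical (a plain flag stands in for a real mutex so the sketch stays self-contained); the point is only that sorting the touched set by address gives every committer the same acquisition order, which rules out deadlock between two committing transactions:

```c
#include <stdlib.h>
#include <stdint.h>

/* One transactional object; the flag is a stand-in for a real lock. */
typedef struct {
    int locked;
    int value;
} TObj;

/* Global ordering: compare objects by address. */
static int by_address(const void *a, const void *b) {
    uintptr_t x = (uintptr_t)*(TObj *const *)a;
    uintptr_t y = (uintptr_t)*(TObj *const *)b;
    return (x > y) - (x < y);
}

/* At commit: acquire only the touched objects, in ascending address
   order, so no two committers can deadlock on each other. */
void commit_lock(TObj **touched, int n) {
    qsort(touched, n, sizeof *touched, by_address);
    for (int i = 0; i < n; i++)
        touched[i]->locked = 1;   /* a real mutex_lock would go here */
}

/* Release in reverse order. */
void commit_unlock(TObj **touched, int n) {
    for (int i = n - 1; i >= 0; i--)
        touched[i]->locked = 0;
}
```

    Transactions whose touched sets don't intersect never contend at all, which is the advantage over the pause-all-threads approach above.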
  • evildictaitor Devil's advocate
    sylvan wrote:
    
    I could see how doing something like that could indeed give you a way of checking the instruction pointer for the exact set of objects used, but wouldn't code bloat be quite horrific?

    You certainly end up with bigger emitted code if you do it that way, but it does make transactions and exceptions easier to cope with. It's worth pointing out that this does not make the program slower, merely bigger.

    sylvan wrote:
    
    Couldn't you just (write-)lock the data you've read/written when trying to commit (in some global ordering)? That way you wouldn't lock *all* transactions, only the ones who try to do anything to the data you've needed.

    That would be a better system, I agree. I was trying to avoid using locks for the sake of simplicity.

    sylvan wrote:
    
    You still need to "fix" any transactions that get into an infinite loop or something due to an inconsistent view of the world, but you could do a check on each thread switch, or GC, or something, and of course in their own commit (if they get that far).


    Detecting whether a program is in an infinite loop is (in general) impossible. Happily however, a transaction in an infinite loop will never commit (by definition), and therefore will keep running "out-of-the-way" or will be eventually reset by another transaction commit that overwrites data that the infinitely-looping transaction has read from or written to.

    You suggest however that the infinitely-running transaction might commit; however, this would signal the immediate end of the transaction - a transaction can be thought of as a boolean method on its own thread (or a Turing machine): if it commits, it returns true and terminates, and if it aborts, it returns false and terminates.


  • sylvan wrote:
    
    Frank Hileman wrote:

    For me, isolation via message passing sounds more interesting than transactional memory. It has proven useful for decades.


    Why not have both?
    In some cases message passing just doesn't work well at all. For example, you may want 10000 objects to be able to query the same data (but only one or two of those objects modify the data). Do you really want each to pass through a protocol with just a single thread "owning" that data? Sounds like a recipe for disaster w.r.t. performance to me. Transactions scale very well in situations like that (which are extremely common in practice), because each thread can do all the reading it wants without interfering with the execution of any other threads.

    I agree that message passing is ideal when suitable, but sometimes it just isn't. Also, while threads are sometimes excellent abstractions (even for things where you *don't* care about concurrency they may be the right model), sometimes they just suck. They basically suffer from the same problem that "goto" does: obfuscation of the structure of the program (you have to jump back and forth through messages in different threads, which one may not be known until you run the program, to understand what the program does).

    Now I freely confess that I hadn't seen E (I'll look into it now), so if it offers some cool new abstraction that solves all of this I'll recant my statements, but for now I'll say this: there is no one solution; we need multiple solutions to the problem of parallelism/concurrency. In my book the bare minimum is: nested data parallelism, task-based (purely functional) parallelism, threads with messages, and threads with shared state (here's where you need transactions).


    In message oriented systems, erlang, scala, etc, the isolated, message passing units are more lightweight than threads. The idea is to get away from the traditional thread with its heavy stack. Message passing is a robust, proven way of isolating state, that can scale well. Most people believe message passing is easier than locks, which do not scale well. Message passing in conjunction with functional programming (erlang) has proven successful for difficult concurrent telecommunications applications.

    Transactions will always be needed for some types of applications, but transactional memory is a new thing, that to me, seems more of a hack to keep holding onto older ways of working.
  • staceyw wrote:
    
    sylvan wrote:
    
    Frank Hileman wrote:

    For me, isolation via message passing sounds more interesting than transactional memory. It has proven useful for decades.


    Why not have both?
    In some cases message passing just doesn't work well at all. For example, you may want 10000 objects to be able to query the same data (but only one or two of those objects modify the data). Do you really want each to pass through a protocol with just a single thread "owning" that data? Sounds like a recipe for disaster w.r.t. performance to me. Transactions scale very well in situations like that (which are extremely common in practice), because each thread can do all the reading it wants without interfering with the execution of any other threads.
    ...


    I see what you're saying, but let's analyze that. Essentially, that is a classic ReaderWriter lock sample. You need the sync because you never know whether the last action was a read or a write. So you still need some kind of sync primitive to protect the invariants.

    One may say, let's keep the message queue for writers, but let readers read properties directly. But here we have a couple of issues. You still need a lock to ensure atomic reads (i.e. non-cached and non-torn). Second, you have possible coordination issues the runtime could never know about and only your logic knows. For example, your object may intentionally not be popping the queue (sort of a blocking operation) until it completes work for the last task - maybe order is important, or maybe it is waiting on various replies. There are all kinds of reasons order could be important, and you can't reliably short-circuit that in the general case. So only the object knows best. As far as perf goes, yes, message passing has some cost. But so do ReaderWriter locks and Monitors. When you look at the "net" performance, I think it is probably a wash or better with message passing. Costs such as complexity and correctness proofs are much lower from the start. And there are many other benefits you can't get with locks or even transactional memory: ordering semantics, self-messaging, throttling, naturally async operation, pipelining, natural composition, loose binding, and others.

    I think Juval Lowy nailed it here. Every class needs to be a service:
    http://channel9.msdn.com/ShowPost.aspx?PostID=349561#349561


    If you think about "processes" that are lighter than a thread and do not need a stack, messages can possibly be as fast as ordinary method calls, with the added difficulty of enregistration of parameters. A stack saves function call frames; a message queue saves message parameters. In terms of storage there is not a big difference; it is primarily a question of creating a very fast queue and dispatching mechanism.

    I think it is best to assume the creators of erlang, scala, E, etc have or will address performance, and avoid assumptions about implementation or performance.

    Just looking at the advantages, I agree with you, it is a great way to work.
  • evildictaitor wrote:
    

    Detecting whether a program is in an infinite loop is (in general) impossible.



    Well, you only need to detect problems that result from the program reading inconsistent state. If it diverges for any other reason, it's someone else's problem!
    With a log-based transactional system this is trivial: just check that the read values in the log match the actual values. If not, then the actual values must have been updated by another transaction, and any transaction that has read from them is invalid. Normally the transaction would detect this inconsistency when it validates before it commits, but if the inconsistency happens to cause it to diverge then you need to check somewhere else too (e.g. on thread switching), as it will never commit.

    This way you never have to look at any other transactions when committing, you just make sure that you are consistent and then commit. And in most cases a slight inconsistency won't lead to a transaction diverging, so that transaction can check itself when it reaches its commit. And if it does diverge it should be rare, so we can just validate the "read" entries in the log periodically.
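    A minimal sketch of that log-based validation (illustrative names only, and no concurrency machinery): each transactional read records the value it saw, and validation just replays the log against the live values. It would run at commit, and periodically (e.g. on a thread switch or GC) for transactions that might be diverging:

```c
#define MAX_LOG 64

/* One log entry per transactional read: where we read, what we saw. */
typedef struct {
    int *addr;
    int  seen;
} LogEntry;

typedef struct {
    LogEntry log[MAX_LOG];
    int      len;
} TxLog;

/* Transactional read: remember the value observed. */
int tx_read(TxLog *t, int *addr) {
    t->log[t->len].addr = addr;
    t->log[t->len].seen = *addr;
    t->len++;
    return *addr;
}

/* The read set is consistent iff nothing we read has changed
   underneath us; a mismatch means another transaction committed
   in the meantime and this one must restart. */
int tx_validate(const TxLog *t) {
    for (int i = 0; i < t->len; i++)
        if (*t->log[i].addr != t->log[i].seen)
            return 0;   /* inconsistent: restart */
    return 1;
}
```

    Note that the committing transaction never inspects anyone else's state here; each transaction checks only its own log, which is the benefit claimed above.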


    Frank Hileman wrote:
    In message oriented systems, erlang, scala, etc, the isolated, message passing units are more lightweight than threads. The idea is to get away from the traditional thread with its heavy stack. Message passing is a robust, proven way of isolating state, that can scale well. Most people believe message passing is easier than locks, which do not scale well. Message passing in conjunction with functional programming (erlang) has proven successful for difficult concurrent telecommunications applications.

    Transactions will always be needed for some types of applications, but transactional memory is a new thing, that to me, seems more of a hack to keep holding onto older ways of working.


    The performance issues with threads aren't just that threads have overhead themselves; it's that many applications simply don't map well to the concept of "threads and messages". This leads to massive contention. I gave several examples in my previous posts, like a big game world that all your objects want to update (each of them needs atomic updates). With message passing these accesses will be serialised, as only one object at a time can access the world atomically (unless you break the world up into smaller pieces with one "service" each, which means you're effectively simulating locks using threads, so you get all the usual locking issues - deadlock, race conditions, etc.). In situations such as this, when there is high contention for a resource (even if it's "large") where atomic access is needed (even if the accesses are usually independent), threads simply don't give you any parallelism.

    Threads are great when your problem happens to map nicely to small independent chunks that can communicate via messages. Many problems simply cannot be written in this way without horrible contention.
    So for a general purpose language we can't rely entirely on threads and messages. We need good support for threads and message passing too, but we can't afford to just leave things running sequentially because our programming model doesn't support parallelizing them (Amdahl's law tells us that if we do, then this bit of sequential code will become our bottleneck in no time).


  • evildictaitor Devil's advocate
    sylvan wrote:
    
    Well, you only need to detect problems that result from the program reading inconsistent state. If it diverges for any other reason, it's someone else's problem!


    The point of transactions is that you never have inconsistent state. Whenever a transaction does a write, it clones the object and writes to the clone, and only when it commits does it write to the shared memory. You can then cancel any transactions which might have read or written to these values (and would have inconsistent state if they continued), and simply free their memory and restart them. This has the nicety that you never need to check your own consistency, because it is guaranteed when you start and while you are running, and any overwrite of data that might render your world view inconsistent will result in your own immediate termination and rescheduling. You would however need to use an appropriate weighting function to avoid livelock.
  • evildictaitor wrote:
    
    The point of transactions is that you never have inconsistent state. Whenever a transaction does a write, it clones the object and writes to the clone, and only when it commits does it write to the shared memory.  You can then cancel any transactions which might have read or written to these values (and would have inconsistent state if they continued), and simply free their memory and restart them.


    That's one implementation strategy, yes. What I was talking about was the implementation strategy used in e.g. Haskell STM (which is the other way around), where you have a transaction log that you fill in as you go along, and then at commit you simply check your read values for consistency against the actual "live" values; if they haven't changed, you're good to write out your changes.
    This can lead to inconsistencies. So the key is to kill off these invalid transactions somehow. You could do it by having each transaction kill off any conflicting transactions when it commits (your approach), but in GHC what they do (as far as I can tell) is that they let those transactions detect that they're inconsistent when they commit. That leaves the corner case of a transaction that diverges (and thus would never commit) due to an inconsistency. These should be rare, and can be checked as a special case periodically (e.g. on GC or thread switching).
    The benefits of this approach, as far as I can tell, is that each transaction only ever looks at its own stuff, and won't have to go look through a bunch of other transactions just to verify that none of them conflict (which they usually won't have). Either approach would work, which is why I said earlier that I would be interested in seeing the other way tried in e.g. Haskell to see if this approach would have benefits (e.g. because invalid transactions would be terminated immediately).

    (There are variations on this idea; e.g. look up Tim Harris' research and you'll see an approach where the writes in the transactions actually go directly to the "live" objects - optimizing for successful commits - and the transaction logs are only used for rolling back if the commit fails.)

    However, the suggested strategy of specifying transactional variables up front would have problems with scenarios where the transaction variables aren't known at compile time.
    E.g. if you read transactional variables from a transactional channel, and do transactional updates on them. Since those transactional variables can come from any other thread that has access to the channel (and depend on user input or whatever), and you don't even know how many of them you'll get, there's no way of producing a static lookup of which variables have been read based on the instruction pointer. As far as I can see you either need a transaction log, or you disallow transactional variables as first class citizens. So I'm not sure how well the log-less idea would work in practice (it seems to me that storing transactional variables inside other transactional variables would be very common, e.g. when you send a message to another thread and also supply a message channel for that thread to send its reply).
  • evildictaitor (Devil's advocate)
    sylvan wrote:
    
    there are variations on this idea, e.g. look up Tim Harris' research and you'll see an approach where the writes in the transactions actually write directly to the "live" objects - optimizing for successful commits, and the transaction logs are only used for rolling back if the commit fails

    It would surprise me if this sped things up, since most objects are accessed via pointer (and certainly are in .NET) and thus whether you copy first and mutate on the new object or make a backup so you can rollback is merely a question of whether you swap the pointers at the point of a transactional write. But maybe there's some other benefit to this, I don't know.

    sylvan wrote:
    
    However, the suggested strategy of specifying transactional variables up front would have problems with scenarios where the transaction variables aren't known at compile time.

    I fail to see why. All of the variable sites are known at compile time (by definition they are on the stack).

    sylvan wrote:
    
    E.g. if you read transactional variables from a transactional channel, and do transactional updates on them. Since those transactional variables can come from any other thread that has access to the channel (and depend on user input or whatever), and you don't even know how many of them you'll get, there's no way of producing a static lookup of which variables have been read based on the instruction pointer.

    One of the points of transactions is they are (effectively) synchronous. That is to say that if a transactional message is passed to a function, either that function must run synchronously and return, or it must guarantee that the source thread relinquishes control of the object and that the result of the computation is non-side-effecting.

    sylvan wrote:
    
    As far as I can see you either need a transaction log, or you disallow transactional variables as first class citizens. So I'm not sure how well the log-less idea would work in practice (it seems to me that storing transactional variables inside other transactional variables would be very common, e.g. when you send a message to another thread and also supply a message channel for that thread to send its reply).

    Again, I can't think of a good reason to spawn multiple threads within a transaction. If you start to get to that level of complexity, the effect of a rollback or commit would be effectively to cancel a large number of transactions and do many copy-backs at the point of commit (or abort), which strikes me as needing an alternative model than transactions for efficient use.
  • William Stacey (staceyw) Before C# there was darkness...
    evildictaitor wrote:
    
    sylvan wrote:
    
    Well you only need to detect problems that have resulted from the program reading inconsistent state. If it diverges for any other reason it's someone else's problem!


    The point of transactions is that you never have inconsistent state. Whenever a transaction does a write, it clones the object and writes to the clone, and only when it commits does it write to the shared memory. You can then cancel any transactions which might have read or written to these values (and would have inconsistent state if they continued), and simply free their memory and restart them. This has the nice property that you never need to check for your own consistency because it is guaranteed when you start and while you are running, and that any possible overwrite of data that might render your world view inconsistent will result in your own immediate termination and rescheduling. You would however need to use an appropriate weighting function to avoid livelock.


    I agree.  This also is relatively easy to reason about, which is important as code gets complex.
  • sylvan wrote:
    
    EDIT: Oh, and consider the problem of granularity. Take the example of a game. You're running a thread for an AI character who wants to perform an action on the game world atomically. How do you solve that? Do you ask the World object to retrieve whatever it is the AI wants to access? In other words is the World object's service going to act as basically having one big lock on it for any atomic updates that AI characters want to do? Or do you put the implicit "lock" somewhere further down in the hierarchy (e.g. on individual objects in the world)? If so, how is this different from just having locks on the individual objects? How do you solve the problem of not knowing up front which objects you want to modify (because the set of objects you need depends on the values you find in the first couple of objects you examine)? Aren't we back in the old hornet's nest of locks and condition variables again?
    So basically we either have to let the top-level object provide a transactional interface (amounting to a big implicit lock), and basically eliminate any chances of parallelism, or we go back to horribly impractical fine grained locking with all the problems that poses. In this scenario, message passing gives no improvement to us, whereas transactional memory "just works" without any synchronisation burden placed on the programmer at all!


    Ignoring the "thread" implementation idea (since the isolated units are less than threads), and other implementation assumptions, I assume you are stating the problems as: world, one big unit containing smaller pieces of mutating data, and ai characters, many smaller units that interact with this. Let's call the units processes (not process in the OS sense). What interaction do you see between these processes?

    I am not sure how you would define the interaction, but intuitively, I think you selected an example where message passing works well (and is probably why massive multiplayer games use messaging). Let's suppose each ai process receives message G to retrieve data from the world, and sends message S to mutate the world.

    Each message is queued. Assume the world has a set of messages queued, of type S. Assume it is sending out G in between S processing work. 

    AI1 requested data and got a message G. It sends out S based on the values in G; let's call it world state 1. In the meantime the world has changed. By the time it processes that S, it is in world state 2.
     
    1) How can an ai reliably compute the data for S if the world is constantly changing state?

    2) How much parallelism have we lost by putting so much mutable data in the large world process?

    The answer to question 1 is usually in the design of the algorithms. Ideally the difference between world state 1 and 2 does not affect the validity of the ai message S to the world. That is, it may be something like, add this amount of energy to a particle, not, set the absolute value of the energy of this particle. A delta, for a game, is probably more appropriate.

    If the validity of the message is dependent upon the world state, message S could include as data a "world state stamp", that is, an identifier showing that S is only valid if the world is still in state 1. Message G, from the world to the ai, would include this stamp.

    The world process would then ignore S messages with past due stamps, resending G to the sender ai processes with ignored S messages, so they could recompute.
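The stamp-based protocol described above can be sketched as follows. This is a hypothetical Python illustration (the `World` class, the `get_state`/`apply_update` message handlers, and the energy field are all invented for the example), not code from any actual engine; real message queues are collapsed into direct calls.

```python
class World:
    """Holds mutable state plus a stamp that changes on every update."""
    def __init__(self, energy=0):
        self.energy = energy
        self.stamp = 0

    def get_state(self):
        # Message G: the reply carries the current state stamp.
        return self.energy, self.stamp

    def apply_update(self, new_energy, stamp):
        # Message S: valid only if the world is still in the stamped state.
        if stamp != self.stamp:
            return False  # past-due stamp: sender must re-read and recompute
        self.energy = new_energy
        self.stamp += 1
        return True
```

A rejected update corresponds to the world resending G so the ai can recompute; the sender never operates on state it hasn't re-read.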

    This would be a contention problem with one giant world process. Which leads to the second question, granularity. If the world is broken up into many smaller processes, the contention could potentially go away, assuming every ai is not working with a world state dependent algorithm, with each trying to modify the same small mutable state with that same algorithm.

    Messaging does not make contention problems go away, it is true. You must still design the system correctly to avoid that. 

    I would argue messaging makes contention problems easier to identify and analyze, because now you deal only with the contention problem itself, instead of synthetic problems introduced with traditional multi-threading and locks.
  • sylvan wrote:
    In some cases message passing just doesn't work well at all. For example you may want 10000 objects to be able to query the same data (but only one or two of those objects modify the data). Do you really want each to pass through a protocol with just a single thread "owning" that data? Sounds like a recipe for disaster w.r.t. performance to me. Transactions scale very well in situations like that (which is extremely common in practice), because each thread can do all the reading it wants to without interfering with the execution of any other threads.


    Again, assuming you do not mean "threads" but some lightweight process concept, this is a problem for the message dispatcher, not the programmer. The dispatcher can recognize a sequence of many messages retrieving data, and they can all be run in parallel. There is no need to serialize the message processing until state is mutated. That is why messaging and functional programming work so well together. Any locking or true OS threads are hidden to the programmer.
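As a rough illustration of that dispatcher idea, the sketch below batches consecutive read-only messages and runs them concurrently, falling back to sequential execution only for mutating messages. The message format and the `dispatch` function are assumptions made up for this example, not any real dispatcher's API.

```python
from concurrent.futures import ThreadPoolExecutor

def dispatch(messages, state):
    """Run consecutive read-only messages in parallel; serialise writes.

    Each message is ("read", fn) or ("write", fn), where fn takes the state.
    """
    results = []
    batch = []
    with ThreadPoolExecutor() as pool:
        def flush():
            # Reads cannot conflict with each other, so the batch
            # runs concurrently across the pool's worker threads.
            results.extend(pool.map(lambda fn: fn(state), batch))
            batch.clear()
        for kind, fn in messages:
            if kind == "read":
                batch.append(fn)
            else:
                flush()                      # drain pending reads first
                results.append(fn(state))    # writes run one at a time
        flush()
    return results
```

The programmer just sends messages; whether a given read actually ran on its own OS thread is the dispatcher's business.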
  • Anyone interested in comparing message passing concurrency to other techniques may wish to read this:

    http://www.info.ucl.ac.be/~pvr/bookcc.html

    And in particular this paper:

    http://www.info.ucl.ac.be/~pvr/flopsPVRarticle.pdf
  • evildictaitor wrote:
    
    sylvan wrote:
    
    there are variations on this idea, e.g. look up Tim Harris' research and you'll see an approach where the writes in the transactions actually write directly to the "live" objects - optimizing for successful commits, and the transaction logs are only used for rolling back if the commit fails

    It would surprise me if this sped things up, since most objects are accessed via pointer (and certainly are in .NET) and thus whether you copy first and mutate on the new object or make a backup so you can rollback is merely a question of whether you swap the pointers at the point of a transactional write. But maybe there's some other benefit to this, I don't know.


    It does indeed seem to speed things up. He gets only about 50% overhead (compared to no transactions) for short lived transactions, which is much better than locks in the benchmarks in the paper and very impressive. There are other optimizations too, though.
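For contrast, here is a tiny Python sketch of that eager-update variant: writes go straight to the live object and an undo log records old values so a failed commit can roll back. The class and method names are invented; this only illustrates the shape of the idea and ignores validation and concurrency control entirely.

```python
class UndoLogTransaction:
    """Writes go directly to live objects; an undo log allows rollback."""
    def __init__(self):
        self.undo = []  # (obj, attr, old_value), newest last

    def write(self, obj, attr, value):
        self.undo.append((obj, attr, getattr(obj, attr)))
        setattr(obj, attr, value)  # eager, in-place update

    def rollback(self):
        # Restore the old values in reverse order of writing.
        for obj, attr, old in reversed(self.undo):
            setattr(obj, attr, old)
        self.undo.clear()
```

On the happy path the commit is free (the data is already in place), which is why this layout optimizes for successful commits.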

    evildictaitor wrote:
    
    I fail to see why. All of the variable sites are known at compile time (by definition they are on the stack).

    Why could they not be on the heap?
    You could read a TVar containing a filter function, and another TVar containing a list of TVars, and then filter this list based on the filter function to get a reduced list of TVars, and then modify each of them based on the value of yet another TVar, etc. etc. How could you possibly know which values have been written to when the IP points "past" this code? The number of TVars that you have modified is not statically known, the filter function is not statically known, and the list of TVars itself is not statically known.

    evildictaitor wrote:
    
    Again, I can't think of a good reason to spawn multiple threads within a transaction.


    I haven't said anything about spawning multiple threads within a transaction?



    Frank Hileman wrote:

    Again, assuming you do not mean "threads" but some lightweight process concept, this is a problem for the message dispatcher, not the programmer. The dispatcher can recognize a sequence of many messages retrieving data, and they can all be run in parallel. There is no need to serialize the message processing until state is mutated. That is why messaging and functional programming work so well together. Any locking or true OS threads are hidden to the programmer.


    How could these messages be run in parallel, if each of these messages requires atomic updates? I.e. I need to do "Find position of explosive barrel, if I collide with it then explode it"; it's no good if someone else does this at the same time and moves the barrel after I've read the position but before I explode it! How could you possibly know that two threads who both read data from the game world (for example) will not decide to write to the same place as a result of that read? Atomic access is key, and with message passing that means the accesses get serialised.
  • sylvan wrote:
    How could these messages be run in parallel, if each of these messages requires atomic updates? I.e. I need to do "Find position of explosive barrel, if I collide with it then explode it"; it's no good if someone else does this at the same time and moves the barrel after I've read the position but before I explode it! How could you possibly know that two threads who both read data from the game world (for example) will not decide to write to the same place as a result of that read? Atomic access is key, and with message passing that means the accesses get serialised.


    If you have decided by design that all messages to the barrel modify its state, and that all messages are dependent upon the state of the barrel (ie are invalid if the barrel has been modified), you have serialized access to the barrel by design. It does not matter what form of concurrency you use, locks, message passing, transactions, it is the same problem, and is the same problem a CPU has when determining the dispatch order of instructions that write to a memory location.

    The description I gave previously of the ai and world processes applies to your barrel scenario. Ideally the game is designed so that serialization is not needed by design -- ie the barrel explodes regardless of whether it has moved. If you decide to create a choke point in your design, and all processes are hitting that spot at the same time, you have designed something that does not parallelize well, regardless of the concurrency mechanism.

    Massively multiplayer games use messaging extensively, and probably avoid that type of "serialization by design".
  • Frank Hileman wrote:
    

    If you have decided by design that all messages to the barrel modify its state, and that all messages are dependent upon the state of the barrel (ie are invalid if the barrel has been modified), you have serialized access to the barrel by design. It does not matter what form of concurrency you use, locks, message passing, transactions, it is the same problem, and is the same problem a CPU has when determining the dispatch order of instructions that write to a memory location.


    No, it's not serialized by design, in fact it's (deliberately) extremely parallel by design, with the occasional rare conflict. You have tens of thousands of objects, most of which don't care one bit about that barrel, but sometimes one of them does, and even more rarely two or more of them do.
    The point is that the mere infinitesimal possibility of conflicts causes 100% serialization when you use message passing, whereas with transactions you can run in parallel, and deal with those rare cases of conflicts when and only when they actually occur.

    If it's just the case of a single barrel you may be able to hack your own optimistic transactional memory on top of the messages (e.g. you have one message which does not block that you can use to check if you need to update the barrel, and if so you just do it again with the atomic version - that way 99.9% of the objects would just decide that they don't care about the barrel at all and leave it alone), but it gets much worse in real world scenarios. In practice you'll often have each object want to read N unspecified objects from the world, and modify M other unspecified objects in the world (which may or may not overlap with the N that you read). There is no way to know up front which objects you need to read/modify, you only know the exact set of objects that was needed after the operation has occurred. All this has to happen atomically, naturally, which means that with message passing you'll be forced to have a single service guarding "The World", and each object's operations on the world will be entirely serialized. It's simply impossible to do this concurrently if your world is guarded by a message process, even though the number of actual conflicts that these atomic operations have is very very low.
    And again, with transactional memory, the problem simply disappears and you get near linear speedup as you add more CPUs.

    Also, I didn't "design" the problem to be difficult for message passing, it just was difficult for message passing all by itself. Sometimes the thing you're simulating just isn't suitable to message passing. You can't blame the problem because the language doesn't offer a good way of solving it!

    Look, I'm the biggest FP advocate there is. I like Erlang et al. as much as the next guy (though my favourite language is Haskell), but the fact of the matter is that there are real problems that can not be solved with message passing. In my experience, most applications that are actually concurrent by nature (servers, etc.) can use message passing to good effect, but when you try to speed up non-concurrent applications the instances where your problems map nicely to threads and messages start to become more rare. We can't just ignore these problems (again, Amdahl's law won't let us), we need to provide a solution for them too. That's why we need many ways of doing these things. In most cases you can be data parallel, in some cases you need task based parallelism, and in yet fewer cases you can use threads and message passing, and in fewer cases still you need transactional memory. We can't leave any of these out though, as that would disqualify the language from being considered "general purpose" w.r.t. concurrency/parallelism, IMO.
  • evildictaitor (Devil's advocate)
    sylvan wrote:
    
    evildictaitor wrote:
    
    I fail to see why. All of the variable sites are known at compile time (by definition they are on the stack).

    Why could they not be on the heap?

    Because you need to keep a pointer to that object on the heap in order to use it. Or put another way, an object on the heap that is not referenced inside the transitive closure of the variables on the stack is eligible for disposal.


    sylvan wrote:
    
    You could read a TVar containing a filter function, and another TVar containing a list of TVars, and then filter this list based on the filter function to get a reduced list of TVars, and then modify each of them based on the value of yet another TVar, etc. etc. How could you possibly know which values have been written to when the IP points "past" this code? The number of TVars that you have modified is not statically known, the filter function is not statically known, and the list of TVars itself is not statically known.

    No, but you know that the list of TVars was read, and you have the TVars site referenced locally in the function locals list. You can see then that the TVars have been changed, and that when you want to do a mutate on the TVars object you will need to make a copy of the array (which is an array of pointers and thus not expensive) for either backup or for preemptive mutation.

    As to whether or not the filter function is known, it is clear that the filter function is known statically, since in managed languages you must define all of the code that may run over the course of the program statically prior the program execution.

    sylvan wrote:
    
    evildictaitor wrote:
    
    Again, I can't think of a good reason to spawn multiple threads within a transaction.


    I haven't said anything about spawning multiple threads within a transaction?

    In which case your mechanism of passing messages must either be considered to be a side-effect (which is another problem, but one that can also be solved) or just a linear sequential operation, which is no different to any other transaction based function call.
  • evildictaitor wrote:
    
    sylvan wrote:
    
    evildictaitor wrote:
    
    I fail to see why. All of the variable sites are known at compile time (by definition they are on the stack).

    Why could they not be on the heap?

    Because you need to keep a pointer to that object on the heap in order to use it. Or put another way, an object on the heap that is not referenced inside the transitive closure of the variables on the stack is eligible for disposal.


    Yes, of course, but at what point should we stop saying that the set of transactional variables that has been read/written is "known at compile time"?
    If you actually have to trace down arbitrarily deep into a data structure in order to find the TVar that was actually modified, then I would say that the set of TVars can be figured out dynamically, but it's not something that can be figured out statically at compile time anymore (which was my whole point).

    So to support first class transactional variables you cannot say "my program counter is X, therefore here's my set of touched TVars" in O(1) anymore. And if the set of TVars depends on e.g. a filter function applied to a list, then you need to run the filter function again to find out which TVars were actually written to (and imagine that the function is very expensive...), or of course assume that all of them were (which would cause needless inhibition of parallelism).

    So I think my point remains: The proposed alternative implementation would have difficulties with this, whereas just keeping a transaction log would handle this a lot easier (as you just store which variables you've accessed when you access them).
  • evildictaitor (Devil's advocate)
    sylvan wrote:
    
    Yes, of course, but at what point should we stop saying that the set of transactional variables that has been read/written is "known at compile time"?
    If you actually have to trace down arbitrarily deep into a data structure in order to find the TVar that was actually modified, then I would say that the set of TVars can be figured out dynamically, but it's not something that can be figured out statically at compile time anymore (which was my whole point).


    No, you only look at variable sites, you don't look at items on the heap.

    While you may think that you are trying to make your point, I do have to point out again that this technology is not new or theoretical. It is currently being used in various architectures and in similar mechanisms, such as the CRT exception handler model for x64 processors.
  • sylvan wrote:
    
    Frank Hileman wrote:
    

    If you have decided by design that all messages to the barrel modify its state, and that all messages are dependent upon the state of the barrel (ie are invalid if the barrel has been modified), you have serialized access to the barrel by design. It does not matter what form of concurrency you use, locks, message passing, transactions, it is the same problem, and is the same problem a CPU has when determining the dispatch order of instructions that write to a memory location.


    No, it's not serialized by design, in fact it's (deliberately) extremely parallel by design, with the occasional rare conflict. You have tens of thousands of objects, most of which don't care one bit about that barrel, but sometimes one of them does, and even more rarely two or more of them do.
    The point is that the mere infinitesimal possibility of conflicts causes 100% serialization when you use message passing, whereas with transactions you can run in parallel, and deal with those rare cases of conflicts when and only when they actually occur.

    If it's just the case of a single barrel you may be able to hack your own optimistic transactional memory on top of the messages (e.g. you have one message which does not block that you can use to check if you need to update the barrel, and if so you just do it again with the atomic version - that way 99.9% of the objects would just decide that they don't care about the barrel at all and leave it alone), but it gets much worse in real world scenarios. In practice you'll often have each object want to read N unspecified objects from the world, and modify M other unspecified objects in the world (which may or may not overlap with the N that you read). There is no way to know up front which objects you need to read/modify, you only know the exact set of objects that was needed after the operation has occurred. All this has to happen atomically, naturally, which means that with message passing you'll be forced to have a single service guarding "The World", and each object's operations on the world will be entirely serialized. It's simply impossible to do this concurrently if your world is guarded by a message process, even though the number of actual conflicts that these atomic operations have is very very low.
    And again, with transactional memory, the problem simply disappears and you get near linear speedup as you add more CPUs.

    Also, I didn't "design" the problem to be difficult for message passing, it just was difficult for message passing all by itself. Sometimes the thing you're simulating just isn't suitable to message passing. You can't blame the problem because the language doesn't offer a good way of solving it!

    Look, I'm the biggest FP advocate there is. I like Erlang et al. as much as the next guy (though my favourite language is Haskell), but the fact of the matter is that there are real problems that can not be solved with message passing. In my experience, most applications that are actually concurrent by nature (servers, etc.) can use message passing to good effect, but when you try to speed up non-concurrent applications the instances where your problems map nicely to threads and messages start to become more rare. We can't just ignore these problems (again, Amdahl's law won't let us), we need to provide a solution for them too. That's why we need many ways of doing these things. In most cases you can be data parallel, in some cases you need task based parallelism, and in yet fewer cases you can use threads and message passing, and in fewer cases still you need transactional memory. We can't leave any of these out though, as that would disqualify the language from being considered "general purpose" w.r.t. concurrency/parallelism, IMO.


    Games work exceptionally well with message passing. As I discussed regarding your previous ai scenario, if the message to the barrel (change state) includes a "barrel state stamp" then the barrel knows it can change state, assuming that stamp matches the current barrel state. If this is hard to envision, imagine the barrel increments a private counter every time it changes important state (important to the message sender). That counter is the state stamp.

    When the barrel receives your state-changing message, it can process it as long as there are no other similar messages competing. If the state stamp has changed it must tell the sender that the message was discarded, as it is no longer valid. Then the sender can recompute or abandon.

    This is essentially what happens with transactions as well. The messages as I describe them are a type of optimistic transaction. Scalability in games is probably achieved by minimizing choke points, regardless of the concurrency mechanism used.

    There is no more serialization with message passing than with transactions. If you have many processes modifying the same mutable state in your barrel, and all these modifications are dependent on the state of the barrel (ie invalid if state has changed), you have a contention problem that is not solved by any concurrency mechanism. Messaging does not make this worse. If most processes are not modifying your barrel state, they are not barrel state dependent, and there is no serialization problem as you describe. Then both messaging and transactions work well.

    Your second argument, regarding N reads and M writes, you claim is solved better by a transaction. All you are doing is breaking up the granularity. You can do the exact same thing with messages. Instead of treating the whole world as one process, break up the writes into messages to M processes. If all must be done atomically, then you do need a transactional system built using messages. Ultimately such a transactional system must commit writes.

    One way you might do it is a two stage commit. First the supervisor (modifying) process sends a message to each M process acquiring a lock. Assuming a message is sent back with succeed or fail, the next step is to send a message to each M process to actually mutate data. During that time each M process cannot be modified by anything else (ie is temporarily owned by the supervisor process). After mutation is complete, the lock is freed. This only requires two messages to each M process and one message back from each M process. This is a fine grained form of locking and does not block any other processes from modifying any other mutable data in the meantime. Nor do the locks prevent reading messages from being processed. Only a writing or lock acquiring message would fail, and only to those specific pieces of data.
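The two-stage commit described above might look something like this in Python. `Cell`, `handle`, and `atomic_write` are hypothetical names, and real message passing (queues, reply channels) is collapsed into direct method calls to keep the sketch short; each `handle` call stands in for one message processed atomically by the owning process.

```python
class Cell:
    """A small unit of mutable state owned by one lightweight process."""
    def __init__(self, value):
        self.value = value
        self.owner = None  # lock-holding supervisor, if any

    def handle(self, msg, sender, value=None):
        # Each message is processed atomically by the owning process.
        if msg == "acquire":
            if self.owner is None:
                self.owner = sender
                return "ok"
            return "fail"
        if msg == "write" and self.owner == sender:
            self.value = value
            return "ok"
        if msg == "release" and self.owner == sender:
            self.owner = None
            return "ok"
        return "fail"

def atomic_write(cells, values, sender):
    """Two-phase commit: acquire every cell, then write and release."""
    acquired = []
    for c in cells:
        if c.handle("acquire", sender) == "fail":
            for a in acquired:  # back off: release and let the caller retry
                a.handle("release", sender)
            return False
        acquired.append(c)
    for c, v in zip(cells, values):
        c.handle("write", sender, v)
    for c in cells:
        c.handle("release", sender)
    return True
```

Cells not involved in the commit keep processing their own messages throughout, which is the fine-grained property described above.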

    The point is you can do anything you wish with message passing. It is a fundamental building block, and can scale as well as your design permits. If your design has no need for atomic composite commits, you can do that. If you do need atomic composite commits, you can do that as well.
  • evildictaitor wrote:
    
    sylvan wrote:
    
    Yes, of course, but at what point should we stop saying that the set of transactional variables that has been read/written is "known at compile time"?
    If you actually have to trace down arbitrarily deep into a data structure in order to find the TVar that was actually modified, then I would say that the set of TVars can be figured out dynamically, but it's not something that can be figured out statically at compile time anymore (which was my whole point).


    No, you only look at variable sites, you don't look at items on the heap.

    While you may think that you are trying to make your point, I do have to point out again that this technology is not new or theoretical. It is currently being used in various architectures and in similar mechanisms, such as the CRT exception handler model for x64 processors.


    Well you would have to look at variables on the heap, if that's where the transactional variables are.
    Again, I'm not saying that it wouldn't work, just that it wouldn't work very well. Using it for exception handling is obviously quite different (you don't frequently need a list of all variables touched, all you need is a list of the root pointers for the current stack frame so you can call destructors, right?).

    The fact remains, that with your approach you have an unbounded amount of work to do to find which transactional variables have been used, which is my point. Using a log you already have them available immediately.
  • Frank Hileman wrote:
    
    Games work exceptionally well with message passing.


    Actually no they don't. I do this for a living, which is precisely why I used them as an example.

    Frank Hileman wrote:
    
    Your second argument, regarding N reads and M writes, you claim is solved better by a transaction. All you are doing is breaking up the granularity. You can do the exact same thing with messages. Instead of treating the whole world as one process, break up the writes into messages to M processes.


    But the N and M will be different for each object. One thread might grab all the barrels in the game and do something to them, another might grab all the players and barrels in the game, etc. So just statically breaking the world up into M processes won't work well.

    Essentially what you have is a big set of tens of thousands of objects, each of which must be able to atomically modify small sets of the others. There is no simple way of splitting this up into sub-processes (and even if you do, you would need to "lock" each sub-process you're interested in, which scales poorly). Simple locking (even if implemented with threads) leads to deadlock here. With a simple message passing system you will end up serialising all of these updates.

    Frank Hileman wrote:
    
    The point is you can do anything you wish with message passing. It is a fundamental building block, and can scale as well as your design permits. If your design has no need for atomic composite commits, you can do that. If you do need atomic composite commits, you can do that as well.


    Well obviously you can use threads to simulate locks, and then use that to build an STM library, but that wasn't really the point. The point was that message passing itself isn't really suitable for all problems. If you just end up building ad hoc (inefficient) transactional systems on top of the message passing, then obviously you would be better off to have an efficient system instead (however cheap the threads are, they're not cheap enough to simulate a lock efficiently).
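What "atomically modifying a dynamic set of objects" costs without transactions can be sketched in toy Python (all names hypothetical): each game object carries its own lock, and the locks for whatever set a thread grabs must always be acquired in one fixed global order, or two threads taking overlapping sets (the players and the barrels, say) can deadlock. This is the ad hoc machinery that STM would otherwise provide.

```python
import threading

class GameObject:
    def __init__(self, oid, health):
        self.oid = oid                  # global ordering key for locking
        self.health = health
        self.lock = threading.Lock()

def atomically_update(objects, update):
    """Lock a dynamic set of objects in oid order, update, release.

    The fixed oid order is what prevents deadlock when two threads
    acquire overlapping sets of objects at the same time.
    """
    ordered = sorted(objects, key=lambda o: o.oid)
    for o in ordered:
        o.lock.acquire()
    try:
        for o in objects:
            update(o)
    finally:
        for o in reversed(ordered):
            o.lock.release()

barrels = [GameObject(i, 50) for i in range(3)]
player = GameObject(99, 100)

# One caller grabs all the barrels; another grabs the player plus one
# barrel. The shared oid order keeps the two acquisitions deadlock-free.
atomically_update(barrels, lambda o: setattr(o, "health", o.health - 10))
atomically_update([player, barrels[0]],
                  lambda o: setattr(o, "health", o.health - 5))
print(barrels[0].health, player.health)  # 35 95
```

Even this small sketch shows the fragility being argued about: every piece of code that touches game objects must honour the same global lock order, a discipline a transactional system enforces for you.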
  • There is no way to tell whether an efficient message passing system would be as fast as, or faster than, transactional memory for your scenario without building it and trying it out. But I suspect there is some bias against the idea of lightweight processes and efficient message passing, because we see so few common implementations with that level of efficiency.

    If you read some of the links I pointed out earlier, they explain how message passing is a lower level building block than transactional memory. Being lower level, it can be faster as well, when you do not need full transactions.

    I have nothing against transactional memory except that it helps preserve existing serial ways of thinking. Share-nothing, message-passing concurrency seems to balance and scale almost automatically.

  • Message passing in games:

    http://softwaremaven.innerbrane.com/2007/12/python-versus-erlang-for-mmog.html

    That is only distributed server stuff. It seems message passing would work well on clients as well, given that the game might be modeled as many independent actors interacting with one another.

