Comments

Frank Hileman VG.net
  • Burton Smith: On General Purpose Super Computing and the History and Future of Parallelism

    sylvan wrote:
    How could these messages be run in parallel, if each of these messages requires atomic updates? I.e., I need to do "find the position of the explosive barrel; if I collide with it, then explode it", and it's no good if someone else does this at the same time and moves the barrel after I've read the position but before I explode it! How could you possibly know that two threads that both read data from the game world (for example) will not decide to write to the same place as a result of that read? Atomic access is key, and with message passing that means the accesses get serialised.


    If you have decided by design that all messages to the barrel modify its state, and that all messages are dependent upon the state of the barrel (i.e., are invalid if the barrel has been modified), you have serialized access to the barrel by design. It does not matter what form of concurrency you use: locks, message passing, or transactions, it is the same problem, and it is the same problem a CPU has when determining the dispatch order of instructions that write to a memory location.

    The description I gave previously of the AI and world processes applies to your barrel scenario. Ideally the game is designed so that serialization is not needed, i.e., the barrel explodes regardless of whether it has moved. If you decide to create a choke point in your design, and all processes are hitting that spot at the same time, you have designed something that does not parallelize well, regardless of the concurrency mechanism.

    Massively multiplayer games use messaging extensively, and probably avoid that type of "serialization by design".
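
    As a concrete illustration of designing away the serialization, here is a minimal sketch using Go channels as the message queue. All the names are hypothetical and this is one possible shape, not how any particular engine works: a single process owns the barrel's state, moves and explosions both arrive as messages, and the explosion is expressed as a message that stays valid regardless of any moves processed before it.

    ```go
    package main

    import "fmt"

    // Messages the barrel process understands (hypothetical).
    type move struct{ dx, dy int }
    type explode struct{ done chan string }

    func main() {
        mailbox := make(chan interface{})

        // The barrel process: sole owner of the barrel's state.
        // All access is serialized through its mailbox, with no locks.
        go func() {
            x, y := 10, 20
            for m := range mailbox {
                switch msg := m.(type) {
                case move:
                    x, y = x+msg.dx, y+msg.dy // a delta, not an absolute position
                case explode:
                    // Valid regardless of any moves processed before it.
                    msg.done <- fmt.Sprintf("boom at (%d,%d)", x, y)
                }
            }
        }()

        // Two "AI" processes race to message the barrel; the mailbox
        // serializes their requests in arrival order by design.
        done := make(chan string)
        go func() { mailbox <- move{dx: 5, dy: 0} }()
        go func() { mailbox <- explode{done: done} }()
        fmt.Println(<-done)
    }
    ```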
  • Burton Smith: On General Purpose Super Computing and the History and Future of Parallelism

    Anyone interested in comparing message passing concurrency to other techniques may wish to read this:

    http://www.info.ucl.ac.be/~pvr/bookcc.html

    And in particular this paper:

    http://www.info.ucl.ac.be/~pvr/flopsPVRarticle.pdf
  • Burton Smith: On General Purpose Super Computing and the History and Future of Parallelism

    sylvan wrote:
    In some cases message passing just doesn't work well at all. For example, you may want 10000 objects to be able to query the same data (but only one or two of those objects modify the data). Do you really want each to pass through a protocol with just a single thread "owning" that data? Sounds like a recipe for disaster w.r.t. performance to me. Transactions scale very well in situations like that (which is extremely common in practice), because each thread can do all the reading it wants to without interfering with the execution of any other threads.


    Again, assuming you do not mean "threads" but some lightweight process concept, this is a problem for the message dispatcher, not the programmer. The dispatcher can recognize a sequence of many messages retrieving data, and they can all be run in parallel. There is no need to serialize the message processing until state is mutated. That is why messaging and functional programming work so well together. Any locking, and any true OS threads, are hidden from the programmer.
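
    As a sketch of what such a dispatcher might do (a hypothetical scheme with hypothetical names, not how Erlang or any real runtime implements it), the loop below fans out consecutive read-only messages to run in parallel, and only serializes when a mutating message arrives:

    ```go
    package main

    import (
        "fmt"
        "sync"
    )

    // A queued message: read-only messages only observe the state,
    // mutating messages return a new state.
    type request struct {
        readOnly bool
        run      func(state int) int
    }

    // dispatch drains the queue, running each run of consecutive
    // read-only messages in parallel and each mutation alone.
    func dispatch(queue []request) int {
        state := 0
        for i := 0; i < len(queue); {
            if queue[i].readOnly {
                var wg sync.WaitGroup
                for ; i < len(queue) && queue[i].readOnly; i++ {
                    wg.Add(1)
                    go func(r request) {
                        defer wg.Done()
                        r.run(state) // reads cannot interfere with each other
                    }(queue[i])
                }
                wg.Wait() // all reads finish before the next mutation
            } else {
                state = queue[i].run(state) // mutations stay serialized
                i++
            }
        }
        return state
    }

    func main() {
        read := request{readOnly: true, run: func(s int) int { fmt.Println("read", s); return s }}
        write := request{run: func(s int) int { return s + 1 }}
        fmt.Println("final:", dispatch([]request{read, read, read, write, read}))
    }
    ```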
  • Burton Smith: On General Purpose Super Computing and the History and Future of Parallelism

    sylvan wrote:
    
    EDIT: Oh, and consider the problem of granularity. Take the example of a game. You're running a thread for an AI character who wants to perform an action on the game world atomically. How do you solve that? Do you ask the World object to retrieve whatever it is the AI wants to access? In other words, is the World object's service going to act as basically having one big lock on it for any atomic updates that AI characters want to do? Or do you put the implicit "lock" somewhere further down in the hierarchy (e.g. on individual objects in the world)? If so, how is this different from just having locks on the individual objects? How do you solve the problem of not knowing up front which objects you want to modify (because the set of objects you need depends on the values you find in the first couple of objects you examine)? Aren't we back in the old hornet's nest of locks and condition variables again?
    So basically we either have to let the top-level object provide a transactional interface (amounting to a big implicit lock), and basically eliminate any chance of parallelism, or we go back to horribly impractical fine-grained locking with all the problems that poses. In this scenario, message passing gives no improvement to us, whereas transactional memory "just works" without any synchronisation burden placed on the programmer at all!


    Ignoring the "thread" implementation idea (since the isolated units are less than threads), and other implementation assumptions, I assume you are stating the problem as: a world, one big unit containing smaller pieces of mutating data, and AI characters, many smaller units that interact with it. Let's call the units processes (not processes in the OS sense). What interaction do you see between these processes?

    I am not sure how you would define the interaction, but intuitively, I think you selected an example where message passing works well (which is probably why massively multiplayer games use messaging). Let's suppose each AI process receives a message G to retrieve data from the world, and sends a message S to mutate the world.

    Each message is queued. Assume the world has a set of S messages queued, and that it sends out G messages in between processing S messages.

    AI1 requested data and got a message G; call the world state it reflects world state 1. It sends out S based on the values in G. In the meantime the world has changed, and by the time the world processes that S, it is in world state 2.
     
    1) How can an AI reliably compute the data for S if the world is constantly changing state?

    2) How much parallelism have we lost by putting so much mutable data in the large world process?

    The answer to question 1 is usually in the design of the algorithms. Ideally the difference between world state 1 and world state 2 does not affect the validity of the AI's message S to the world. That is, the message may be something like "add this amount of energy to a particle", not "set the energy of this particle to an absolute value". A delta is probably more appropriate for a game.

    If the validity of the message is dependent upon the world state, message S could include as data a "world state stamp", that is, an identifier showing that S is only valid if the world is still in state 1. Message G, from the world to the AI, would include this stamp.

    The world process would then ignore S messages with stale stamps, resending G to the AI processes whose S messages were ignored, so they could recompute (sketched in code at the end of this comment).

    This would be a contention problem with one giant world process, which leads to the second question, granularity. If the world is broken up into many smaller processes, the contention could potentially go away, assuming every AI is not working with a world-state-dependent algorithm, each trying to modify the same small piece of mutable state.

    It is true that messaging does not make contention problems go away; you must still design the system correctly to avoid them.

    I would argue messaging makes contention problems easier to identify and analyze, because you deal only with the contention problem itself, instead of synthetic problems introduced by traditional multi-threading and locks.
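
    Here is a sketch of that stamp protocol in Go (all types and names hypothetical; channels stand in for the message queue): the world tags each G reply with its current version, an S message carries that stamp plus a delta, and the world rejects a stale S by resending G.

    ```go
    package main

    import "fmt"

    type snapshot struct{ stamp, energy int } // reply to message G
    type get struct{ reply chan snapshot }    // message G request
    type set struct{ stamp, delta int }       // message S: stamp + a delta

    func main() {
        mailbox := make(chan interface{})
        retry := make(chan snapshot) // fresh G reply sent when S is rejected

        // The world process: owns the state and its version stamp.
        go func() {
            stamp, energy := 1, 100
            for m := range mailbox {
                switch msg := m.(type) {
                case get:
                    msg.reply <- snapshot{stamp, energy}
                case set:
                    if msg.stamp != stamp {
                        // Stale stamp: ignore S, resend G so the AI recomputes.
                        retry <- snapshot{stamp, energy}
                        continue
                    }
                    energy += msg.delta // apply the delta
                    stamp++
                }
            }
        }()

        mailbox <- set{stamp: 1, delta: 10} // valid: advances world to stamp 2
        mailbox <- set{stamp: 1, delta: -5} // stale: computed against stamp 1

        fresh := <-retry
        fmt.Printf("S ignored; recompute against stamp %d (energy %d)\n",
            fresh.stamp, fresh.energy)
    }
    ```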
  • Burton Smith: On General Purpose Super Computing and the History and Future of Parallelism

    staceyw wrote:
    
    sylvan wrote:
    
    Frank Hileman wrote:

    For me, isolation via message passing sounds more interesting than transactional memory. It has proven useful for decades.


    Why not have both?
    In some cases message passing just doesn't work well at all. For example, you may want 10000 objects to be able to query the same data (but only one or two of those objects modify the data). Do you really want each to pass through a protocol with just a single thread "owning" that data? Sounds like a recipe for disaster w.r.t. performance to me. Transactions scale very well in situations like that (which is extremely common in practice), because each thread can do all the reading it wants to without interfering with the execution of any other threads.
    ...


    I see what you're saying, but let's analyze that. Essentially, that is a classic reader-writer lock sample. You need the sync because you never know what the last action was, a read or a write. So you still need some kind of sync primitive to protect the invariants.

    One may say, let's keep the message queue for writers, but let readers read properties directly. But here we have a couple of issues. You still need a lock to ensure atomic reads (i.e., non-cached and non-torn). Second, you have possible coordination issues the runtime could never know about and only your logic knows. For example, your object may intentionally not be popping the queue (a sort of blocking operation) until it completes work for the last task, perhaps because order is important or because it is waiting on various replies. There are all kinds of reasons order could be important, and you can't reliably short-circuit that in the general case. So only the object knows best. As far as performance goes, yes, message passing has some cost. But so do reader-writer locks and monitors. When you look at the "net" performance, I think it is probably a wash or better with message passing. Costs such as complexity and proving correctness are much lower from the start. And there are many other benefits you can't get with locks or even transactional memory: ordering semantics, self-messaging, throttling, naturally async, pipelining, natural composition, loose binding, and others.

    I think Juval Lowy nailed it here. Every class needs to be a service:
    http://channel9.msdn.com/ShowPost.aspx?PostID=349561#349561


    If you think about "processes" that are lighter than a thread and do not need a stack, messages can possibly be as fast as ordinary method calls, with the added difficulty that message parameters cannot be enregistered. A stack saves function call frames; a message queue saves message parameters. In terms of storage there is not a big difference; it is primarily a question of creating a very fast queue and dispatch mechanism.

    I think it is best to assume the creators of Erlang, Scala, E, etc. have addressed or will address performance, and to avoid assumptions about implementation or performance.

    Just looking at the advantages, I agree with you, it is a great way to work.
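
    To make that storage comparison concrete, here is a toy Go sketch (hypothetical names, deliberately ignoring real-runtime concerns like enregistration and scheduling): the parameters of an ordinary call live in a stack frame, while the same parameters for a message live in a queued struct until a plain dispatch loop runs the handler.

    ```go
    package main

    import "fmt"

    // Ordinary call: parameters live in the callee's stack frame.
    func addEnergy(energy, delta int) int { return energy + delta }

    // The same parameters as a message: they live in the message
    // struct, in the queue, until the dispatch loop runs the handler.
    type addEnergyMsg struct{ delta int }

    func main() {
        energy := addEnergy(100, 10) // direct call: 110

        // A buffered channel as a minimal mailbox, a plain loop as a
        // minimal dispatcher.
        mailbox := make(chan addEnergyMsg, 16)
        mailbox <- addEnergyMsg{delta: 10}
        mailbox <- addEnergyMsg{delta: -3}
        close(mailbox)
        for msg := range mailbox {
            energy = addEnergy(energy, msg.delta)
        }
        fmt.Println(energy) // 117
    }
    ```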
  • Burton Smith: On General Purpose Super Computing and the History and Future of Parallelism

    sylvan wrote:
    
    Frank Hileman wrote:

    For me, isolation via message passing sounds more interesting than transactional memory. It has proven useful for decades.


    Why not have both?
    In some cases message passing just doesn't work well at all. For example, you may want 10000 objects to be able to query the same data (but only one or two of those objects modify the data). Do you really want each to pass through a protocol with just a single thread "owning" that data? Sounds like a recipe for disaster w.r.t. performance to me. Transactions scale very well in situations like that (which is extremely common in practice), because each thread can do all the reading it wants to without interfering with the execution of any other threads.

    I agree that message passing is ideal when suitable, but sometimes it just isn't. Also, while threads are sometimes excellent abstractions (even for things where you *don't* care about concurrency they may be the right model), sometimes they just suck. They basically suffer from the same problem that "goto" does: obfuscation of the structure of the program (to understand what the program does, you have to jump back and forth through messages in different threads, and which thread handles what may not be known until you run the program).

    Now I freely confess that I hadn't seen E (I'll look into it now), so if they give some new cool abstraction that solves all of this I'll recant my statements, but for now I'll say this: there is no one solution; we need multiple solutions to the problem of parallelism/concurrency. In my book the bare minimum is: nested data parallelism, task-based (purely functional) parallelism, threads with messages, and threads with shared state (here's where you need transactions).


    In message-oriented systems (Erlang, Scala, etc.), the isolated, message-passing units are more lightweight than threads. The idea is to get away from the traditional thread with its heavy stack. Message passing is a robust, proven way of isolating state that can scale well. Most people believe message passing is easier than locks, which do not scale well. Message passing in conjunction with functional programming (Erlang) has proven successful for difficult concurrent telecommunications applications.

    Transactions will always be needed for some types of applications, but transactional memory is a new thing that, to me, seems more of a hack to keep holding onto older ways of working.
  • Burton Smith: On General Purpose Super Computing and the History and Future of Parallelism

    Hi Charles,

    Thanks to you and Burton Smith for the conversation. I would like to hear more about anything at Microsoft having to do with isolation via message passing and capability security, similar to the E language:

    http://www.erights.org/

    I know we have the CCR, but its syntax is clumsy, messaging does not have first-class status, nor is there any isolation. What things similar to E are happening at Microsoft, if any?

    For me, isolation via message passing sounds more interesting than transactional memory. It has proven useful for decades.

    Thanks,
    Frank
  • Erik Meijer, Gilad Bracha, Mads Torgersen: Perspectives on Programming Language Design and Evolution

    evildictaitor wrote:
    
    Frank Hileman wrote:
    The Singularity OS proved that inter-process communication, at least, can be faster when using a higher-level language, as long as strict contracts are observed. This means drivers are best not written in a language such as C++, since they are a critical part of the OS stack.


    Depends. If your operating system trusts the people who are writing the drivers (as has tended to be the case in the past) then languages such as C++ and C will almost invariably be at least as fast as anything written in C# or other high-level pointerless languages.


    Correct. But OS hangups and crashes do not promote trust among end users. The interesting thing about higher-level languages and formal methods is the promise of static analysis to both improve reliability and increase performance. Right now the emphasis is on safety, not performance, but that could change.
  • Erik Meijer, Gilad Bracha, Mads Torgersen: Perspectives on Programming Language Design and Evolution

    aL_ wrote:
    (I also think that while the CLR isn't perfect, it's unfair to say that it's "severely limited")


    From the perspective of someone implementing a language on a virtual machine, the CLR, with its baked-in notions of types and polymorphism, is more limited than a simpler VM that adds only things like garbage collection and object encapsulation, leaving dispatch and other details to the language implementer. From that perspective the CLR is limited. MSIL is essentially C# in a lower-level form, not a general-purpose VM.

    Severely limited from the language implementer's point of view; hence the DLR, and the other VMs that have appeared since the CLR and JVM.
  • Erik Meijer, Gilad Bracha, Mads Torgersen: Perspectives on Programming Language Design and Evolution

    The Singularity OS proved that inter-process communication, at least, can be faster when using a higher-level language, as long as strict contracts are observed. This means drivers are best not written in a language such as C++, since they are a critical part of the OS stack.