sylvan

Comments

  • Expert to Expert - Joe Duffy: Perspectives on Concurrent Programming and Parallelism

    I agree that Haskell isn't a panacea. I never said it was. The main thing I would want from it is purity with some way of abstracting over effects (and ideally being able to write your own for EDSLs). This is precisely why I think it would be useful for someone with deep pockets to take a stab at creating a new purely functional programming language knowing what we now know. Haskell is about 20 years old, so it certainly has some warts accumulated like any language of that age - I think a benevolent dictator is needed to take the main lessons from Haskell and lift them over onto a clean slate - possibly with provisions to make it less scary for newbies (e.g. C style syntax).

    Haskell 98 is fairly outdated and almost never what anyone means when they say Haskell (the new standard is underway). At minimum you need to include Concurrent Haskell to get what people these days are using, but looking at things like STM and NDP you really see just how far ahead it is.
    So taking that into account, I'm not sure I understand your issue with "one top-level I/O loop". What's wrong with having N top-level IO loops (forkIO) communicating with messages? Then each of those could have lots of fine-grained task-based concurrency (using Control.Parallel.Strategies) and even nested data parallelism for even more fine-grained parallelism. This pure code could even use (provably) local mutable state using the ST monad. I don't see how Erlang or Occam offer anything that you can't do in Haskell (though to be more Erlang-like you may want to provide a constrained monad that only exposes certain IO operations, like forking, but this is trivial to do).

    EDIT: In fact, this is one area where I think Haskell really shines: the composition of fine-grained and coarse-grained parallelism. You have threads and message passing (or shared state with STM) for coarse-grained. Then you have a bunch of pure code executed on top of that (with local mutable state). This pure code can be parallelised in a task-based way using Control.Parallel.Strategies (e.g. PLINQ-style, or Cilk-style, but safe because it's all pure). Then for incredibly fine-grained parallelism, there's now work going on with nested data parallelism. This gives massive scalability (GPU-like) while still being flexible enough (nested! The compiler does the magic of flattening it for you) to offer a decent programming model. And then in the future, when Haskell takes over the world, you would have graph-reduction hardware (see Reduceron) and just run Haskell on that for massive instruction-level parallelism. So Haskell (or H#) could truly offer scalable parallelism at every granularity.
    So I agree that getting parallelism at all levels is crucial, but I don't see how Haskell fails in any way in this area, in fact I think it excels.
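
    Just to make that concrete, here is a minimal sketch of the composition I'm describing (assuming GHC with the parallel package; the worker count and the "work" itself are invented): a few coarse-grained forkIO workers talking over channels, each running pure code that is parallelised with Control.Parallel.Strategies.

    ```haskell
    import Control.Concurrent (forkIO)
    import Control.Concurrent.Chan (newChan, readChan, writeChan)
    import Control.Monad (forM_, forever)
    import Control.Parallel.Strategies (parList, rseq, using)

    -- Pure work over a batch, evaluated with parallel sparks (stand-in example).
    scoreBatch :: [Int] -> [Int]
    scoreBatch xs = map expensive xs `using` parList rseq
      where expensive n = n * n

    main :: IO ()
    main = do
      jobs    <- newChan
      results <- newChan
      -- Coarse-grained: a few top-level workers communicating with messages.
      forM_ [1 .. 4 :: Int] $ \_ -> forkIO $ forever $ do
        batch <- readChan jobs
        writeChan results (scoreBatch batch)  -- fine-grained parallelism inside
      writeChan jobs [1 .. 1000]
      readChan results >>= print . sum
    ```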
  • Expert to Expert - Joe Duffy: Perspectives on Concurrent Programming and Parallelism

    If you're going to wait until a miracle language appears you'll have a long wait ahead of you. Meanwhile we have languages that are sufficiently more sophisticated (than C# et al) in this area that they almost seem miraculous if you squint your eyes. Why waste that massive benefit because we haven't yet found a magical cure for all problems?

    It's not about there being a silver bullet, it's about recognizing fairly uncontroversial facts, such as "ad hoc side effects have no business in a manycore world", and making sure our languages don't violate that. 
    You don't have to have the perfect answer to every single detail, but it's a good start to get the basic substrate right - and I think it's a pretty safe bet that whatever other technology emerges, side effects will need to be tamed, so let's start there.
  • Expert to Expert - Joe Duffy: Perspectives on Concurrent Programming and Parallelism

    That's still just a marketing issue. If your boss doesn't want you to learn "H#", then clearly more marketing is needed until he sends you off on a Microsoft H# seminar.

    Yes newbies are free to use Haskell, but as I tried to explain earlier switching to purely functional programming is a fairly big shift, and I think the chances of it happening are much greater if a big company like MS pushes it. If the shift doesn't happen, then the costs involved will be very high - for everyone, not just Microsoft.

    I do use Haskell over C#, and I do think that Haskell is a much more practical and pragmatic language for concurrency than anything .Net has to offer, including F# which is still not pure (again, you can't be 90% virgin, you either are or you aren't). I just think widespread adoption (which will benefit us all) of this has to be pushed by a big company, and MS could be it. If not, well I guess I'll have to accept that .Net isn't especially suitable for concurrency and avoid it for those scenarios. Unfortunately, those scenarios will increase in frequency in the future.
  • Expert to Expert - Joe Duffy: Perspectives on Concurrent Programming and Parallelism

    I still think you're presenting a false dichotomy. It's not about "beautiful" vs "getting the job done". It's about "the best way to get the job done in 5-10 years when we'll have hundreds of cores". The only thing stopping the "average joe programmer" from spending the effort to learn something radically different is marketing.

    You keep implying that purity is somehow at odds with pragmatism and getting the job done, and there's just no evidence of that. It's at odds with legacy code, but that's it. MS successfully got people to use .Net rather than their legacy systems, so it's clearly possible to get over that hump. Yes, it'll be easier to sell an extension to C#, but at what point do you step back and say "hang on, C# is now more complicated than Haskell for a newbie, maybe we should have a language that does this from the ground up rather than tack things on to an existing substrate that doesn't quite fit"?
    C# with purity annotations is definitely better than nothing at all (because as I said, we need this to be pushed by a big entity), but having the option of another language (C# would still exist, clearly!) that's pure from the start would be even better.
  • Expert to Expert - Joe Duffy: Perspectives on Concurrent Programming and Parallelism

    But by "practical" you're just talking about marketing, not any technical issues. The issue isn't about using the actual language, that's clearly doable an perfectly practical, the issue is convincing people that they should make the investment to learn something new. That might be difficult, and costly. I guess my point is that making something that's only "90% pure" (which is really a nonsensical concept, like "90% virgin", Smiley) may be less costly up front (because you can extend C# into something that's more messy and complicated in an absolute sense, but offers a smaller relative learning curve because people already know what we have now), but those "10% problems" will add up over the coming decades to dwarf the one-time cost of a paradigm shift.

    So I think it's a false dichotomy that we're choosing between "perfect but unusable" and "imperfect but useful".

    A small company wouldn't have an option, they would have to go for an evolutionary approach, but a company like Microsoft does have the option of a wholesale paradigm shift. They'd have seminars, and books, and Visual Studio integration, and maybe even get hardware partners to produce Reduceron-style co-processors for graph-reduction. MS can throw their weight and marketing behind it to make it happen, but very few other companies could.
  • Expert to Expert - Joe Duffy: Perspectives on Concurrent Programming and Parallelism

    One of the best C9 episodes yet, IMO. Get two people in a room and let them discuss stuff without fear of getting too technical.

    I'm with Eric on this one personally. My main beef with Duffy's side of the debate (which is essentially just saying "Yes, we agree in principle, but we don't want to force everyone to learn something completely new so we're going to try to tack it on to what we already have") is the following:

    1. I think the cost of not having a "robust" solution to this issue is underestimated. IMO the cost of forcing every man, woman and child with a compiler to learn a Haskell-like purely functional language is peanuts compared to the cost of letting them loose in a manycore world without the adequate tools. The perceived cost may be different, but that's what marketing is for.

    2. The complexity of adding pure "bubbles" to an impure language quickly mounts to the point where the overall system is FAR more complicated than Haskell is. Think of all the different "kinds" of effects annotations you would need for a pure subset (transitive const, immutable, locally mutable but not externally visible, etc. - see the sketch after this list). There are many ways of being impure, but only one way of being pure. Pure by default makes life easier for both compiler writers and users.

    3. It's tempting to be "nice" to existing .Net developers by not forcing them to learn something new, but I wonder if you're really helping them in the long run.  Think about Hoare's nullable pointers - it was added on a whim because it was easy, and now he refers to it as his Billion Dollar Mistake. Sometimes doing the hard thing is what will end up making life easier for people. Related to 1. above, but the main point is "don't be nice for the sake of being nice".

    4. Very few entities could pull off kick-starting a paradigm shift. If not Microsoft, then who? Nobody is my guess.
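
    As a sketch of what I mean in point 2 (just ordinary GHC Haskell, nothing hypothetical beyond the example itself): "locally mutable but not externally visible" doesn't need its own annotation kind in a pure-by-default language, because the ST monad already expresses it in the type.

    ```haskell
    import Control.Monad.ST (runST)
    import Data.STRef (modifySTRef', newSTRef, readSTRef)

    -- Uses mutable state internally, but the type says the result is pure:
    -- runST guarantees the mutation cannot be observed from outside.
    sumSquares :: [Int] -> Int
    sumSquares xs = runST $ do
      acc <- newSTRef 0
      mapM_ (\x -> modifySTRef' acc (+ x * x)) xs
      readSTRef acc
    ```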


    That said, the disagreement is far smaller than it may appear. Pretty much everyone agrees that in a perfect world we'd all just switch to a Haskell-like language; the question is whether the real-world cost of doing that outweighs the long-term cost of not doing it. Some say no (me included), some say yes.

    As a suggestion for future interviews, I'd recommend heading back to the UK and checking up with SPJ et al, they're working on some really cool parallelism stuff in Haskell (specifically nested data parallelism).

    Oh, and about SPJ's graph that you had on C9 a while back, I don't actually think that he said Haskell was useless, he said Haskell started out being useless (circa 1990) and then progressed to become more and more useful (while remaining safe), whereas other languages started out being useful but unsafe, and are now adding features to become more safe. So the idea is that both prongs of attack are nearing the same Nirvana, but with different strategies.
  • C# 4.0: Meet the Design Team

    Charles, when you're interviewing them next time, it would be great if you could delve into the "intersection" between meta programming (which they said they were working on) and dynamic compilation of code (which they also talked about).
    It sounds like there's an exciting opportunity here to combine those two and get run-time specialization of code.

    E.g. if you have a regexp matcher that takes a regular expression and a string and returns true if it matches, you could specialize the first parameter of that function for a specific regular expression (which isn't known at compile time, only at runtime) and then run that specialized version on ten million strings. The benefit is that the regular expression would get hard coded into the specialized version so that the matching doesn't have to "interpret" it each time. There's a cost involved in doing this specialization, but quite often you'll use it so often that the benefit outweighs that cost (see the sketch at the end of this comment).

    Seems to me like compilation can be viewed as basically the act of "baking in" known facts about your code (e.g. the size of a variable) into machine code so you can execute it faster. Well, I'd say that many applications have "known facts" that may not be true for the entire duration of the program, but will at least be true for "long enough" that it's worth baking them in anyway. Another example is an image filter, where you take the filter kernel and compile it down into machine code and then run it over the image (because the kernel won't change for the duration of the filter operation). A third example is a game: you load a game level, and that level won't change for several minutes at least, so it would be cool if you could "bake it in" and do all sorts of optimizations under the assumption that the game level is "constant" for some duration of time (e.g. inline virtual calls - since the data is constant you know exactly where they go - unroll loops and recursion, do standard constant folding, etc.).

    Anyway, this is my pet idea for a "killer app" where you'll see all C++ advocates gladly switch to C# for the increased performance, and if you have both meta programming *and* runtime code generation/loading you're basically almost there.
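
    A toy sketch of the kind of thing I mean (plain Haskell rather than C#, and just a closure instead of real runtime code generation, so take it as illustration only): pay a one-time cost to specialise a matcher on a pattern that is only known at run time, then reuse the specialised version on millions of inputs.

    ```haskell
    import Data.List (tails)

    -- "Compile" the pattern once; anything computed here is shared by every
    -- later call to the returned function (a real system would generate
    -- specialised machine code at this point instead).
    compileMatcher :: String -> (String -> Bool)
    compileMatcher pat =
      let n = length pat                              -- done once per pattern
      in \input -> any (\t -> take n t == pat) (tails input)

    main :: IO ()
    main = do
      let match  = compileMatcher "needle"            -- one-time specialisation
          inputs = replicate 1000000 "hay needle hay"
      print (length (filter match inputs))            -- cost amortised over inputs
    ```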
  • Burton Smith: On General Purpose Super Computing and the History and Future of Parallelism

    Frank Hileman wrote:
    
    Games work exceptionally well with message passing.


    Actually no they don't. I do this for a living, which is precisely why I used them as an example.

    Frank Hileman wrote:
    


    Your second argument, regarding N reads and M writes, you claim is solved better by a transaction. All you are doing is breaking up the granularity. You can do the exact same thing with messages. Instead of treating the whole world as one process, break up the writes into messages to M processes.


    But the N and M will be different for each object. One thread might grab all the barrels in the game and do something to them, another might grab all the players and barrels in the game, etc. So just statically breaking the world up into M processes won't work well.

    Essentially what you have is a big set of tens of thousands of objects, each of which must be able to atomically modify small sets of the others. There is no simple way of splitting this up into sub-processes (and even if you do, you would need to "lock" each sub-process you're interested in, which scales poorly). Simple locking (even if implemented with threads) leads to deadlocks and the like here. With a simple message passing system you will end up serialising all of these updates (see the sketch at the end of this comment).

    Frank Hileman wrote:
    
    The point is you can do anything you wish with message passing. It is a fundamental building block, and can scale as well as your design permits. If your design has no need for atomic composite commits, you can do that. If you do need atomic composite commits, you can do that as well.


    Well obviously you can use threads to simulate locks, and then use that to build an STM library, but that wasn't really the point. The point was that message passing itself isn't really suitable for all problems. If you just end up building ad hoc (inefficient) transactional systems on top of the message passing, then obviously you would be better off having an efficient system instead (however cheap the threads are, they're not cheap enough to simulate a lock efficiently).
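
    To illustrate the shape of the problem (a minimal sketch using GHC's stm package; the Obj type and the object sets are invented): each update atomically touches a small, dynamically chosen subset of tens of thousands of objects, and only updates that actually overlap on the same TVars conflict.

    ```haskell
    import Control.Concurrent.STM (STM, TVar, atomically, readTVar, writeTVar)

    newtype Obj = Obj { hitPoints :: Int }   -- stand-in for a game object

    -- Atomically modify an arbitrary subset of the world's objects; transactions
    -- that touch disjoint TVars commit in parallel without any coordination.
    damageAll :: [TVar Obj] -> Int -> STM ()
    damageAll objs dmg = mapM_ damage objs
      where
        damage tv = do
          o <- readTVar tv
          writeTVar tv o { hitPoints = hitPoints o - dmg }

    -- usage (barrelsNearExplosion is whatever subset this update cares about):
    --   atomically (damageAll barrelsNearExplosion 10)
    ```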
  • Burton Smith: On General Purpose Super Computing and the History and Future of Parallelism

    evildictaitor wrote:
    
    sylvan wrote:
    
    Yes, of course, but at what point should we stop saying that the set of transactional variables that has been read/written is "known at compile time"?
    If you actually have to trace down arbitrarily deep into a data structure in order to find the TVar that was actually modified, then I would say that the set of TVars can be figured out dynamically, but it's not something that can be figured out statically at compile time anymore (which was my whole point).


    No, you only look at variable sites, you don't look at items on the heap.

    While you may think that you are trying to make your point, I do have to point out again that this technology is not new or theoretical. It is currently being used in various architectures and in similar mechanisms, such as the CRT exception handler model for x64 processors.


    Well you would have to look at variables on the heap, if that's where the transactional variables are.
    Again, I'm not saying that it wouldn't work, just that it wouldn't work very well. Using it for exception handling is obviously quite different (you don't frequently need a list of all variables touched, all you need is a list of the root pointers for the current stack frame so you can call destructors, right?).

    The fact remains, that with your approach you have an unbounded amount of work to do to find which transactional variables have been used, which is my point. Using a log you already have them available immediately.
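
    Here's a deliberately simplified sketch of what I mean by "using a log" (a toy model, not how GHC's STM is actually implemented): because every read and write appends to the log as part of the access itself, the set of touched variables is available immediately, with no scanning afterwards.

    ```haskell
    import Data.IORef (IORef, modifyIORef', newIORef, readIORef, writeIORef)

    -- Toy transaction log: each access records itself, so the read/write sets
    -- are known the moment the transaction body finishes.
    data Entry = ReadOf Int | WriteOf Int Integer   -- variable id, value written
    type Log = IORef [Entry]

    logRead :: Log -> Int -> IORef Integer -> IO Integer
    logRead lg vid ref = do
      modifyIORef' lg (ReadOf vid :)
      readIORef ref

    logWrite :: Log -> Int -> IORef Integer -> Integer -> IO ()
    logWrite lg vid ref val = do
      modifyIORef' lg (WriteOf vid val :)
      writeIORef ref val
    ```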
  • Burton Smith: On General Purpose Super Computing and the History and Future of Parallelism

    evildictaitor wrote:
    
    sylvan wrote:
    
    evildictaitor wrote:
    
    I fail to see why. All of the variable sites are known at compile time (by definition they are on the stack).

    Why could they not be on the heap?

    Because you need to keep a pointer to that object on the heap in order to use it. Or put another way, an object on the heap that is not referenced inside the transitive closure of the variables on the stack is eligible for disposal.


    Yes, of course, but at what point should we stop saying that the set of transactional variables that has been read/written is "known at compile time"?
    If you actually have to trace down arbitrarily deep into a data structure in order to find the TVar that was actually modified, then I would say that the set of TVars can be figured out dynamically, but it's not something that can be figured out statically at compile time anymore (which was my whole point).

    So to support first-class transactional variables you cannot say "my program counter is X, therefore here's my set of touched TVars" in O(1) anymore. And if the set of TVars depends on e.g. a filter function applied to a list, then you need to run the filter function again to find out which TVars were actually written to (and imagine that the function is very expensive...), or of course assume that all of them were (which would cause needless inhibition of parallelism).

    So I think my point remains: The proposed alternative implementation would have difficulties with this, whereas just keeping a transaction log would handle this a lot easier (as you just store which variables you've accessed when you access them).
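
    To spell out the kind of code I have in mind (a minimal sketch with GHC's stm package; the function and variable names are made up): which TVars end up being written depends on data read inside the transaction, so no static analysis of the program counter can tell you the write set up front.

    ```haskell
    import Control.Concurrent.STM (STM, TVar, readTVar, writeTVar)

    -- The filter function and the list of TVars both live inside TVars, so the
    -- set of variables written is only discovered while the transaction runs.
    bumpMatching :: TVar (Int -> Bool) -> TVar [TVar Int] -> STM ()
    bumpMatching predVar listVar = do
      p   <- readTVar predVar
      tvs <- readTVar listVar
      mapM_ (\tv -> do v <- readTVar tv
                       if p v then writeTVar tv (v + 1) else return ())
            tvs
    ```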
  • Burton Smith: On General Purpose Super Computing and the History and Future of Parallelism

    Frank Hileman wrote:
    

    If you have decided by design that all messages to the barrel modify its state, and that all messages are dependent upon the state of the barrel (ie are invalid if the barrel has been modified), you have serialized access to the barrel by design. It does not matter what form of concurrency you use, locks, message passing, transactions, it is the same problem, and is the same problem a CPU has when determining the dispatch order of instructions that write to a memory location.


    No, it's not serialized by design, in fact it's (deliberately) extremely parallel by design, with the occasional rare conflict. You have tens of thousands of objects, most of which don't care one bit about that barrel, but sometimes one of them does, and even more rarely two or more of them do.
    The point is that the mere infinitesimal possibility of conflicts causes 100% serialization when you use message passing, whereas with transactions you can run in parallel, and deal with those rare cases of conflicts when and only when they actually occur.

    If it's just the case of a single barrel you may be able to hack your own optimistic transactional memory on top of the messages (e.g. you have one message which does not block that you can use to check if you need to update the barrel, and if so you just do it again with the atomic version - that way 99.9% of the objects would just decide that they don't care about the barrel at all and leave it alone), but it gets much worse in real world scenarios. In practice you'll often have each object want to read N unspecified objects from the world, and modify M other unspecified objects in the world (which may or may not overlap with the N that you read). There is no way to know up front which objects you need to read/modify; you only know the exact set of objects that was needed after the operation has occurred. All this has to happen atomically, naturally, which means that with message passing you'll be forced to have a single service guarding "The World", and each object's operations on the world will be entirely serialized. It's simply impossible to do this concurrently if your world is guarded by a message process, even though the number of actual conflicts that these atomic operations have is very very low.
    And again, with transactional memory, the problem simply disappears and you get near linear speedup as you add more CPUs.

    Also, I didn't "design" the problem to be difficult for message passing, it just was difficult for message passing all by itself. Sometimes the thing you're simulating just isn't suitable to message passing. You can't blame the problem because the language doesn't offer a good way of solving it!

    Look, I'm the biggest FP advocate there is. I like Erlang et al. as much as the next guy (though my favourite language is Haskell), but the fact of the matter is that there are real problems that can not be solved with message passing. In my experience, most applications that are actually concurrent by nature (servers, etc.) can use message passing to good effect, but when you try to speed up non-concurrent applications the instances where your problems map nicely to threads and messages start to become more rare. We can't just ignore these problems (again, Amdahl's law won't let us), we need to provide a solution for them too. That's why we need many ways of doing these things. In most cases you can be data parallel, in some cases you need task-based parallelism, in yet fewer cases you can use threads and message passing, and in fewer cases still you need transactional memory. We can't leave any of these out though, as that would disqualify the language from being considered "general purpose" w.r.t. concurrency/parallelism, IMO.
  • Burton Smith: On General Purpose Super Computing and the History and Future of Parallelism

    evildictaitor wrote:
    
    sylvan wrote:
    
    there are variations on this idea, e.g. look up Tim Harris' research and you'll see an approach where the writes in the transactions actually write directly to the "live" objects - optimizing for successful commits - and the transaction logs are only used for rolling back if the commit fails

    It would surprise me if this sped things up, since most objects are accessed via pointer (and certainly are in .NET) and thus whether you copy first and mutate on the new object or make a backup so you can rollback is merely a question of whether you swap the pointers at the point of a transactional write. But maybe there's some other benefit to this, I don't know.


    It does indeed seem to speed things up. He gets only about 50% overhead (compared to no transactions) for short lived transactions, which is much better than locks in the benchmarks in the paper and very impressive. There are other optimizations too, though.

    evildictaitor wrote:
    
    I fail to see why. All of the variable sites are known at compile time (by definition they are on the stack).

    Why could they not be on the heap?
    You could read a TVar containing a filter function, and another TVar containing a list of TVars, and then filter this list based on the filter function to get a reduced list of TVars, and then modify each of them based on the value of yet another TVar, etc. How could you possibly know which values have been written to when the IP points "past" this code? The number of TVars that you have modified is not statically known, the filter function is not statically known, and the list of TVars itself is not statically known.

    evildictaitor wrote:
    
    Again, I can't think of a good reason to spawn multiple threads within a transaction.


    I haven't said anything about spawning multiple threads within a transaction?



    Frank Hileman wrote:

    Again, assuming you do not mean "threads" but some lightweight process concept, this is a problem for the message dispatcher, not the programmer. The dispatcher can recognize a sequence of many messages retrieving data, and they can all be run in parallel. There is no need to serialize the message processing until state is mutated. That is why messaging and functional programming work so well together. Any locking or true OS threads are hidden to the programmer.


    How could these messages be run in parallel, if each of these messages requires atomic updates? I.e. I need to do "find the position of the explosive barrel, and if I collide with it then explode it", and it's no good if someone else does this at the same time and moves the barrel after I've read the position but before I explode it! How could you possibly know that two threads who both read data from the game world (for example) will not decide to write to the same place as a result of that read? Atomic access is key, and with message passing that means the accesses get serialised.
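
    For comparison, here's roughly what that barrel check looks like as one composite atomic action with STM (a sketch only, with invented types and a trivial collision test): the read of the position and the conditional explosion commit together, so nobody can move the barrel in between, and transactions on unrelated objects still run in parallel.

    ```haskell
    import Control.Concurrent.STM (STM, TVar, readTVar, writeTVar)

    type Pos = (Double, Double)

    -- Read the barrel's position and explode it only if we collide; the whole
    -- thing commits atomically, with no window for another thread to move it.
    checkBarrel :: Pos -> TVar Pos -> TVar Bool -> STM ()
    checkBarrel myPos posVar explodedVar = do
      barrelPos <- readTVar posVar
      if collides myPos barrelPos
        then writeTVar explodedVar True
        else return ()
      where
        collides (x1, y1) (x2, y2) = (x1 - x2) ^ 2 + (y1 - y2) ^ 2 < 1.0
    ```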