
Programming in the Age of Concurrency: Software Transactional Memory


Recently, we visited MSR Cambridge (UK) to meet some of the great minds working there. In this case, we were fortunate enough to get an hour's time with Simon Peyton-Jones and Tim Harris, researchers working on a very hard problem: making it easier (more predictable, more reliable, more composable) to write concurrent applications in this, the age of concurrency (multi-core is a reality, not a dream).

Specifically, Simon and Tim (and team) are working on a programming technology called Software Transactional Memory (STM), which provides an elegant, easy-to-use, language-level abstraction for writing concurrent applications, based on widely understood conceptual constructs like atomic operations (and, well, transactions...). Simon, Tim, and team do all the nasty locking work for you. With STM-enabled languages, you can just concentrate on the algorithms at hand and leave the low-level heavy lifting to the subsystem. Sound familiar?

So, imagine this:

atomic
{
   //do stuff - if failure, then throw ex out 
   //of block, roll back - this is a transaction...
}

/* next code fragment... So, code flow appears sequential to the programmer (as we would expect), even though under the covers it is, of course, not always processed sequentially. */
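In GHC Haskell, where STM is already implemented (see the links below), the same idea is spelled `atomically`. A minimal sketch using the stm library's `TVar`s (the account setup is invented for illustration):

```haskell
import Control.Concurrent.STM

-- Transfer between two shared accounts, each held in a
-- transactional variable (TVar).
transfer :: TVar Int -> TVar Int -> Int -> STM ()
transfer from to amount = do
  balance <- readTVar from
  writeTVar from (balance - amount)
  modifyTVar' to (+ amount)

main :: IO ()
main = do
  a <- newTVarIO 100
  b <- newTVarIO 0
  -- 'atomically' plays the role of the atomic { ... } block above:
  -- the whole transfer commits as one step, or is rolled back and
  -- re-run if another transaction conflicts with it.
  atomically (transfer a b 30)
  finalA <- readTVarIO a
  finalB <- readTVarIO b
  print (finalA, finalB)  -- prints (70,30)
```

No thread can ever observe the money missing from `a` but not yet arrived in `b`; that intermediate state exists only inside the transaction.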

Read scientific papers here.

Play with STM here and here.




[My apologies for the somewhat shaky camera work. This conversation took place shortly after the terrorist scare at London's Heathrow airport (I had to leave some of my camera equipment in New Delhi)]


Follow the Discussion

  • William Stacey (staceyw): Before C# there was darkness...
    Good stuff. A couple of questions:

    1) I think I missed a fundamental idea. It sounds as if the last writer wins. So if I have an atomic block, I can make changes to state. But another thread can write to the state and commit, and the first atomic block will then roll back. What did I miss?

    2) Charles, you had a good question that I don't think was fully answered in the video: doing async in the block.

    atomic()
    {
          // Do some stuff
          BeginXXX(
               delegate(int i)
               {
                    // Do some stuff in callback.
               });
    }

    We do some stuff and then kick off an async operation. But then we exit the atomic block right away, because the async operation does not block. So the callback is no longer executed in the atomic block. And worse yet, the code appears as if it is still executed in the block. What am I missing? Or was the answer just to never do this?

    3) How do long-running I/O and DB transactions play with this? In a normal app, I would need both to be atomic: the in-memory objects and the DB, in a single transaction. How does this work?

    Thanks!
  • Charles: Welcome Change

    Niners, please play with this (the C#/VB.net library may be easier to grasp given that Haskell is purely functional...). Tim and Simon need real world feedback. How would you use atomic blocks in your applications? What scenarios make use of atomic hard for you? See the links in the video post description. Also, please read the Composability paper (you don't have to be a scientist to understand it...)

    Play with STM!

    C

  • Another great one, Charles. This is some pretty complex stuff to wrap your head around, and the bit about the garbage collector and all that's required to implement atomic transactions really went way over my head!

    But I tell you, I wish I had this technology in the languages I use (C++ and Pascal) RIGHT NOW!

    And Haskell is really getting some air time these days Wink

    I love C9 videos!
    "1) I think I missed a fundamental idea. It sounds as if the last writer wins. So if I have an atomic block, I can make changes to state. But another thread can write to the state and commit, and the first atomic block will then roll back. What did I miss?"

    The first one, like all blocks, will check for changes before commit and recalculate if they exist, I think.
  • staceyw wrote:
    1) I think I missed a fundamental idea. It sounds as if the last writer wins. So if I have an atomic block, I can make changes to state. But another thread can write to the state and commit, and the first atomic block will then roll back. What did I miss?

    Your understanding sounds on par with mine, except that I don't see why the thread that commits its data wins and the other loses (except for the performance penalty).

    If thread A tries to withdraw $100 from an account but there's no money in it, it retries, which in effect puts it to sleep.

    Then thread B comes along and credits $100 to the same account. Since no other threads are accessing that account, thread B can commit and continue its processing.

    Then thread A wakes up since it detects a write to the account's memory and then retries the whole transaction which succeeds since the money is there.

    Of course the thread that got into the retry state hits an additional performance penalty, but I think the whole idea of this system is to make things easier, not faster.
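    The debit/credit scenario above maps directly onto Haskell's `retry` primitive. A sketch (the dollar amounts are invented) where the withdrawing thread sleeps inside its transaction until another thread's commit touches the account:

```haskell
import Control.Concurrent
import Control.Concurrent.STM

-- Block (via 'retry') until the account can cover the withdrawal.
withdraw :: TVar Int -> Int -> STM ()
withdraw acc amount = do
  balance <- readTVar acc
  if balance < amount
    then retry                     -- sleep until 'acc' is written to
    else writeTVar acc (balance - amount)

main :: IO ()
main = do
  acc  <- newTVarIO 0
  done <- newEmptyMVar
  -- Thread A: waits (inside the transaction) for $100 to show up.
  _ <- forkIO $ do
         atomically (withdraw acc 100)
         putMVar done ()
  -- Thread B: credits the account, which wakes thread A.
  atomically (modifyTVar' acc (+ 100))
  takeMVar done
  readTVarIO acc >>= print  -- prints 0
```

The runtime only re-runs thread A's transaction when one of the TVars it read has been written, so the "sleep" costs nothing while nothing changes.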

    That sounds extremely powerful.

    When you said the last writer wins, were you talking about performance?


    staceyw wrote:
    3) How do long-running I/O and DB transactions play with this? In a normal app, I would need both to be atomic: the in-memory objects and the DB, in a single transaction. How does this work?

    Yeah, there are still a lot of unknowns.

    In the debit/credit example, what if the money never gets deposited in the account? Does an atomic transaction have a timeout value? What if an atomic block triggers an external process of some kind by setting a property of an object?

    I suppose there's a whole set of guidelines and restrictions we'd have to follow when using atomic blocks. Reading the papers available would surely answer most of the unknowns.

    Eric
  • John Melville, MD: Equality Through Technology

    It seems to me that STM creates new problems with composability. You create two classes of code: atomic methods and nonatomic methods.

    Nonatomic methods can easily call atomic ones – the compiler could even automatically inject the atomic block if the programmer forgot.

    Atomic methods and blocks cannot be allowed to call nonatomic code.  The nonatomic code could do I/O or other irrevocable things that would be duplicated when the block had to retry.

    This creates the exact problem that the “const” modifier has in C++.  Your library writer has to anticipate everything you may want to do inside a transaction and provide you with an atomic version.  By liberally making methods atomic, the author seriously limits future versions of the library because making a previously atomic method non-atomic is a breaking change.

    In C++ they cop out and let you cast away the “constness” of a variable because there were too many places where the composition, exactly the problem you are trying to fix, didn’t work when you had one class of code that couldn’t call another class of code.

    It's worse for STM because you can't even cop out and say "use this sparingly." Letting a transaction call nonatomic code creates the very race condition that you are trying to avoid, and makes it worse by running code many times that you have told the programmer to think of as only running once.

    I am certain this will not be the first time this problem has come up, especially since you mention statically enforcing it in the compiler.  I am just curious to hear what people think about what it will do to composability.

     

  • Hi, great questions --

    staceyw wrote:
    1) I think I missed a fundamental idea. It sounds as if the last writer wins. So if I have an atomic block, I can make changes to state. But another thread can write to the state and commit, and the first atomic block will then roll back. What did I miss?


    You're right, this is what the implementation in Haskell will do.  But there's actually a lot of flexibility for the implementation to do other things -- maybe to guarantee some kind of fairness between threads, so that the atomic blocks of one thread don't keep on getting rolled back by the atomic blocks of another, or to hold off the execution of one thread's atomic block because it's really likely to conflict with another (if conflicts are really likely then it's best to run the atomic blocks sequentially rather than in parallel).

    I think this is one of the nice things about atomic blocks -- the abstraction provided to the programmer is "all this happens atomically", but the implementation can do smarter things to reduce the wasted work caused by atomic blocks being rolled back. 

    If you want to read more about this aspect of transactional memory, then Michael Scott's group at U Rochester has done a lot of nice work in the area.

    staceyw wrote:
    2) Charles, you had a good question that I don't think was fully answered in the video: doing async in the block.

    atomic()
    {
          // Do some stuff
          BeginXXX(
               delegate(int i)
               {
                    // Do some stuff in callback.
               });
    }

    We do some stuff and then kick off an async operation. But then we exit the atomic block right away, because the async operation does not block. So the callback is no longer executed in the atomic block. And worse yet, the code appears as if it is still executed in the block. What am I missing? Or was the answer just to never do this?


    That's a difficult question!  As you say, one answer could be that this should never be done -- in Haskell we have a very clear separation between things that can be done in atomic blocks (like updates to shared memory), and things that can't (like creating new threads). 

    Another answer could be that the BeginXXX action kicks off the async operation as normal when the atomic block commits, but that the delegate runs in its own transaction when it's called.

    A third option is that the async actions are kicked off when the atomic block commits, and the delegates then just run normally.  This option fits in with one of the ways that we've sometimes explained the semantics of atomic blocks: think of "atomic" as causing the thread entering the block to suspend all of the others in the system and then run in isolation, resuming the other threads when it reaches the end of the block.

    As I said though, it's a difficult question; maybe there are better options than these three!

    staceyw wrote:
    3) How do long-running I/O and DB transactions play with this? In a normal app, I would need both to be atomic: the in-memory objects and the DB, in a single transaction. How does this work?


    Another great topic Smiley.  At the moment we're looking at adding features to the Haskell implementation to do this kind of thing.  Basically it involves doing a two-phase commit that involves the transactional memory and the database -- that means getting the transactional memory and database to both vote that their part of the transaction is OK to commit, and then only doing the commit once both have voted OK.  It adds some complexity to the implementation -- after the transactional memory has voted OK then it can't go back on its decision and abort.  I did actually look at this kind of technique in an earlier system using transactional memory -- there's a bit about it in a paper "Exceptions and side effects in atomic blocks" linked from my page http://research.microsoft.com/~tharris.

    The thing that's really important to remember for atomic blocks, though, is that it must make sense for the block's contents to occur as a single atomic action.  For instance, atomic blocks that just perform input or just perform output are OK (e.g. we can buffer up the output until the block commits), but an atomic block that writes something on the screen, waits for a user's interaction, and then continues isn't OK -- that's really two separate actions (one before the user's input, and one after the user's input), rather than a single atomic action.

    Tim
    This looks like fantastic work and a great step forward for solid concurrent programming without the pain of deadlocks and race conditions.

    I have one question, on timing: say we have a long, slow atomic method A, and a swift method B. The system runs hundreds of B calls sequentially while, on a different thread, method A is called. Now if I understand the model correctly, if a B call commits before A is done, A is restarted with the new data. Considering that B calls are run continuously and are always faster than A, will A ever be able to complete? Locking solves that with a ready queue, which keeps track of the next method to have exclusivity. How is that handled in STM?
  • Andrew Davey: www.aboutcode.net

    Excellent video!
    I downloaded the C# code and took a quick look.
    Good work with the runtime proxy generation. Geez, it must have been painful writing all that IL-emitting code!

    I'll take a look at writing some Nemerle macros to make the syntax nice. I'm thinking macros along the lines of:

    foo = atomic(Foo(arg1, arg2))
    <==>
    foo = XAction.MakeFactory(typeof(Foo)).Create(arg1, arg2) : Foo;
    -------------------------
    atomic(foo.Bar(x,y,z))
    <==>
    XAction.Run(XStart(foo.Bar), x, y, z);
    --------------------------
    retry
    <==>
    XAction.Retry();
    --------------------------
    y = attempt { Get1, Get2 } (arg1, arg2)
    <==>
    y = XAction.Run(XAction.OrElse(new XStart(Get1), new XStart(Get2)), arg1, arg2) : int;

    y = attempt {Get1, Get2, Get3} (arg1, arg2)
    <==>
    y = XAction.Run(XAction.OrElse(new XStart(Get1), XAction.OrElse(new XStart(Get2), new XStart(Get3))), arg1, arg2) : int;
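    For comparison, the `attempt {Get1, Get2}` form corresponds to Haskell's built-in `orElse` combinator, which runs the first transaction and falls through to the second only if the first calls `retry`. A small sketch (the `TVar (Maybe Int)` slots are invented stand-ins for Get1/Get2):

```haskell
import Control.Concurrent.STM

-- Try to take a value from the first slot; if it would 'retry'
-- (slot empty), fall through to the second. 'orElse' composes the
-- two alternatives into one transaction.
takeEither :: TVar (Maybe Int) -> TVar (Maybe Int) -> STM Int
takeEither v1 v2 = takeFrom v1 `orElse` takeFrom v2
  where
    takeFrom v = do
      contents <- readTVar v
      case contents of
        Nothing -> retry
        Just x  -> writeTVar v Nothing >> return x

main :: IO ()
main = do
  slotA <- newTVarIO Nothing
  slotB <- newTVarIO (Just 42)
  x <- atomically (takeEither slotA slotB)
  print x  -- prints 42
```

Nesting `orElse` right-associatively, as the three-way macro expansion does, extends this to any number of alternatives.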

  • ebdrup


    Suppose I have a cache of some calculation that I compute in a lock, with double-checking to be thread-safe, like this:


    private static readonly Dictionary<Type, int[]> _typeInts = new Dictionary<Type, int[]>();

    public int[] TypeInts
    {
        get
        {
            Type thisType = this.GetType();
            if (!_typeInts.ContainsKey(thisType))
            {
                lock (_typeInts)
                {
                    if (!_typeInts.ContainsKey(thisType))
                    {
                        _typeInts[thisType] = <<calculate int[]>>;
                    }
                }
            }
            return _typeInts[thisType];
        }
    }


    Would I be able to replace this with:


    private static readonly Dictionary<Type, int[]> _typeInts = new Dictionary<Type, int[]>();

    public int[] TypeInts
    {
        get
        {
            atomic
            {
                Type thisType = this.GetType();
                if (!_typeInts.ContainsKey(thisType))
                {
                    _typeInts[thisType] = <<calculate int[]>>;
                }
            }
            return _typeInts[thisType];
        }
    }

    my intuition says it should work if I do all looking up and updating of the _typeInts inside an atomic block.
    But suppose the ContainsKey check has run and another thread interrupts before the int[] is calculated. The other thread runs the same atomic operation to completion; would the first atomic operation retry because some state it read was updated, or would it throw an exception when trying to insert a duplicate key?

  • ebdrup wrote:
    my intuition says it should work if I do all looking up and updating of the _typeInts inside an atomic block.
    But suppose the ContainsKey check has run and another thread interrupts before the int[] is calculated. The other thread runs the same atomic operation to completion; would the first atomic operation retry because some state it read was updated, or would it throw an exception when trying to insert a duplicate key?


    Hi -- yes, if the second thread's atomic block commits while the first thread's one is still running then the runtime system will detect the conflict, roll back the first thread's attempt, and then re-execute it (when it will see the update that the second thread has made). 

    Tim
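    Tim's answer (conflict detection, rollback, and re-execution rather than a duplicate-key exception) can be sketched as a transactional memo table in Haskell, with a `Map` inside a `TVar` standing in for `_typeInts` (the cache shape and values are invented for illustration):

```haskell
import Control.Concurrent.STM
import qualified Data.Map as Map

type Cache k v = TVar (Map.Map k v)

-- Insert-if-absent in one transaction: no double-checked locking.
-- If another thread commits the same key first, this transaction
-- is rolled back and re-run, and the lookup then finds the entry --
-- a duplicate insert can never happen.
getOrCompute :: Ord k => Cache k v -> k -> v -> STM v
getOrCompute cache key fresh = do
  m <- readTVar cache
  case Map.lookup key m of
    Just v  -> return v
    Nothing -> do
      writeTVar cache (Map.insert key fresh m)
      return fresh

main :: IO ()
main = do
  cache <- newTVarIO Map.empty
  v1 <- atomically (getOrCompute cache "answer" 42)
  v2 <- atomically (getOrCompute cache "answer" 99)  -- hits the cache
  print (v1, v2)  -- prints (42,42)
```

One caveat: if the computation is expensive, two threads racing on the same key may both compute it, with one result discarded on rollback; correctness is preserved but the work is wasted.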

  • ebdrup

    I can see how STM could have some performance implications; I mean, if the atomic blocks get large there is a lot of bookkeeping about what memory has been read and written. I can also see there is a lot of room for smart solutions in reducing the amount of bookkeeping necessary. But I love the power of the concept of transactional memory.

    I'm wondering, couldn't STM have been implemented by inserting the necessary locks, or probably better, a combination of locks (where they would be smart and not deadlock) and re-evaluating where that would be deemed smart? I mean, it's easy to think up scenarios where there would be performance problems with re-evaluating, and you need a lot of good heuristics to ensure good performance in most cases. If you let the compiler insert the necessary locks, you can have it analyze for deadlocks, at least in some cases, and you eliminate the problem of having to keep a global order of locks that you talk about in the video.

    Why did you opt for re-evaluating instead of inserting locks? (I guess the answer is composability.)

  • Hi,

    For starters, this is really amazingly cool. The first time I heard of this was in one of Herb Sutter's presentations on concurrency. I have always hated locks, because you just can't get them right in non-trivial applications. And don't trust anyone who says you can.

    Can we have it tomorrow?

    I do have some low-level questions:

    Of course it is easy to make a private copy, for the current thread, of all the objects participating in the transaction. But how is it prevented that you accidentally see a pointer to the "public" version of one of the objects in the transaction? Potentially expensive...

    Do you acquire locks under the covers?

    Are there also people looking at hardware support? With the processor's virtual memory support and copy-on-write features at this level, you could get some of the transactional features fairly cheaply. It would complicate the garbage collector implementation a little...

  • jan.devaan wrote:
    Can we have it tomorrow?


    You can have it today!

    This is already implemented in Haskell (as mentioned in the video), and in a much more elegant way than will ever be possible in an imperative language. Haskell already separates code with side effects from code without side effects because it's purely functional, so it's "trivial" to simply disallow the side-effecting code within transactions (except, of course, the transactional side effects, which are kind of the point Smiley). Basically you'll layer your code into three layers: the lowest layer is the IO layer, which does all the nasty side-effecting things like networking, user input, etc.; on top of that is the transactional layer, in which you do all your reads and writes to shared memory (in atomic blocks as needed); and from both of these lower layers you'll call the topmost layer, the purely functional layer (which is where the actual application logic is written). The key is that you only allow calls "up", not "down". I.e. you can call pure functions and transactions from IO, but you can't call IO from transactions and pure functions. Likewise you can call pure functions from transactions, but you can't call transactions from pure functions.

    The transactional layer is "new" (added late 2005, IIRC), but the strategy of separating your code into layers of increasing purity (now three, previously two) is not new and has been proven to be extremely elegant and viable. In fact, even well written imperative code usually does this for design reasons - Haskell just enforces it and buys all sorts of cool things with the properties that hold because of it, such as easy parallelism ("These two functions can provably not interfere with each other due to lack of side effects you say? Well then I'll just go ahead and compute them on different threads then!"), transactions, far easier reasoning (and therefore maintenance) etc.
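    The three-layer discipline described above (pure at the top, STM in the middle, IO at the bottom, with calls only going "up") is enforced by the types. A tiny sketch (the interest calculation is an invented example):

```haskell
import Control.Concurrent.STM

-- Pure layer: no side effects; callable from anywhere.
interest :: Int -> Int
interest balance = balance * 5 `div` 100

-- Transactional layer: may read/write TVars and call pure code,
-- but cannot perform IO -- the STM type forbids it at compile time.
applyInterest :: TVar Int -> STM ()
applyInterest acc = do
  b <- readTVar acc
  writeTVar acc (b + interest b)

-- IO layer: enters the transactional layer only via 'atomically';
-- pure code can call neither of the layers below it.
main :: IO ()
main = do
  acc <- newTVarIO 200
  atomically (applyInterest acc)
  readTVarIO acc >>= print  -- prints 210
```

Trying to call, say, `putStrLn` inside `applyInterest` is a type error, which is exactly the "no irrevocable effects inside a transaction" rule from the video, checked statically.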

    Check it out! http://www.haskell.org
  • http://groups.google.com/group/comp.arch/msg/e0958ecf43f95f51

    Any thoughts?


    P.S.--

    may I suggest that you all visit comp.programming.threads once in a while… We have the goods wrt scalability and throughput. We will give you the honest appraisal, and not mislead you with TM!

    Thank you,

    Chris Thomasson
    http://appcore.home.comcast.net/
    (portable lock-free data-structures)

  • cristom wrote:
    http://groups.google.com/group/comp.arch/msg/e0958ecf43f95f51

    Any thoughts?


    P.S.--

    may I suggest that you all visit comp.programming.threads once in a while… We have the goods wrt scalability and throughput. We will give you the honest appraisal, and not mislead you with TM!

    Thank you,

    Chris Thomasson
    http://appcore.home.comcast.net/
    (portable lock-free data-structures)



    Lock-less programming is all well and good for a few very specific cases, but you still don't get composability (which is the whole point of using transactions).

    Also, the performance argument is inherently flawed. First of all, "sufficient performance" is a moving target. If I can make use of four times as many threads in my program, but it runs twice as slow, then it's still a win (if not now, then in a few years). Also, don't forget that STM basically gives you exception safety for free.

    I also think it's a mistake to consider transactions for everything. I understand why this is such a common knee-jerk reaction, since most programmers are more familiar with imperative languages, but in my opinion, using a language whose fundamental method of computation is "modify state" in a multithreaded environment where the state is shared and all the threads depend on the state remaining consistent from their point of view is pretty much a bad idea. You're just asking for problems.
    STM really begins to make perfect sense when you consider it in a purely functional context. You may sprinkle in some imperative nuggets here and there when needed for algorithmic reasons (see the ST monad in Haskell - imperative code in a pure setting with static guarantees that no side effects leak out). These small sections of imperative code wouldn't be parallelized, of course. Then the vast majority of your code uses the pure functional approach with parallelism (e.g. using the "par" keyword in Haskell, or some of the new data parallel stuff), and then at the very bottom you have some thread-based shared memory concurrency where it makes sense, which is where STM comes in.

    I don't think shared state concurrency with threads is the best way to do concurrency, but I don't think we can do completely without it either. And for that code, which in typical programs would be a very small fraction, you need something clever to get composability - STM fits the bill perfectly.
  • “Lock-less programming is all well and good for a few very specific cases, but you still don't get composability (which is the whole point of using transactions).”

     

    I completely disagree. There are many different *types of lock-free algorithms you can use to solve many of the fundamental problems that are attributed to the Reader/Writer Problem:

     

    http://groups.google.com/group/comp.programming.threads/browse_frm/thread/576ad37c34dd673a

     

     

    This particular race condition is common to basically any multi-threaded application. So lock-free is more useful than you think… As for composability, well, I have invented a lock-free eventcount algorithm which can be used as a completely composable signal with atomic-operation-free fast-pathing on every call. This algorithm can work with basically any lock-based or lock-free algorithm. The heavy fast-pathing makes it work with lock-free. So, your assertion that lock-free algorithms are not composable is fundamentally flawed.

     

     

     

     

    “Also, the performance argument is inherently flawed. First of all, "sufficient performance" is a moving target. If I can make use of four times as many threads in my program, but it runs twice as slow, then it's still a win (if not now, then in a few years). Also, don't forget that STM basically gives you exception safety for free.”

     

    Nonsense, I advise you to take the challenge on my appcore site:

     

    http://appcore.home.comcast.net/

     

     

    Have fun trying to get the STM version of a FIFO queue to compete with an algorithm that has no atomic operations (e.g., interlocked) and no #StoreLoad or #LoadStore memory barriers. This is just one example. If you use STM, then I can always design something that will have better scalability, throughput and overall performance characteristics. There is another class of lock-free algorithms that allows you raw pointer access into mutable shared data structures without using any atomic operations or #StoreLoad, or even #LoadStore style memory barriers. There are many different ways to implement this. I suggest that you search Google Groups, in comp.programming.threads, for 'Partial Copy-On-Write Deferred Reclamation (PDR)'…

     

    *--http://groups.google.com/group/comp.programming.threads/search?hl=en&group=comp.programming.threads&q=pdr&qt_g=1

     

    http://groups.google.com/group/comp.programming.threads/search?hl=en&group=comp.programming.threads&q=vzoom&qt_g=1

     

     

    http://groups.google.com/group/comp.arch/search?hl=en&group=comp.arch&q=vzoom&qt_g=1

     

    … Lock-Free atomic reference counted pointers:

     

    http://groups.google.com/group/comp.programming.threads/browse_frm/thread/f2c94118046142e8

     

    http://appcore.home.comcast.net/vzoom/refcount/

     

    , please note that Boost shared_ptr<> doesn't work for this kind of stuff because its thread-safety level is Basic.

     

     

    … Fast-pathed read-write locks:

     

    http://groups.google.com/group/comp.programming.threads/search?hl=en&group=comp.programming.threads&q=fast-path+rw+mutex&qt_g=1

     

     

     

    … Mutexes, or 100% lock-based STM:

     

    http://groups.google.com/group/comp.programming.threads/browse_frm/thread/e0c011baf08844c4

    (a simple multi-mutex algorithm)

     

     

     

    “I also think it's a mistake to consider transactions for everything. I understand why this is such a common knee-jerk reaction, since most programmers are more familiar with imperative languages, but in my opinion, using a language whose fundamental method of computation is "modify state" in a multithreaded environment where the state is shared and all the threads depend on the state remaining consistent from their point of view is pretty much a bad idea. You're just asking for problems.”

     

    Speak for yourself. No reason to fear mutable shared data structures. Heck, you want me to give one of the main fast-paths for my database server? Oh boy… It's made up of PDR and huge lock-free shared linked data structures… STM will livelock for the kind of stuff I am programming unless it folds under its 'contention-manager security blanket' under load. This is unacceptable; IMHO, anytime your application "depends" on an explicit contention manager, you know you have problems…

     

     

    “STM really begins to make perfect sense when you consider it in a purely functional context. You may sprinkle in some imperative nuggets here and there when needed for algorithmic reasons (see the ST monad in Haskell - imperative code in a pure setting with static guarantees that no side effects leak out). These small sections of imperative code wouldn't be parallelized, of course. Then the vast majority of your code uses the pure functional approach with parallelism (e.g. using the "par" keyword in Haskell, or some of the new data parallel stuff), and then at the very bottom you have some thread-based shared memory concurrency where it makes sense, which is where STM comes in.”

     

    Hmm… STM is using shared memory, so you can't get rid of it at all. Plus, if you put it in the hardware, the cache coherency mechanism would simply have to be really strict. I am in favor of a weak cache and a well-defined/documented memory model, not monolithic cache coherence designs! Nasty!

     

     

     

    “I don't think shared state concurrency with threads is the best way to do concurrency, but I don't think we can do completely without it either. And for that code, which in typical programs would be a very small fraction, you need something clever to get composability - STM fits the bill perfectly.”

     

    You have to learn the fundamentals of multi-threaded programming. I recommend "Programming with POSIX Threads" by David Butenhof. It's excellent. Don't sacrifice yourself to a composability-or-nothing programming paradigm! You will end up binding yourself to tons of overhead. STM will lead you down the wrong road, IMO. You can always design lock-based algorithms with POSIX that consistently outperform their well-crafted STM equivalents. STM has to orchestrate too many things. It's kind of like a giant LL/SC atomic operation. Of course… LL/SC is subject to livelock, even in hardware. Special care has to be taken. You have to consider the size of the reservation granularity. The larger the granule, the more prone to livelock. STM is a giant LL/SC with the reservation granule being every single operation you make. If there are any interferences, your transaction, or LL reservation, gets shot in the head!!

     

    ;^)

     

     

    STM, well, that’s not an ideal synchronization scheme at all. IMHO, of course.

     

     

    -- P.S.

     

    Please visit comp.arch, or comp.programming.threads some time…

     

  • Man... What the heck happened to my post!?!?!!

    I will put it on my site, and post it here when I am done...

    Sorry... It looked fine in the preview!!

    Anyway, it will be on my site shortly.

    Sorry for any confusion!

    ;^(...
  • Well, on second look, the links work, and you can read the post... So I don't really need to repost it.
  • What about this:

    http://groups.google.com/group/comp.programming.threads/browse_frm/thread/a0966697738790c4



    The interviewer did not press them on any other major alternatives...
  • Cristom, I'm not a huge fan of your style of argument. You take on a very superior tone (such as assuming I don't know anything about multithreading) and make your point by posting tons of links, which makes following your argument very time-consuming, if not impossible. No doubt that leads you to "win" a lot of arguments by walk-over, but you're not actually convincing anyone. Summarize your point in a way which is relevant to this discussion; don't assume I'm willing to get involved in 20 previous discussions to try to deduce your point from there.

    However you're missing the target completely. I already conceded that lock free programming is all well and good in certain cases, so there is no point linking a whole bunch of lock free abstractions. What I claimed was that while it is possible to write a lock free abstraction that will run faster than an STM based one (as you point out), you can trivially write an STM transaction that simply is not expressible with lock free programming. STM gives you general composability (i.e. always), lock free programming does not - which does not mean that it's impossible to write composable abstractions in specific cases (i.e sometimes).

    You also seem to have misunderstood my point about using STM in purely functional programming, because I can't make sense of your response to that. You're not making the erroneous assumption that "purely functional programming == no mutable data", are you? Pure FP simply means that there are strong static guarantees that will ensure that a pure function won't use mutable state in any way which interferes with other pure functions (read about the ST monad). You also need some "impure functions" (really they're just pure values which represent impure functions through monads - so it's still pure) to do IO and some of the low-level stuff - and that's the only place you'd use STM. So even if you think performance is horrible for STM, a typical functional program only has a handful of places where it would be useful anyway, so it's not a big problem -- 90% of the program is purely functional and thus has no problem with parallelism (which is different from concurrency). Most of my FP applications have no mutable variables at all, and the ones that do typically have one or two or so (usually the ones with GUIs or lots of IO stuff). This is an ideal setting to introduce STM.

    “Cristom, I'm not a huge fan of your style of argument. You take on a very superior tone (such as assuming I don't know anything about multithreading)”

    IMHO, you were coming across as if mutable shared data is too advanced or something… The ability to access mutable data without using atomic operations and memory barriers is key to scalability. This is impossible with STM. Period. End of story. BTW, if you know of a TM implementation that doesn’t make frequent use of those expensive operations, please post a link. I would be interested. Indeed.


    “and make your point by posting tons of links, which makes following your argument very time consuming if not impossible.”

    If you follow those links, you should have a better understanding of STM…

    “No doubt that leads you to "win" a lot of arguments by walk-over, but you're not actually convincing anyone.”

    I usually convince those who follow the links and do some research. There is more than one issue involved.


    “Summarize your point in a way which is relevant to this discussion, don't assume I'm willing to get involved in 20 previous discussions to try to deduce your point from there.”

    The summary would be a couple of pages long… I would rather post links to prior art.



    “However you're missing the target completely. I already conceded that lock free programming is all well and good in certain cases, so there is no point linking a whole bunch of lock free abstractions.”

    Just showing you what’s out there. The video did not point out any alternatives. I felt I just had to.



    “What I claimed was that while you can write a lock free abstraction that will run faster than an STM based one (as you point out), you can trivially write an STM transaction that simply is not expressible with lock free programming.”

    Well, an STM can be lock-free, although in practice they are usually only obstruction-free, but that’s another matter. I would post a link, but you don’t like that. So research on your own, I guess. So, you are incorrect to say that you can do stuff with STM that you can’t do with lock-free programming; an STM can be based on a lock-free algorithm. See my point here?



    “STM gives you general composability (i.e. always), lock free programming does not - which does not mean that it's impossible to write composable abstractions in specific cases (i.e. sometimes).”

    It’s not impossible. The price you pay for general composability is far too great. BTW, check this out:

    http://groups.google.com/group/comp.programming.threads/browse_frm/thread/a0966697738790c4/

    you can get composability with POSIX mutexes and condition variables. The video did not mention any major alternatives. Unimpressed.

    “You also seemed to have misunderstood my point about using STM in purely functional programming because I can't make sense of your response to that. You're not making the erroneous assumption that "purely functional programming == no mutable data" are you?”

    You abstract the shared memory away so much that all of its advantages are rendered useless by the massive amounts of coherency traffic that the TM generates.

  • The video mentions overheads, and seems to fall back on the good ol’ saying:

    “Oh well, we will wait for hardware support”

    Man… TM on the hardware is a no go.

    The cache coherency mechanism that could properly support a robust transactional memory scheme would have to be really strict IMHO... Think of scenarios in which there could be lots of reader threads iterating over large shared linked data-structures in parallel... The transactional coherency system could potentially wind up having to track the many thousands of read transactions that will be generated by the frequently reading threads... It would die from livelock; ahh, the explicit contention manager to the rescue! So, when the times get tough for TM, it basically is forced to fold under its security blanket and wait for things to cool way down for a while... This is because TM is kind of like obstruction-free algorithms... One small trivial interference from another thread, and the operation dies and asks the contention manager if it can retry... I don't think TM can scale very well at all wrt the scenarios that I briefly described...

    That is a lot of work for a synchronization instruction... Whatever happened to plain old CAS!

    :O

    IMHO, they have to implement transactional memory in the hardware if they want it to be even remotely usable. I can make use of a transactional instruction in vZOOM, however I would try to avoid its use at all costs...

    The transactional instructions have simply got to be more expensive than a "normal" CAS or LL/SC... I would not allow my reader threads to use the transactional instructions when they are traversing through large data-structures. This is not because I don't like TM; I don't let my reader threads use CAS, or any atomic operations for that matter... Reader threads don't seem to like to call these types of instructions. The number of searches they can perform per second changes drastically when CAS and/or memory barriers are used. I could only imagine the overhead involved with many reader threads frequently searching/traversing throughout a data-structure that has hundreds-of-thousands or even million+ nodes during periods of load. Designing a TM can be tedious.

    My main problem with TM is that reader threads have to call into the transaction API before they traverse a data-structure. In most cases every read has to be tracked; validating at the page level is too coarse. If they end up traversing tens-of-thousands of nodes, that's a lot of checking/validating. If the application is under load and writer threads modify the page of memory that a reader thread happens to be reading from, that raises a red flag in the presented algorithm. Then the validation process has to drill down on the page, to enhance transaction granularity. If all it takes is a simple store into the page to abort a transaction, then livelock can and will occur... So they have to get more fine-grained than a page...

    The cache-coherency protocol deals with transactions that fit into a cache line... Humm, this may make the protocol too complex/strong. I am NOT a big fan of extremely strong cache coherency. It's a shame that TM might be responsible for this nonsense with the strong cache. I want an architecture with very weak cache coherency and crystal-clear memory model documentation. I would be fine with the Alpha. But STM on an Alpha would be a joke. Heck, you can't even use it for any sort of distributed algorithms either.

    Any thoughts?

  • I really think you're not trying to see the point here. It also seems that you may have a financial interest in not seeing the point. Which makes arguing with you pretty much a waste of time.

    Look at the video again. They specifically say that skilled programmers can use existing techniques to write correct code - they're not claiming that it's not possible to work around the difficulties of multi-threaded programming with existing techniques. The point here, though, is that it is difficult. Your own posts are a case in point. Why are you so proud of your inventions of various lock-free data structures if it's so easy? Because it isn't very easy, which is the whole reason why you want STM instead.
    Getting everything right with locks or lock-free programming techniques is difficult - if you're selling software based on this I can understand why you would argue otherwise, but I'm not going to waste my time arguing against a paycheck (everyone whose salary isn't based on pretending it's easy will agree that it's difficult). Not impossible, but difficult. Your argument that it's possible to use other strategies is completely pointless. It's possible to write a full commercial business application using nothing but assembly as well, but is it practical? No, not really, and as concurrency becomes more common, using the sophisticated and clever lock-free programming techniques will stop being practical as well.

    Also, I don't think you're using the word "composability" in the same way I am (nor indeed the way they use it in the video). E.g. my point is that if you have language support for STM and you take a hash table, then you can do the operation "remove A, insert B" without leaving the intermediate state visible (and you can do that in combination with other techniques, without leaving any intermediate states visible). If the hashtable is written using lock-free and/or lock-based abstractions to be thread-safe for insertions and deletions, that still doesn't mean you can do the operation you want in an atomic way.
    What if I want to read 4 elements, or none at all, from your nice lock free FIFO queue? What if I want to do that operation on two different FIFOs (from different vendors), or none at all? What if I want to combine that with some operations of my own, or do none at all?
    STM composes because you can create new concurrency abstractions from existing ones without having to have access to the source code of the original abstractions.

  • cristom wrote:
    
    IMHO, they have to implement transactional memory in the hardware if they want it to be even remotely usable.


    Meanwhile, Haskell programmers have been "using" (!= usable?) STM for quite a while now and found it quite neat.
    You're pitting STM up against lock free data structures in very simple cases, but who says you HAVE to use STM in those cases? You would use STM in cases where there are complex interaction patterns with the shared data where simple lock free structures don't work as well. Using e.g. a transactional message queue between two threads when there is no need for atomic transactions is something you wouldn't do if you're concerned for performance.
    And again, this is not something that would work well with C; it works well with a paradigm which is more suited for multithreading in the first place (FP), because the number of places where you need to even worry about shared state is very limited.
  • For those who don't follow links, here are some important snippets from some of them:


    > I am wondering where RW locks are appropriate ??


    Basically, they work fairly well with "read-mostly" data-structures. If you
    don't really care about performance and scalability, then most rw-mutex
    implementations should work perfectly fine; however, in some cases they are
    definitely not an "optimal solution". For instance, IMHO a rw-mutex would
    probably not be an ideal candidate for protecting a "critical shared
    collection" that can experience episodic periods of sustained
    "moderate-to-heavy write activity". Reader threads that use rw-mutexes to
    protect such a container would probably not perform and/or scale very well
    at all with regard to adding cpus. One of the reasons why is usually due to
    the fact that most rw-mutex implementations force reader threads to use some
    rather expensive atomic/membar/blocking methods in order to enforce
    synchronization and proper data-coherency among the participating
    processors; this is not very cache friendly, not at all. Reader threads
    would not really be able to keep the processors "caches primed" and probably
    would not experience the full benefits of a modern multi-processor system.
    For instance, a reader thread that "frequently" executes a store/load style
    membar would constantly be "invalidating and thrashing the cache"; not good
    at all! The thread could spend most of its time "waiting" for the
    modifications to the "rw-mutex's internal state" to become "fully visible" to
    all of the processors involved. This process is usually made up of fairly
    expensive operation(s) that may involve the overhead of cache snooping
    and/or locking the system bus. The overhead involved could actually prevent
    reader threads from truly executing in parallel; not good.

    Therefore, IMHO, a rw-mutex is generally a bad solution for a critical
    shared data-structure if the overhead of acquiring and releasing it is
    greater than the overhead of "reading the data-structure in a
    single-threaded application"...

     

    > is it appropriate when only you don't want 'dirty' reads??


    Not really, but yes they do eliminate the possibility of processing "stale
    data"; however, as stated above the methods involved are usually expensive.
    Fortunately, there is a more advanced class of algorithms out there that do
    not tend to add a boatload of extra overhead and can allow reader and writer
    threads to execute in parallel. Please note that this behavior alone can be
    beneficial to many application designs. For instance, since readers no
    longer block writers an existing synchronization design can usually be based
    on a "much simpler locking scheme"; however, stale reads can and will occur
    in this type of setup. Luckily, one can address stale/dirty reads in a
    number of interesting ways:

    http://groups.google.com/group/comp.programming.threads/msg/523aa1e77...


    Another solution for stale reads involves writer threads making copies of
    existing shared data-structures and then atomically swapping them with the
    existing ones. A garbage collection scheme can be used to free the "old"
    data-structures. This solution does not add overhead to reader threads,
    unless the method of garbage collection used is sub-optimal...

     


    "John Hickin" <hic...@nortelnetworks.com> wrote in message


    news:ec4u46$hvv$1@zcars129.ca.nortel.com...


    > "David Schwartz" <dav...@webmaster.com> wrote in message
    > news:1155889979.905637.238660@i42g2000cwa.googlegroups.com...
    [...]

    http://www.computer.org/portal/site/computer/menuitem.5d61c1d591162e4b0ef1bd108bcd45f3/index.jsp?&pName=computer_level1_article&TheCat=1005&path=computer/homepage/0506&file=cover.xml&xsl=article.xsl&
    ("The Problem with Threads" - IEEE's Computer magazine article)


    > This example is quite good and it explains what I don't like with threads.
    > In the real world, 8 of the 10 people in your example will go on to do
    > something else. In a multi-threaded world, 8 threads are blocked.

     

    http://groups.google.com/group/comp.programming.threads/msg/a3e38e27d...
    ( refer to the pseudo-code at the bottom of the msg )

    Any thoughts?

     

    > Unless, of
    > course, you use some sort of lock-free algorithm. Unless I've missed the
    > mark, lock-free is quite avant-garde, and not taught at the undergraduate
    > level. It may be time for a change there.


    Totally agreed:

    http://groups.google.com/group/comp.lang.c++.moderated/msg/d07b79e963...


    FWIW:


    I don't agree with the common assertion that lock-free programming is far
    too complicated for a "normal" programmer to learn.


    I also don't agree with the assertion that threads are vastly too difficult
    and fragile for any programmer to use... The paper the OP linked to seems to
    attempt to make this type of assertion.

     

    > Anyway, the article was interesting to read. The author is an IEEE Fellow,
    > and you don't get that designation without earning it.

     

    Humm... Did you read the part where it basically says that programmers that
    use threads are insane? He points to a "folk saying",

    'repeatedly performing the same action and expecting different results
    defines a form of insanity'
    (paraphrase)


    Then he says this, again paraphrasing


    'if programmers that use threads were not insane, they simply would not be
    able to understand any of their programs'


    I wonder if he was being serious... Anyway....


    Yikes! I guess that I am INSANE!!!!!!!!!
    YeeeHaw.... Weeeee.....


    Wink


    I find it amusing that the IEEE Fellow seems to not be able to create a
    thread-safe observer pattern... I have several lock-free observer patterns
    that can rely on my vZOOM or Joe Seigh's RCU-SMR algorithms.


    For some reason, I don't think that the author could create a highly
    scalable thread-safe, lock-based or lock-free, observer pattern
    implementation if his life depended on it???


    :O

     

     


    (simply EXCELLENT Wheeler posts!!!)


    "Chris Thomasson" <_no_spam_cristom@no_spam_comcast._net> writes:
    > IMHO, I believe its too early in the game for actual hardware
    > support, even though they really do need it in order for TM to
    > become really useful. All of the software emulations I have seen are
    > loaded with expensive atomic operations; mutexs seem more
    > efficient...


    original 801 had a form ... it was used by journal filesystem
    for aix in rs/6000.

    filesystem metadata was organized ... and the memory system caught all
    changes ... so basically you avoided having to do all the explicit
    logging/locking statements in the filesystem code ... you did have to
    add commit calls.


    there was some disagreement whether the overhead of explicit
    logs/locks were more expensive than the implicit overhead involved in
    transaction memory. turns out there was the overhead of scanning the
    whole memory area for changes at commits.


    a portable version of the journal filesystem code with explicit
    lock/logging changes in the code turned up the explicit lock/logging
    calls had lower overhead than transactional memory (at least in the
    journal filesystem case).


    and unrelated old reference from about same period as aixv3 ... the
    Proceedings of the 20th International Symposium on Fault-Tolerant
    Computing Systems, pp 89--96, Newcastle, June 1990.


    A Fault Tolerant Tightly Coupled Multiprocessor Architecture based on
    Stable Transactional Memory


    Authors: Michel Banatre and Philippe Joubert
    Abstract:


    Traditionally, tightly coupled multiprocessors allow data sharing
    between multiple caches by keeping cached copies of memory blocks
    coherent with respect to shared memory. This is difficult to achieve
    in a fault tolerant environment due to the need to save global
    checkpoints in shared memory from where consistent cache states can be
    recovered after a failure.


    The architecture presented in this report solves this problem by
    encapsulating the memory modifications done by a process into an
    atomic transaction. Caches record dependencies between the
    transactions associated with processes modifying the same memory
    blocks. Dependent transactions may then be atomically committed. Such
    an operation requires a cache coherence protocol responsible for
    recording process dependencies as well as keeping coherent cached
    copies of blocks and a shared Stable Transactional Memory owing to
    which all memory updates are atomic to allow recovery after a
    processor failure.


    --
    Anne & Lynn Wheeler | http://www.garlic.com/~lynn/

     

     

    "Chris Thomasson" <cris...@comcast.net> writes:
    > Good point. Do you happen to know if they implemented it so they could
    > successfully pop from a lock-free stack? I wonder....


    charlie had invented compare&swap as part of his work on fine-grain
    locking (leading to some number of lock free operations) for cp67
    (360/67) mutliprocessor support at the science center
    http://www.garlic.com/~lynn/subtopic.html#545tech

    the trick then was to come up with a mnemonic that matched Charlie's
    initials, CAS.


    the attempt was then made to get the instruction into the up and
    coming 370 architecture. working with ron smith in the pok 370
    architecture group (they owned the 370 architecture "red book"), the
    response was that 370 didn't need another multiprocessor specific
    instruction, that the test and set from 360 was sufficient.  


    to get compare and swap into the 370 architecture we had to come up
    with usage for compare&swap that wasn't multiprocessor specific.
    thus was born some number of examples that were applicable to
    multi-threaded applications that might be running enabled for
    interrupts ... independent of whether the machine was single processor
    or multiprocessor.


    originally in the 370 principles of operation, the examples were part
    of the programming notes that were part of the compare&swap
    instruction. in subsequent versions of the principles of operation the
    examples were moved to a section in the appendix.


    also as part of this activity, compare&swap double instruction was
    added in addition to compare&swap. that resulted in two instructions
    for 370, compare&swap along with compare&swap double ...  so the
    instruction mnemonics become CS and CDS (instead of CAS ... defeating
    the original objective of coming up with instruction name
    compare&swap).


    total topic drift ... science center was like that ... GML
    http://www.garlic.com/~lynn/subtopic.html#sgml


    precursor to SGML, HTML, XML, etc ... also invented at the science
    center, actually are the first initials of the last name of the three
    inventors (and you probably thought it stood for generalized markup
    language).


    misc. past posts on multiprocessor support and/or compare&swap
    instruction
    http://www.garlic.com/~lynn/subtopic.html#smp


    esa/390 principles of operation appendix for multiprogramming
    (i.e. multithread) and multiprocessing
    http://publibz.boulder.ibm.com/cgi-bin/bookmgr_OS390/BOOKS/DZ9AR004/A...


    cs & cds appendix:
    http://publibz.boulder.ibm.com/cgi-bin/bookmgr_OS390/BOOKS/DZ9AR004/A...


    bypassing post and wait
    http://publibz.boulder.ibm.com/cgi-bin/bookmgr_OS390/BOOKS/DZ9AR004/A...


    lock/unlock:
    http://publibz.boulder.ibm.com/cgi-bin/bookmgr_OS390/BOOKS/DZ9AR004/A...


    free pool manipulation
    http://publibz.boulder.ibm.com/cgi-bin/bookmgr_OS390/BOOKS/DZ9AR004/A...


    and more recent z/architecture (64-bit) principles of operation
    multiprogramming and multiprocessing examples appendix
    http://publibz.boulder.ibm.com/cgi-bin/bookmgr_OS390/BOOKS/DZ9ZR003/A...


    note that the above also includes discussion of the newer PLO ...perform lock
    operation instruction

     

     

    --
    Anne & Lynn Wheeler | http://www.garlic.com/~lynn/

     

     

     

    - conversation with Andy...

    http://groups.google.com/group/comp.arch/browse_frm/thread/71f8e0094e353e5/a3ebfe80363a4399?lnk=gst&q=why+do+double+wide+compare+and+swap&rnum=1#a3ebfe80363a4399

     

    "Andy Glew" <first.l...@employer.domain> wrote in message


    news:peyp8xnzl9g2.fsf@PXPL8591.amr.corp.intel.com...


    >> > Are there a rule of thumb as to when LL/SC starts showing
    >> > scalability issues?

    >> IMHO there is no rule of thumb. Scalability issues are not typically
    >> induced by LL/SC itself.


    > I disagree.


    > Although maybe it's not really scalability issues.  LL/SC starts
    > introducing deadlock / forward progress issues the moment you have
    > more than one processor on any interconnect that is not a simple
    > snoopy bus, and they just get worse as you add more, and have more
    > complicated interconnects.

     

    Something like this?:

    http://groups.google.com/group/comp.arch/msg/d40cf092b679169d


    Coarse reservation granularity can cause live-lock in LL/SC via simple
    false-sharing. I really do believe that HTM is going to suffer from the same
    thing... Multiple processors can live-lock if they keep interfering with the
    reservations that track the TM.


    Humm... Kind of like the live-lock that is inherent in any obstruction-free
    algorithm; they need to use "backoff-and-retry" algorithms to keep up
    forward-progress...

     

     

    Good response post from Intel Architect Andy Glew:


    > Humm... Kind of like the live-lock that is inherent in any obstruction-free
    > algorithm; they need to use "backoff-and-retry" algorithms to keep up
    > forward-progress...


    Yeah, I was thinking of correcting my older post to try to get to this.

    It is not fundamentally LL/SC vs. CAS and other atomic RMWs.


    It is more that LL/SC was originally proposed and implemented via an
    "optimistic concurrency" type of address "link", whereas CAS and other
    atomic RMWs are traditionally implemented via a "lock" - whether bus
    lock, cache lock, or address lock.


    If other people try to access the lock while the RMW is in flight they
    are stalled or told to go away, come back later. This guarantees
    forward progress.


    Whereas if other people access the link between the LL and SC, it is
    the SC that fails. In the presence of an adversary doing ordinary
    stores, the LL/SC might never complete.


    But these are just implementation artifacts.  E.g. you can implement
    CAS or an atomic RMW with LL/SC, whether in user code or microcode -
    and such an RMW would have the same forward progress problems as
    LL/SC. You can implement a hybrid - try the optimistic concurrency
    approach first, and then try the lock approach.


    Similarly, you can implement LL/SC by acquiring a lock, guaranteeing
    that the SC will finish.  But then you need some way of backing out,
    protecting yourself against a malicious user who does the LL but never
    the SC. E.g. the Intel 860 released such a lock after 32 instructions.

     

     


    Chris Thomasson wrote:
    > "Joe Seigh" <jseigh...@xemaps.com> wrote
    >>What's with the Sparc architects?

    > IMHO, I believe the major reason 32-bit architectures support DWCAS was to
    > be "64-bit ready"; DWCAS gets converted to normal CAS on 64-bit systems. So,
    > if your 32-bit applications used DWCAS, then they can be run 64-bit arch
    > without changes... Of course there is a caveat...


    > A 32-bit application cannot use DWCAS to manipulate a pointer along with
    > another contiguous variable...


    > struct dosent_work_on_64_bit_s {
    >   void *a;
    >   whatever b;
    > };


    > IMO, the hardware architects did NOT have lock-free algorithms in mind when
    > they decided to put DWCAS in their 32-bit instruction sets. The fact that
    > 128-bit CAS is not reliably supported seems to support my opinion...

     

    IBM S370 had double wide compare and swap long before 64 bit support
    became an issue.


    >>There's a bit of a disconnect here I think.


    > Perhaps they are designing the lock-free algorithms for an upcoming processor
    > design. Maybe one that has hardware transactional memory... That could be
    > the reason they designed KCSS:

     

    They've done a bit on STM (software transactional memory).  If they did
    come out with hardware based transactional memory it would be after the
    fact of 64 bit sparc and wouldn't be generally available.  So it would
    have to be hidden behind some system or library api with alternate
    implementations on non TM platforms.  That would limit its applicability.
    For example, you couldn't have atomically thread-safe refcounted smart
    pointers.  Well, not without RCU and/or SMR.  If they go with STM to
    complement hw TM, it's only going to work at a lower contention level
    if the STM algorithm is only obstruction-free.

    It's a big problem.  You can design efficient synchronization mechanisms
    that are portable if you ignore Sparc.  If you want portability to
    Sparc you have to sacrifice performance or functionality.


    If Sun has a 64 bit JVM implementation, I wonder what they do for
    the AtomicMarkedReference and AtomicStampedReference implementations
    for non Sparc platforms.  Cripple them so they perform as poorly as
    the Sparc implementations?


    --
    Joe Seigh


    When you get lemons, you make lemonade.
    When you get hardware, you make software.

     


    Text taken from:


    http://groups.google.com/group/comp.arch/msg/e1e391055c7acecb

    http://groups.google.com/group/comp.programming.threads/browse_frm/thread/9c572b709248ae64

    http://groups.google.com/group/comp.programming.threads/browse_frm/thread/f6399b3b837b0a40

    http://groups.google.com/group/comp.arch/msg/1b9e405080e93149

    http://groups.google.com/group/comp.arch/msg/bbbc035cf1a8502c

    http://groups.google.com/group/comp.arch/msg/11b14c4bda2d5d82

    http://groups.google.com/group/comp.arch/msg/9b00fda2752966f9

    http://groups.google.com/group/comp.arch/msg/335aeb22fd6fe526

    http://groups.google.com/group/comp.arch/msg/1ace9400b1b16cd4

    http://groups.google.com/group/comp.arch/msg/995379a16beb3b69
    (simply excellent Wheeler post)

    and more...

    Any thoughts?
    ;^)

     


     

  • cristom wrote:
    

    For those who don't follow links, here are some important snippets from some of them:


    SNIP!



    Okay, this is only slightly less annoying/insulting than outright spam. Clicking links was not the problem; having to deduce your points from multiple related discussions was. You're not that important to me. I don't care about you enough to read everything you've posted on related subjects and from there figure out where you stand on this issue.

    If you have a specific point that has to do with STM, please make it. If you have a specific point to anything someone else posted here, please make it.
    If, however, you don't feel like you have the time and energy to extend the common courtesy of doing so in a coherent way, by formulating it specifically for this discussion in a way which flows naturally together with the topic of discussion, then please note that posting here is completely voluntary.

    Also, please note that examples are not proof. Yes there are, as said many times already, cases where STM will be slower than other methods, but listing them is completely irrelevant. The point here is that there are plenty of cases where STM helps out a lot, and is probably the only practical way of writing software in certain circumstances. E.g. let's say you have a game world, and 10000 game objects that interact with it in complex and data-dependent ways (i.e. there is no way to partition the world into lockable regions). Each of these objects interacts with about half a dozen of the other objects, and they do so concurrently, and the interactions must be atomic. How do you solve that with lock-free data structures? The game world consists of multiple different data structures nested in complex ways (oct-trees, portal cell graphs, BSP trees, etc.). How do you solve it with locks? You really don't, unless you stick a big lock on the entire world, or use some global locking order or something equally detrimental to composability and productivity. STM in this case "Just Works".

    Yeah, STM for a FIFO is overkill, but showing that a non-STM implementation of a FIFO is faster than an STM-based one does not prove that STM is a waste of time. Again, you're pitting STM against other approaches only in very simple cases, which proves exactly nothing.
  • "I really think you're not really trying to see the point here. It also seems that you may have a financial interest in not seeing the point. Which makes arguing with you pretty much a waste of time."

    IMHO, STM is a deceptive marketing ploy. They never get into the alternatives. BTW, I mention many different ways to do things. I did not just refer you to my vZOOM algorithm; I also referred you to Joe Seigh's RCU+SMR algorithm, and some others. So you're wrong again… Unlike the video, I point out all of the caveats, and explain them in detail.

     


    "Look at the video again. They specifically say that skilled programmers can use existing techniques to write correct code - they're not claiming that it's not possible to work around the difficulties of multithreaded programming with existing techniques. The point here, though, is that it is difficult. Your own posts are a case in point. Why are you so proud of your inventions of various lock-free data structures if it's so easy?"

    The complexity can be hidden behind a simple and portable API that anybody can use, like the pthreads API. We all know that a particular pthreads implementation may be very complex, and may very well use some lock-free algorithms. But does the average user actually care about this kind of stuff? My point is, again, "The complexity can be hidden behind a simple API". Really.

    Would there be bugs... Well, maybe; it depends on the implementer. There may be very subtle bugs in your pthread implementation as well...

     

    "Because it isn't very easy, which is the whole reason for why you want STM instead."

    No. STM is NOT a good solution if you're interested in excellent scalability and performance.


    "Getting everything right with locks or lock-free programming techniques is difficult - if you're selling software based on this I can understand why you would argue otherwise, but I'm not going to waste my time arguing against a paycheck (everyone whose salary isn't based on pretending it's easy will agree that it's difficult). Not impossible, but difficult. Your argument that it's possible to use other strategies is completely pointless. It's possible to write a full commercial business application using nothing but assembly as well, but is it practical? No, not really, and as concurrency becomes more common, using the sophisticated and clever lock-free programming techniques will stop being practical as well."

    Utter nonsense. The power of lock-free programming techniques will be in demand. Man, concurrency has been common for a very long time now. Ever heard of RCU? Heck, this stuff, including STM and HTM, has been around for the past 25 years at least. Don't try to act like this is something new. Lock-free has been around for ages, and it's not going to go out of practice. It's going to be in demand - that is, if you want your application to have excellent scalability and throughput characteristics.

     


    "What if I want to read 4 elements, or none at all, from your nice lock free FIFO queue?"

    Simple, you use an eventcount.

     


    "What if I want to do that operation on two different FIFOs (from different vendors), or none at all? What if I want to combine that with some operations of my own, or do none at all?"


    This is an example of bad programming practice. Having to force things together is a sign that there is a MUCH better way. The overhead involved with your request is too great. Again, I urge you to post to comp.programming.threads and comp.arch. Please.
    Anyway, you can use the eventcount to add fine-grain, ultra-low-overhead conditional blocking and coordination of your data with lock-free collections. Too much to post here; the technique deserves its own thread. STM can't compare with this wrt scalability; it's dead in the water. The performance increases of most alternatives to STM are usually an order of magnitude.
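For readers who haven't run into the eventcount pattern mentioned above, here is a minimal blocking sketch in C++. It uses a plain mutex and condition variable for clarity, whereas the low-overhead versions discussed on comp.programming.threads use atomics and memory barriers; the class and method names are illustrative, not from any real library.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Simplified eventcount: lets a consumer block until "something changed"
// without putting a lock on the producer's fast path.
class eventcount {
    std::mutex m_;
    std::condition_variable cv_;
    unsigned count_ = 0;  // generation counter, bumped on every signal
public:
    // Capture the current generation, then re-check your predicate
    // (e.g. "is the lock-free queue still empty?") before waiting.
    unsigned get() {
        std::lock_guard<std::mutex> l(m_);
        return count_;
    }
    // Block until the generation moves past the captured key.
    void wait(unsigned key) {
        std::unique_lock<std::mutex> l(m_);
        cv_.wait(l, [&] { return count_ != key; });
    }
    // Producer side: advance the generation and wake all waiters.
    void signal() {
        { std::lock_guard<std::mutex> l(m_); ++count_; }
        cv_.notify_all();
    }
};
```

A consumer of a lock-free queue would call `get()`, re-check the queue, and only call `wait()` if it is still empty, while the producer calls `signal()` after each push; the get/re-check/wait dance closes the lost-wakeup window without touching the queue itself.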

     

    "STM composes because you can create new concurrency abstractions from existing ones without having to have access to the source code of the original abstractions."

    I am not willing to sacrifice my server applications performance. Sorry, not sold.

  • C is fine. You can create an STM implementation in C/assembly. Just follow the rules:

     

    http://groups-beta.google.com/group/comp.programming.threads/msg/423df394a0370fa6

     

    Okay… If I had to use STM, I would use it in C, not Haskell. Hmm, I have created experimental STMs that try to get around some of the overheads. Still not sure whether I want to patent it or not. Perhaps I should patent it, and sell it as a much faster alternative to most of the STM implementations. Hmm. A couple of implementations of mine beat Haskell by a wide margin. Indeed.

     

    Okay… If I can't convince you that STM is not a good solution at all… How would you feel about using an STM algorithm with a pure C interface? I have several solutions. I created them way back when I was experimenting with/inventing my vZOOM algorithm. I needed something to test it against. I was very disappointed with every STM implementation out there. So, I invented a couple of new ways to implement STM… I guess if STM is going to be rammed down programmers' throats, I might as well get the experimental prototypes out and post them here…

     

     

    I have two word-based STM implementations. One is written in about 75% IA-32 assembly language, and the other in about 90% SPARC V9 assembly language… They all have a C API/ABI with a standard transaction API (e.g., begin, load, store, commit, abort, validate), and they all outperform Haskell.

     

     

    Interested?

  • cristom wrote:
    
    I am not willing to sacrifice my server applications performance. Sorry, not sold.


    And you don't have to. Writing a web server probably doesn't require STM at all.

    The whole point here is, again, that you take some very simple cases, say "X is faster than STM in case x, Y is faster than STM in case y", and you ignore all the difficult cases altogether - along with the fact that STM is an easy-to-use technology which Just Works for both x and y without having to do anything special (again, nobody has claimed that it isn't possible to write multithreaded programs without STM; please try to understand that very important point).

    Embarrassingly parallel applications (like e.g. a web server) are not hard to write, and you can get away with very lightweight abstractions that are very fast indeed. But try writing, say, a game engine which scales to hundreds or thousands of threads and you'll understand that a couple of clever lock-free data structures is not a viable solution for parallelizing problems in general.
    Again, if you're going to compare against "alternatives", please do so in an intellectually honest way, rather than cherry-picking specific examples where there are existing data structures that work well. The whole point here was that with STM it's all of a sudden a reasonable proposition to split something like a game engine up into hundreds or thousands of threads - something which is very difficult to do without a tool that general and flexible.

    Your argument is akin to someone arguing that assembly is better than C because a couple of small programs he's written were faster when written in hand-tuned assembly than when compiled with a C compiler. It might be true, but that doesn't mean assembly is a better choice in general. For sufficiently large applications you're not going to be able to do any measuring of the difference in performance, because only one of the versions will actually run properly (the one which sacrifices speed for usability).
  • cristom wrote:
    

    C is fine. You can create an STM implementation in C/assembly.

     

    ...

    Interested?



    Actually, I'm really not. I've posted twice why I don't think this makes sense in an imperative language. I would recommend you post it and point it in the direction of Simon Peyton Jones and Tim Harris, though. If your implementation really is as good as you say, and it does what the Haskell one does, I'm sure they will be willing to do it your way instead.
  • "And you don't have to. Writing a web server probably doesn't require STM at all."

     

    I am mostly talking about database servers that make heavy use of shared memory for fast-pathing database queries. I can traverse mutable data-structures with atomic operations and/or memory barriers. Again, this is impossible with STM. Using STM would make that fast path run very slowly because of the vast amount of coherency traffic that the TM logic generates.

  • ACK!!!

     

    "I am mostly talking about database servers that make heavy use of shared memory for fast-pathing database queries. I can traverse mutable data-structures with atomic operations and/or memory barriers."

    Is supposed to read as:

    I am mostly talking about database servers that make heavy use of shared memory for fast-pathing database queries. I can traverse mutable data-structures "WITHOUT" atomic operations and/or memory barriers.



    Sorry for any confusion.

  • "Actually I'm really not. I've posted twice why I don't think this makes sense in an imperative language."

     

    I think we have to agree to disagree on this. I have no problems with using C for creating high-end multi-threaded applications.


     

    "I would recommend you post it and point it in the direction of Simon Peyton Jones and Tim Harris though. If your implementation really is as good as you say, and it does what the Haskell one does, I'm sure they will be willing to do it your way instead."

     

    It's definitely as good as I say it is; however, it doesn't have some of the features that the Haskell one has. You should think of my STM as a "lean-and-mean" minimalist version. I was going for extreme speed in the STM implementation itself; some of the stuff that Haskell uses makes this impossible.

     

    ;^(…

     

     

    Anyway… I will think this over. To patent, or not to patent. I wonder how popular a fairly speedy C/C++ STM alternative to Haskell's would actually be. I personally would not even want to use my STM in any of my application infrastructures. But if most programmers go for STM, I could do well by providing a top-notch minimalist alternative in C/C++.

     

     

    Humm..

  • cristom wrote:
    

    ACK!!!

     

    "I am mostly talking about database servers that make heavy use of shared memory for fast-pathing database queries. I can traverse mutable data-structures with atomic operations and/or memory barriers."

    Is supposed to read as:

    I am mostly talking about database servers that make heavy use of shared memory for fast-pathing database queries. I can traverse mutable data-structures "WITHOUT" atomic operations and/or memory barriers.



    Sorry for any confusion.



    Again, what does this prove? Nothing!
    "I have this prime number generator written in asm which is much faster than the same algorithm compiled with VC, ergo C is useless."

    Again, you point out specific cases where existing technology works well, and conclude that STM is useless. The point here though is that there are plenty of applications which are essentially single threaded because splitting them up in multiple threads is too difficult to be a feasible proposition.  It's for these difficult cases, cases where the traditional methods are too difficult to use, that STM will help.

    The point is that STM is really nice and elegant for the general case. Your whole argument basically boils down to premature optimization. Use STM, if it's too slow somewhere, and you can find a way to get it faster (most likely by compromising your ability to use this abstraction in a composable way in the future - which may be a completely valid tradeoff) then you are free to do so. Just like you can use inline asm in C where it's needed.
  • One other MAJOR *point about TM... STM or HTM, my implementation or not, both "suffer" from a not-so-well-known phenomenon:

    Inconsistent data can, and will, be accessed in a transactional atomic block! A user's algorithm for a transaction-based system lives solely inside an atomic block, e.g.,


    atomic { // start transactional atomic block
        users_algorithm {
            // access transactional data-structures
        }
    } // try to commit transactional atomic block


    This means that the user's algorithm can actually "operate" on inconsistent data... No kidding, not at all... Therefore, a user's algorithm's behavior is basically undefined, whether the programmer likes it or not... Think about it for a minute... If your algorithm is fed with not-so-coherent data, then your algorithm will be subject to undefined behavior by default...

    Wow... The implementations that I have seen have to actually catch any and all fatal exceptions that can, and WILL, arise when a user algorithm is subjected to inconsistent data... Sometimes your algorithm crashes right off the bat and gets intercepted by the STM, where the transaction will be aborted and retried... Sometimes your algorithm will be stuck in an infinite loop (the most common problem, IMHO) inside the atomic block... STM has to watch out for this kind of stuff... IMHO, this is a MAJOR drawback and a downright dangerous aspect of transactional memory... This behavior is inherent in basically any TM implementation, hardware or software...

    Need to think some more on this troubling subject... Now you know just some of the reasons why I don't like TM...

    Yikes!

    :O

    * I can't believe I forgot to mention this; it totally slipped my mind. Right when I remembered it I went over to Wikipedia to see if anybody had brought this fact up; somebody did... IMO, it's vital that anybody who thinks about using TM completely understands this particular caveat!

  • cristom wrote:
    One other MAJOR *point about TM... STM or HTM, my implementation or not, both "suffer" from a not-so-well-known phenomenon:

    Inconsistent data can, and will, be accessed in a transactional atomic block! A user's algorithm for a transaction-based system lives solely inside an atomic block, e.g.,


    atomic { // start transactional atomic block
        users_algorithm {
            // access transactional data-structures
        }
    } // try to commit transactional atomic block


    This means that the user's algorithm can actually "operate" on inconsistent data... No kidding, not at all... Therefore, a user's algorithm's behavior is basically undefined, whether the programmer likes it or not... Think about it for a minute... If your algorithm is fed with not-so-coherent data, then your algorithm will be subject to undefined behavior by default...

    Wow... The implementations that I have seen have to actually catch any and all fatal exceptions that can, and WILL, arise when a user algorithm is subjected to inconsistent data... Sometimes your algorithm crashes right off the bat and gets intercepted by the STM, where the transaction will be aborted and retried... Sometimes your algorithm will be stuck in an infinite loop (the most common problem, IMHO) inside the atomic block... STM has to watch out for this kind of stuff... IMHO, this is a MAJOR drawback and a downright dangerous aspect of transactional memory... This behavior is inherent in basically any TM implementation, hardware or software...

    Need to think some more on this troubling subject... Now you know just some of the reasons why I don't like TM...

    Yikes!

    :O

    * I can't believe I forgot to mention this; it totally slipped my mind. Right when I remembered it I went over to Wikipedia to see if anybody had brought this fact up; somebody did... IMO, it's vital that anybody who thinks about using TM completely understands this particular caveat!



    This may be a problem in C, but not in Haskell.
    Haskell uses lazy evaluation, i.e. it's demand-driven. Nothing will be computed unless it's needed, and before any writes are performed or data is returned and forced by the caller (which are the only two things that can force evaluation), the accessed transactional variables are checked for consistency.

    I.e. if you do something like the following in Haskell:

    atomically $ do {
        v1 <- readTVar t1;
        v2 <- readTVar t2;
        writeTVar t3 (willDivergeIfInconsistent v1 v2)
    }

    t1 and t2 will be validated before t3 is written to, and the writing of t3 is the only thing that will force willDivergeIfInconsistent to be evaluated, and at that point v1 and v2 are guaranteed to be valid, thus nothing will ever be evaluated with inconsistent data.
    This holds for every atomic computation, even with interleaved reads and writes, since all the data used by a function will be validated before it can be retrieved and thus must be at least consistent enough to execute the call properly.

  • One problem with implementing software transactional memory is that it's possible for a transaction to read inconsistent state (that is, to read a mixture of old and new values written by another transaction). Such a transaction is doomed to abort if it ever tries to commit, but it's possible for inconsistent state to cause a transaction to trigger a fatal exceptional condition such as a segmentation fault or even enter an endless loop, as in the following contrived example from Figure 4 of "Language Support for Lightweight Transactions":

    atomic {
        if (x != y) 
            while (true) { }
    }
    

    Transaction A

    atomic {
        x++;
        y++;
    }
    

    Transaction B

    Provided x=y initially, neither transaction above alters this invariant, but it's possible transaction A will read x after transaction B updates it but read y before transaction B updates it, causing it to enter an infinite loop. The usual strategy for dealing with this is to intercept any fatal exceptions and periodically check if each transaction is valid; if not, it can be aborted immediately, since it's doomed to fail in any case.
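The interleaving above can be made concrete in a few lines. The sketch below (C++, single-threaded so the schedule is deterministic; all names are illustrative, this is not a real STM) forces exactly the described schedule - A reads x after B's write but y before it - and uses a simple version counter to show how validation detects the doomed transaction:

```cpp
#include <cassert>

// Toy model of the hazard described above: a naive "transaction A"
// snapshots x and y while "transaction B"'s two writes interleave.
struct snapshot { int seen_x, seen_y; bool valid; };

static int x = 0, y = 0;
static int version = 0;  // bumped by every transactional write

static void tx_write_x() { ++x; ++version; }  // first half of B
static void tx_write_y() { ++y; ++version; }  // second half of B

snapshot run_doomed_transaction_a() {
    int v0 = version;   // A: record the version at transaction start
    tx_write_x();       // B: updates x...
    int sx = x;         // A: reads x *after* B's update
    int sy = y;         // A: reads y *before* B's update -- mixed state!
    tx_write_y();       // B: ...then updates y
    // Validation: the version moved since A started, so A's reads are
    // inconsistent and A must be aborted before something like
    // `if (x != y) while (true) { }` can spin forever on a state
    // that never logically existed.
    bool valid = (version == v0);
    return {sx, sy, valid};
}
```

A real STM either re-validates on every read (preventing the loop outright) or intercepts faults and periodically re-validates, as the post describes; either way, the version check is what dooms transaction A.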

  • “E.g. let's say you have a game world, and 10,000 game objects that interact with it in complex and data-dependent ways (i.e. there is no way to partition the world into lockable regions). Each of these objects interacts with about half a dozen of the other objects, they do so concurrently, and the interactions must be atomic. How do you solve that with lock-free data structures? The game world consists of multiple different data structures nested in complex ways (octrees, portal cell graphs, BSP trees, etc.). How do you solve it with locks? You really don't, unless you stick a big lock on the entire world, or use some global locking order or something equally detrimental to composability and productivity. STM in this case "Just Works".”

     

     

    If you don't know how to write a high-end multithreaded game engine without using STM, that's your problem. Game engines benefit HEAVILY from assembly language and other assorted low-level tricks and techniques. Low-overhead lock-free algorithms are a must. STM in a game engine? No way. I want my game to be very fast. STM would prevent the game engine from realizing the full potential of today's high-end architectures…

     

    BTW, how would STM work on the Cell processor? I don't think it's even possible for STM to even begin to make use of the Cell's SPUs. Ha… You want to program games? Do you know assembly language inside and out? Do you know how to use DMA with the Cell's AltiVec SPU to shunt data from shared memory to local SPU memory? Do you know the PPC instruction set? Do you know the distributed/network programming paradigm for the Cell? What about the Xbox 360? Why bog down its PowerPC with all of the coherency traffic generated by the crappy TM?

     

    You want to get into game engines? I suggest posting something over in comp.programming.threads and/or comp.arch. BTW, have you ever programmed the IBM Cell processor, or have you done any lock-free synchronization algorithms that use the PPC instruction set? Do you know the AltiVec instruction set? I have experience with all of these. I would not want you on the design team if all you have to say about synchronization issues in the game engine is STM… No way, no how!

     

    How skilled are you wrt high-end game engine synchronization issues? They are basically the same issues that exist in the fast-pathed part of a database server… Anyway, please direct follow-ups to comp.programming.threads and/or comp.arch. This forum seems to get too crowded...

     

    Humm…

  • cristom wrote:
    

    One problem with implementing software transactional memory is that it's possible for a transaction to read inconsistent state (that is, to read a mixture of old and new values written by another transaction). Such a transaction is doomed to abort if it ever tries to commit, but it's possible for inconsistent state to cause a transaction to trigger a fatal exceptional condition such as a segmentation fault or even enter an endless loop, as in the following contrived example from Figure 4 of "Language Support for Lightweight Transactions":

    atomic {
        if (x != y) 
            while (true) { }
    }
    

    Transaction A

    atomic {
        x++;
        y++;
    }
    

    Transaction B

    Provided x=y initially, neither transaction above alters this invariant, but it's possible transaction A will read x after transaction B updates it but read y before transaction B updates it, causing it to enter an infinite loop. The usual strategy for dealing with this is to intercept any fatal exceptions and periodically check if each transaction is valid; if not, it can be aborted immediately, since it's doomed to fail in any case.



    Again, if you validate x and y before computing the if statement in A, it can't go wrong. This is not an unavoidable problem.
  • You use the splendid "you must just be stupid" argument, which is just great. So game engines are not difficult to parallelize if you're smart enough, and thus using any technology which would make this easier is an admission of stupidity. Well, that's just great. Have fun with that.

    You do know that the founder and technical director (Tim Sweeney) of Epic Games (which recently released Gears of War for the Xbox 360) is a heavy advocate of purely functional programming and STM for games, right? If not, check out "The Next Mainstream Programming Language: A Game Developer's Perspective". But I'm sure you know more than both him and me.

    Also, I'm just going to stop arguing with you right here. I dislike your argumentation style, and you don't seem interested in learning anything new. Though I do think it's funny that you pretend to be an expert on game engines, and then proceed to share your wisdom on the subject to educate poor me in case I ever want to work on games (for the record, I actually work on game engine technology for a living, for the XBox 360).

  • Why would you use STM in a game engine? And how would you use STM on the Cell? Perhaps I, or somebody else at comp.programming.threads, can help you get a MUCH better design for any engine you may be working on.

    

     

     

    "You do know that the founder and technical director (Tim Sweeney) of Epic Games (which recently released Gears of War for the Xbox 360) is a heavy advocate of purely functional programming and STM for games, right? If not, check out "The Next Mainstream Programming Language: A Game Developer's Perspective". But I'm sure you know more than both him and me."

     

    Yeah. Well, if Tim says so... I happen to completely disagree with him. BTW... Sadly, a lot of game programmers are not talented multithreaded programmers:

     

    http://groups.google.com/group/comp.programming.threads/msg/2c477272953fc5e0

     

    http://groups.google.com/group/comp.programming.threads/browse_frm/thread/6fe2dd51631afee5

     

    http://groups.google.com/group/comp.arch/search?hl=en&group=comp.arch&q=games+multi-core&qt_g=1

     

     

    That's the sad truth. STM is not going to help them learn how to leverage lock-free programming for their very high-end game engines' synchronization needs. I bet Tim doesn't even let his programmers use assembly language. I doubt it. I also doubt that he would choose an STM-based engine over a lock-free engine that outperforms it by an order of magnitude. He would have to be crazy to do something like that.

     

    IMHO, game programming is all about squeezing all of the horsepower you can out of the architecture. STM prevents this because it will flood the cache coherency system with boatloads of unneeded traffic. Hmm… Assembly language and game engines… Do they go together? Yes. Big time. It's going to be interesting to see how the gaming industry programs the Cell with STM… Wow. It's basically impossible. How do you address the SPU with STM?

     

     

    As for me not wanting to learn anything new… Well, you're wrong. I implemented STM. I know all of the issues. I have to make sure that I let everybody know what's up, and let them get misled by the marketing for STM.

  • "Again, if you validate x and y before computing the if statement in A, it can't go wrong. This is not an unavoidable problem."

     

    Okay, well, that makes STM doubly expensive. You have to see if you can commit before you try to commit? What if your validation covers thousands of read transactions? How do you address very large shared linked data-structures in STM without livelock and explicit contention management?

  • “As for me not wanting to learn anything new… Well, you're wrong. I implemented STM. I know all of the issues. I have to make sure that I let everybody know what's up, and let them get misled by the marketing for STM."

     

    ACK...

     

    I meant:

     

    As for me not wanting to learn anything new… Well, you're wrong. I actually implemented several versions of STM; I know all of the issues. I have to make sure that I let everybody know what's up, and NOT let them get misled by the marketing for STM.

  • For the record, I did not call you stupid. I want to know how you would address the Cell SPU with STM, that's all...

    :^)
  • Can you use STM to efficiently solve this problem:

    http://appcore.home.comcast.net/vzdoc/atomic/static-init/

    Any thoughts?
  • I almost forgot that the video mentions that they have to use a garbage collector. I have an STM that does not depend on a GC at all. Funny… I wonder why Simon and Tim depend on a GC? It's clearly not needed… A lightweight PDR scheme can implement STM…

  • Here are the introductory papers I used for my vZOOM CoolThreads project...

     

    http://appcore.home.comcast.net/vzoom/

     

    http://appcore.home.comcast.net/vzoom/round-1.pdf

     

    http://appcore.home.comcast.net/vzoom/round-2.pdf

     

    http://appcore.home.comcast.net/vzdoc/atomic/static-init/

     

     

    This is a major alternative to STM. It outperforms it by orders of magnitude. The video should have mentioned other ways of doing things.

     

    Any thoughts?

     

     

    -- P.S. This is in regard to the Sun CoolThreads contest:

     

    https://coolthreads.dev.java.net/

     

  • How is atomicity of transactions defined? Basically, you define atomicity with respect to other things. Write transactions are obviously atomic w.r.t. other write transactions, but what about reads? With single reads, obviously, you can't tell, but what if you were traversing a linked list that had concurrent write updates being applied to it, for example? On such a traversal, would you see all or none of any write transaction, or would you see parts of write transactions depending on which nodes in the list got updated and where you were in the traversal when the updates happened? Or are there read transactions as well, so that before you traverse the list you start a read transaction and get a consistent version of it? And do you have retries on read transactions, or do you use the memory management scheme to keep things consistent?

    I had done some investigation in this area myself when I was looking, on a couple of occasions, at what it would take to make TDB (Trivial DataBase), which is used by Samba, lock-free using RCU for preemptive user threads (or something very much like it). On the second occasion I got around to looking at the actual code and realized that TDB had transactions. So I had to figure out how to do read lock-free transactions. Ultimately, too much work was involved for not very well defined benefits. But basically, when you do this kind of stuff (lock-free), you become aware of the importance of well-defined semantics.
  • Aviad Ezra

    When STM detects a long transaction that conflicts, it immediately switches to a more pessimistic approach (which works a lot like reader/writer locks).

     

    For such cases STM employs a contention manager (CM) that detects conflicts and decides, according to a pre-defined policy, which read-mode technique to use. By default, when a transaction starts, the CM attempts to use the optimistic read technique; once it detects a conflict, it considers switching to a more pessimistic approach (using reader/writer locks) depending on the transaction size (defined by the read count) and the re-execution count.

     

    As you mentioned, a naive optimistic read technique with an automatic re-execution scheme can lead to starvation in cases where a large transaction conflicts with smaller transactions that execute periodically.
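As a concrete illustration of the policy described above, here is a hedged sketch of the mode-selection decision in C++. The thresholds and names are invented for illustration; a real contention manager would tune these per workload and track the counters itself:

```cpp
#include <cassert>

// Read-mode choice made by a simple contention-manager policy:
// stay optimistic by default, fall back to a pessimistic
// (reader/writer-lock) mode once a transaction is both large
// and has already been re-executed several times.
enum class read_mode { optimistic, pessimistic };

read_mode choose_read_mode(int reads_count, int reexecutions) {
    const int large_tx_reads = 64;  // illustrative "large transaction" threshold
    const int retry_limit    = 3;   // conflicts tolerated before switching
    if (reads_count >= large_tx_reads && reexecutions >= retry_limit)
        return read_mode::pessimistic;
    return read_mode::optimistic;
}
```

Small transactions keep the cheap optimistic path, while a large transaction that has repeatedly lost conflicts gets reader/writer locks so it can finally make progress instead of starving.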

