I can see how STM could have performance implications: if the atomic blocks get large, there is a lot of bookkeeping about what memory has been read and written. I can also see there is a lot of room for smart solutions that reduce the amount of bookkeeping necessary. But I love the power of the transactional memory concept.
I'm wondering: couldn't STM have been implemented by having the compiler insert the necessary locks, or, probably better, by a combination of the two, inserting locks where that is safe and reevaluating where that is deemed smarter? It's easy to think up scenarios where reevaluation causes performance problems, and you'd need a lot of good heuristics to ensure good performance in most cases. If you let the compiler insert the necessary locks, it could analyze for deadlocks, at least in some cases, and you'd eliminate the problem of having to keep a global lock order that you talk about in the video.
Why did you opt for reevaluation instead of inserting locks? (I guess the answer is composability.)
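To make the question concrete, here is a toy sketch (in Python, my own illustration, not how any real STM is implemented) of the optimistic scheme I mean: each transaction keeps a read log and a write log, validates at commit time, and reevaluates from scratch on conflict. The `TVar`, `atomically`, and `transfer` names are all hypothetical; commits are serialized with a single lock just to keep the sketch short.

```python
import threading

class TVar:
    """A transactional variable: a value plus a version counter."""
    def __init__(self, value):
        self.value = value
        self.version = 0

_commit_lock = threading.Lock()  # serializes commits in this toy sketch

def atomically(transaction):
    """Run transaction(read, write); on conflict, reevaluate it from scratch."""
    while True:
        read_log = {}    # TVar -> version seen (the bookkeeping for reads)
        write_log = {}   # TVar -> tentative new value (bookkeeping for writes)

        def read(tvar):
            if tvar in write_log:          # see our own earlier writes
                return write_log[tvar]
            read_log.setdefault(tvar, tvar.version)
            return tvar.value

        def write(tvar, value):
            write_log[tvar] = value

        result = transaction(read, write)

        with _commit_lock:
            # Validate: every TVar we read must be unchanged since we read it.
            if all(tvar.version == v for tvar, v in read_log.items()):
                for tvar, value in write_log.items():
                    tvar.value = value
                    tvar.version += 1
                return result
        # Conflict detected: discard both logs and rerun the transaction.

# Two accounts; a transfer composes two reads and two writes into one
# atomic action without the caller having to order any locks.
a, b = TVar(100), TVar(0)

def transfer(src, dst, amount):
    def txn(read, write):
        write(src, read(src) - amount)
        write(dst, read(dst) + amount)
    return txn

atomically(transfer(a, b, 30))
print(a.value, b.value)  # -> 70 30
```

The per-transaction logs are exactly the bookkeeping overhead I mentioned above, and the `while True` retry is the reevaluation; a compiler-inserted-locks scheme would instead have to pick and order locks for both `src` and `dst` at every call site.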