murphee

Niner since 2005


  • Windows, Part I - Dave Probert

    > Actually I've never even heard of "ThreadLevelParallelism".

    In the articles you link, its acronym, TLP, is used. It just means that the CPU extracts parallelism from multiple threads; it's similar to ILP (InstructionLevelParallelism), where the parallelism is found within a single stream of instructions.

    > And you'd think that since I linked a bunch of articles that explained
    > SMT in detail that I would know the difference.

    Indeed, you seem to think that.

    > I see your explanation and it doesn't sound any different to mine

    It does.
    You described CMT (coarse-grained multithreading); I described SMT (simultaneous multithreading).
    CMT means having several contexts in your CPU, but only *one* is active at a time; the CPU switches to another context as soon as the active one has to wait, e.g. for I/O or a cache miss.

    SMT means having 2 contexts active in your CPU at the same time.
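To make the difference concrete, here is a toy Python sketch. It is only a model under loud assumptions: `cmt_schedule`, `smt_schedule` and the `stalled` table are names made up for this illustration, stalls are given per clock cycle in advance, and the SMT version ignores that the threads share execution units.

```python
def cmt_schedule(stalled):
    """Coarse-grained MT: one context is active; switch only when it stalls.

    stalled[t][c] is True if thread t cannot issue in cycle c (toy input).
    Returns, per cycle, the list of thread ids that issue.
    """
    n_threads = len(stalled)
    active = 0
    schedule = []
    for cycle in range(len(stalled[0])):
        if stalled[active][cycle]:
            # The active context waits, so look for another one that is ready.
            for step in range(1, n_threads):
                candidate = (active + step) % n_threads
                if not stalled[candidate][cycle]:
                    active = candidate
                    break
        schedule.append([] if stalled[active][cycle] else [active])
    return schedule


def smt_schedule(stalled):
    """SMT: every context that is ready may issue in the same cycle."""
    n_cycles = len(stalled[0])
    return [
        [t for t in range(len(stalled)) if not stalled[t][c]]
        for c in range(n_cycles)
    ]


# Thread 0 stalls in cycles 1-2, thread 1 stalls in cycle 2.
stalled = [
    [False, True, True, False],
    [False, False, True, False],
]
print(cmt_schedule(stalled))  # at most one thread id per cycle
print(smt_schedule(stalled))  # both thread ids can appear in one cycle
```

In the CMT schedule no cycle ever contains more than one thread; in the SMT schedule cycles 0 and 3 contain both.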

    Again, your linked articles explain exactly that.

  • Windows, Part I - Dave Probert

    |With HT the CPU can switch instantly to another process that already
    |has it's context loaded and use the dead time to do work there
    |(assuming that process isn't also waiting for off-chip code/data).

    rhm, you're wrong. Dave's explanation is correct.
    You're confusing HT (which is Intel's name for SMT) with other ThreadLevelParallelism methods.
    With SMT you do have two separate threads running at the same time. They compete for the same set of execution units (ALUs, FP units, SIMD units, branch units, ...); this is what Dave meant by resources.
    HT/SMT works like this: every clock cycle, the OutOfOrder logic of the CPU must figure out how to fill all its execution units with the incoming instructions. It looks at a few of the instructions waiting to be executed, figures out which ones must or can happen now, and which ones can be executed in parallel. Worst case: only one instruction can be executed, i.e. one unit is used and all the others sit idle. If several instructions can be executed in parallel, the situation is better, because several execution units are used and parallelism (ILP) is exploited.
    HT/SMT is just a way of improving this, by offering two streams of instructions (two threads) that the OutOfOrder logic can use to fill the execution units. So, if one thread has only one instruction that can be executed, the OutOfOrder logic simply looks at the second stream and picks some instructions from there. In the ideal case this looks, for instance, like this: Thread 1 needs one ALU, Thread 2 needs an ALU and an FP unit, ...

    The problem with this approach: if both threads need all the ALUs they can get, they compete for the same units and obviously can't both run at full speed at the same time, so little is gained.
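The slot filling described above can be sketched in a few lines of Python. This is a simplified model, not how any real core is implemented: `issue_cycle`, the unit counts and the round-robin pick between threads are all assumptions made for the illustration, and every instruction in a thread's list is assumed to be ready and independent.

```python
from collections import Counter


def issue_cycle(units, ready_per_thread):
    """Fill the execution units for ONE clock cycle.

    units: unit kind -> number of units, e.g. {"ALU": 2, "FP": 1}
    ready_per_thread: per thread, the unit kinds its ready instructions need
    Returns the (thread_id, unit_kind) pairs issued this cycle.
    """
    free = Counter(units)          # units still unused in this cycle
    queues = [list(q) for q in ready_per_thread]
    issued = []
    progress = True
    while progress:
        progress = False
        # Round-robin over the threads (an assumed policy, for fairness).
        for tid, queue in enumerate(queues):
            for i, kind in enumerate(queue):
                if free[kind] > 0:
                    free[kind] -= 1
                    issued.append((tid, kind))
                    del queue[i]
                    progress = True
                    break  # at most one instruction per thread per round
    return issued


units = {"ALU": 2, "FP": 1}

# Ideal case: thread 0 needs one ALU, thread 1 needs an ALU and an FP unit.
print(issue_cycle(units, [["ALU"], ["ALU", "FP"]]))

# Contention: both threads want two ALUs, but only two exist in total.
print(issue_cycle(units, [["ALU", "ALU"], ["ALU", "ALU"]]))
```

In the first call all three units are busy. In the second call each thread gets only one ALU and the FP unit idles, so the two threads together issue no more than a single ALU-bound thread would have issued alone.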

    What you're thinking of (the CPU internally switching threads on a data access or cache miss) is discussed in this article: