Yet I think you glossed over the real reason people use threads. Unless an application was designed from its inception to be asynchronous, threading does one thing that begin/end async does not: it preserves application state on the call stack.
In particular, imagine a legacy application that is deep in a call chain, A calling B calling C calling ..., and somewhere deep in that stack it needs to do I/O.
If it can't proceed until the I/O completes, there is little to do but block (which is why it needs to be on its own thread; otherwise the whole application freezes).
If you issued an async request at this point, what is the thread supposed to do until the I/O is done? It can't just return: that would unwind A, B and C and throw away the state they hold!
I wish you had addressed this point; in my opinion it is the key issue regarding threading.
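To make the problem concrete, here is a minimal Python sketch. The functions a, b, c and the queue-based "I/O" are hypothetical stand-ins, and the callback style is only a simplified stand-in for begin/end async, not any particular API. In the blocking version the frames of a and b, locals included, just sit on the worker thread's stack until the I/O finishes. In the callback version c must return at once, so every caller on the path has to be rewritten to pass "the rest of its work" down as a callback, which is exactly what a legacy call chain was never designed for.

```python
import queue
import threading

# Hypothetical legacy call chain a() -> b() -> c(); c() needs I/O.
# The "I/O" is simulated with a queue that another thread fills.

def c_blocking(io: queue.Queue) -> int:
    # Blocks this thread until the I/O completes. Meanwhile the frames
    # of a_blocking() and b_blocking() stay intact on the thread's stack.
    return io.get()

def b_blocking(io: queue.Queue, scale: int) -> int:
    return scale * c_blocking(io)

def a_blocking(io: queue.Queue) -> int:
    offset = 1  # local state that must survive the I/O
    return b_blocking(io, scale=10) + offset


# Same chain in callback style (a stand-in for begin/end async):
# c_async() cannot wait, so it returns immediately and each caller must
# pass "what to do after the I/O" down as a callback.

def c_async(start_io, on_done) -> None:
    start_io(on_done)  # begin the I/O; on_done(result) is called later

def b_async(start_io, scale: int, on_done) -> None:
    c_async(start_io, lambda r: on_done(scale * r))

def a_async(start_io, on_done) -> None:
    offset = 1  # must now be captured in a closure instead of a frame
    b_async(start_io, 10, lambda r: on_done(r + offset))


if __name__ == "__main__":
    # Blocking style: a worker thread holds the whole stack while it waits.
    io: queue.Queue = queue.Queue()
    out = []
    worker = threading.Thread(target=lambda: out.append(a_blocking(io)))
    worker.start()
    io.put(4)        # the I/O completes
    worker.join()
    print(out[0])

    # Callback style: a_async() has already returned; no stack is left.
    pending = []
    a_async(pending.append, print)
    pending[0](4)    # the I/O completes; the callback chain finishes the work
```

Both versions compute the same result, but in the second one the state that used to live in the frames of a and b now lives in closures, and the rewrite has to reach every function between the caller and the I/O.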
The answer, I believe, is to make the stack independent of the thread of execution (which is, after all, just a way of sharing the CPU across multiple logical work requests). A call stack represents the true application state (along with the heap, ...).
In this model of an application thread, it is the stack that matters. When one 'stack' is blocked, the underlying execution (managed by the system) can simply switch to another 'stack'.
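A toy version of "stacks independent of the thread" can be sketched in Python with generators, assuming a hypothetical scheduler written only for this illustration. Each logical request is a chain a() -> b() -> c() linked by `yield from`, so when c() yields the I/O it waits for, the whole chain of frames is suspended with its locals intact, and the scheduler runs other stacks on the same OS thread in the meantime. It only approximates the idea: every frame here must be written as a generator, whereas runtimes such as Go's park a blocked goroutine and run another one without the callers being changed.

```python
from collections import deque
from typing import Any, Deque, Dict, Generator, List, Tuple

# A "stack" is a generator chain; it yields the key of the I/O it waits on
# and receives the I/O result back via send().
Stack = Generator[str, Any, Any]

def c(name: str) -> Stack:
    data = yield f"read:{name}"  # suspend the whole chain until the I/O completes
    return data.upper()

def b(name: str) -> Stack:
    prefix = f"<{name}>"         # local state kept in this frame across the I/O
    body = yield from c(name)
    return prefix + body

def a(name: str) -> Stack:
    return (yield from b(name))


class Scheduler:
    """Hypothetical round-robin scheduler: runs ready stacks on one thread
    and parks stacks that are blocked on I/O. Assumes I/O keys are unique."""

    def __init__(self) -> None:
        self.ready: Deque[Tuple[str, Stack, Any]] = deque()
        self.blocked: Dict[str, Tuple[str, Stack]] = {}
        self.results: Dict[str, Any] = {}
        self.trace: List[str] = []

    def spawn(self, sid: str, stack: Stack) -> None:
        self.ready.append((sid, stack, None))  # first send() must be None

    def complete_io(self, key: str, value: Any) -> None:
        sid, stack = self.blocked.pop(key)     # unpark the waiting stack
        self.ready.append((sid, stack, value))

    def run_ready(self) -> None:
        while self.ready:
            sid, stack, value = self.ready.popleft()
            try:
                key = stack.send(value)        # run until it blocks or ends
            except StopIteration as done:
                self.results[sid] = done.value
                self.trace.append(f"{sid} finished")
            else:
                self.blocked[key] = (sid, stack)  # park it; move to the next
                self.trace.append(f"{sid} blocked on {key}")


if __name__ == "__main__":
    sched = Scheduler()
    sched.spawn("s1", a("x"))
    sched.spawn("s2", a("y"))
    sched.run_ready()                 # both stacks run until they block
    sched.complete_io("read:y", "world")
    sched.complete_io("read:x", "hello")
    sched.run_ready()                 # each stack resumes where it left off
    print(sched.trace)
    print(sched.results)
```

Completing the I/O for "s2" first makes it finish first, even though "s1" started first: the scheduler simply resumes whichever stack is no longer blocked.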