I think you're missing the point. A UMS task doesn't get switched off the thread by the kernel. Rather, it gets switched in usermode by the UMS "scheduler" as part of UMS's cooperative multitasking. UMS threads are not descheduled by the kernel; they are descheduled cooperatively via the UmsThreadYield call.
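To make the cooperative part concrete, here is a toy model in Python. It is only a sketch of the idea and does not use the Windows API: the names `worker` and `run_round_robin` are made up for illustration, and the generator `yield` merely stands in for a voluntary call such as UmsThreadYield.

```python
from collections import deque

def worker(name, steps, log):
    """A cooperative task: does one unit of work, then gives up control."""
    for i in range(steps):
        log.append(f"{name}:{i}")
        yield  # stand-in for a voluntary yield back to the scheduler

def run_round_robin(tasks):
    """Toy usermode scheduler: run a task until it yields, then requeue it."""
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)          # resume the task until its next yield
        except StopIteration:
            continue            # task finished, do not requeue it
        ready.append(task)

log = []
run_round_robin([worker("A", 2, log), worker("B", 3, log)])
print(log)
```

Nothing in this model ever preempts a task. A task that never yields would simply starve the others, which is the essence of cooperative scheduling.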
In UMS, the registers of a UMS thread are mapped to the processor registers while the UMS fiber is on the processor, and to the thread context's registers when the thread itself has been dequeued while that fiber is active. Inactive UMS threads have their processor state kept in the UMS fiber context until they are next scheduled.
Similarly, UMS stacks are swapped by "pivoting" the stack to the currently scheduled UMS fiber.
Things like TLS only work because they have been specifically coded to detect and respond correctly to UMS tasking. This means that when you request a TLS slot via the kernel32 function, you're actually requesting the TLS slot for your current UMS task on the current thread.
The kernel makes (almost) no distinction between different UMS fibers. A UI request on one UMS fiber makes the entire thread (not just the fiber) a Win32k UI thread. Similarly, a call to something like NtTerminateThread from within one UMS fiber implicitly kills all of the other UMS fibers on that thread.
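As a rough illustration of that lookup, here is a toy model in Python. It is an assumption-laden sketch, not how kernel32 is implemented: `ToyThread`, `switch_to`, `tls_set` and `tls_get` are hypothetical names, and the only point is that the slot is keyed by the current task rather than by the thread alone.

```python
class ToyThread:
    """Toy model of one thread hosting several cooperative tasks."""

    def __init__(self):
        self.current_task = None
        self._slots = {}  # (task, slot index) -> value

    def switch_to(self, task):
        # Stand-in for the usermode scheduler picking another task.
        self.current_task = task

    def tls_set(self, index, value):
        # The slot is resolved against whichever task is current.
        self._slots[(self.current_task, index)] = value

    def tls_get(self, index):
        return self._slots.get((self.current_task, index))

thread = ToyThread()
thread.switch_to("task-a")
thread.tls_set(0, "value for a")

thread.switch_to("task-b")
seen_by_b = thread.tls_get(0)   # task-b has not set slot 0

thread.switch_to("task-a")
seen_by_a = thread.tls_get(0)   # task-a sees its own value again
```

The same slot index gives a different answer depending on which task is current, even though the underlying thread never changed.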
Very few system calls explicitly play nicely with UMS. This is one of the many reasons why UMS threads are strongly discouraged from making many system calls.
What I'm ultimately getting at is that the kernel's scheduler doesn't schedule UMS fibers. It schedules threads. If those threads want to manage UMS, they do so themselves, cooperatively and entirely from within usermode. The fact that it sometimes appears as if the kernel has changed your thread (e.g. TLS slots being different per UMS fiber) is usually because some system APIs are kind enough to give you that impression.
Synchronisation primitives are a rare example of where the system does care about UMS. In this case, whenever a blocking call occurs in the usermode thread, the system calls back into the UMS scheduler so that it can choose a new fiber to run. The callback is defined by an earlier call to EnterUmsSchedulingMode.
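The shape of that callback can be sketched with another toy model in Python. Again, this does not touch the real API: `ToyScheduler`, `on_event` and `blocking_call` are hypothetical names, and `on_event` merely stands in for the callback registered earlier.

```python
from collections import deque

class ToyScheduler:
    """Toy usermode scheduler with a callback the 'system' can invoke."""

    def __init__(self):
        self.ready = deque()
        self.events = []

    def on_event(self, reason, task):
        # Hypothetical callback: record why we were called, pick the next task.
        self.events.append((reason, task))
        return self.ready.popleft() if self.ready else None

def blocking_call(scheduler, current):
    """Models the system noticing a blocking call and calling back
    into the scheduler instead of choosing the next task itself."""
    return scheduler.on_event("blocked", current)

sched = ToyScheduler()
sched.ready.extend(["task-2", "task-3"])

running = "task-1"
running = blocking_call(sched, running)  # task-1 blocks, scheduler decides
print(running)
```

The decision about what runs next stays with the usermode scheduler; the system's only job in this model is to report that the current task blocked.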