Bear in mind that your test also includes the allocation of a very large buffer, which is very inefficient for the GC: both to create initially and then every time it needs to perform a collection, or when it needs to marshal data from your managed byte[] to an unmanaged void* for the ReadFile() call that sits behind the FileStream's Read() method.
The buffer allocation is done outside of the timing loop. The GC also doesn't need to collect the large buffer during the test, since I don't create a new one on each iteration.
Next, if you look at the WriteFast method, it uses that same large buffer multiple times to call FileStream.Write. This very simple change causes a very large change in my results.
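To make the reuse pattern concrete, here is a minimal sketch of what I mean. It is not the actual WriteFast code: the class name, the TimeWrites helper, the 16 MB buffer size and the write count are all made up for illustration. The point is only that the buffer is allocated once, before the timer starts, and then passed to FileStream.Write several times.

```csharp
using System;
using System.Diagnostics;
using System.IO;

static class BufferReuseSketch
{
    // Hypothetical helper (not the WriteFast method from the question):
    // writes the same buffer writeCount times and times only the writes.
    public static TimeSpan TimeWrites(string path, byte[] buffer, int writeCount)
    {
        using (var fs = new FileStream(path, FileMode.Create, FileAccess.Write,
                                       FileShare.None, 4096, FileOptions.None))
        {
            var timer = Stopwatch.StartNew();
            for (int i = 0; i < writeCount; i++)
            {
                // The same array is reused on every call; nothing is allocated here.
                fs.Write(buffer, 0, buffer.Length);
            }
            // Push anything still held in the FileStream's own buffer to the OS.
            fs.Flush();
            timer.Stop();
            return timer.Elapsed;
        }
    }

    public static void Main()
    {
        // Allocated once, outside the timed region (size is an arbitrary assumption).
        byte[] buffer = new byte[16 * 1024 * 1024];
        string path = Path.GetTempFileName();
        try
        {
            TimeSpan elapsed = TimeWrites(path, buffer, 4);
            Console.WriteLine($"4 writes of {buffer.Length} bytes took {elapsed.TotalMilliseconds:F1} ms");
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```

Since the allocation happens before Stopwatch.StartNew(), its cost can't show up in the measured time, whatever the buffer size is.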
In addition, a large buffer doesn't take any longer to marshal than a small one: a byte array is pinned and passed by pointer, so no memory copy is involved.
So I don't think the issues you mention play a part in the results I'm seeing.