So it is indeed doing dynamic linking. Maybe passing shared objects to gcc implicitly tells it to link them dynamically, but I've never seen it done before (I usually see -l<lib>).
Since the running time seems to grow with the size of the input, my only guess is that threading is being (ab)used in a way that pthreads handle more gracefully than Windows threading (assuming pthreads-win32 is a wrapper for Windows threads). I would try running the core calculations of the algorithm without any threading involved and comparing the numbers.
Those numbers are without any threading involved. The original Windows numbers were with threading.
Is the processor multicore? (I didn't see it mentioned in the OP.) If it isn't, the threading code is pure overhead and will degrade performance in your Windows test.