So it is indeed doing dynamic linking. Maybe passing shared objects to gcc implicitly tells it to link them dynamically, but I've never seen it done before (I usually see -l<lib>).
Since the running time seems to grow with the size of the input, my only guess is that threading is being (ab)used in a way that pthreads handle more gracefully than Windows threading (assuming pthreads-win32 is a wrapper for Windows threads). I would attempt
to run to core calculations of the algorithm without any threading involved and compare the numbers.
Those numbers are without any threading involved. The original ones for WIndows were with threading involved.