So it is indeed doing dynamic linking. Maybe passing shared objects to gcc implicitly tells it to link them dynamically, but I've never seen it done before (I usually see -l<lib>).

Since the running time seems to grow with the size of the input, my only guess is that threading is being (ab)used in a way that pthreads handle more gracefully than Windows threading (assuming pthreads-win32 is a wrapper for Windows threads). I would attempt to run to core calculations of the algorithm without any threading involved and compare the numbers. I'm not sure what you're using to do your timing, but there is a time program standard on Linux that you can use that will give you a more detailed breakdown.

time ./a.out