Good questions! I tried these changes out, and the extra division optimisation improves the performance of the final solution by around 1-2%. On the other hand, applying the second optimisation without the first only improves the performance of the brute-force solution by around 20%, whereas using both techniques together gives a speedup of around 100 times.
Current implementations of fusion in compilers focus on completely eliminating intermediate data structures, whereas in this example the idea is to prune a data structure. So I would be surprised if existing compilers were able to perform this fusion step.
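
To make the distinction concrete, here is a small Python sketch of pruning-style fusion on a toy problem. It is not the solution discussed above: the problem (sequences whose running total never exceeds a limit) and the names `valid`, `brute_force` and `fused` are assumptions chosen purely for illustration.

```python
from itertools import product


def valid(seq, limit):
    """A sequence is valid if no running total exceeds the limit."""
    total = 0
    for x in seq:
        total += x
        if total > limit:
            return False
    return True


def brute_force(choices, length, limit):
    """Generate every candidate first, then filter the complete candidates."""
    return [seq for seq in product(choices, repeat=length) if valid(seq, limit)]


def fused(choices, length, limit):
    """Test while generating: an invalid prefix is dropped and never extended."""
    partial = [((), 0)]  # pairs of (prefix, running total of that prefix)
    for _ in range(length):
        partial = [
            (prefix + (x,), total + x)
            for prefix, total in partial
            for x in choices
            if total + x <= limit  # prune here instead of filtering at the end
        ]
    return [prefix for prefix, _ in partial]
```

Both functions return the same list of sequences. The fused version still builds an intermediate list of prefixes, so nothing is eliminated in the deforestation sense, but that list only ever contains prefixes that passed the test, which is where the saving comes from.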