Unrelated to your question, but that blog post is interesting: I did not know Julia could do that. :) I just tried code_llvm and code_native, and both returned virtually instantly. I don't think the author meant that the operation takes a computer 20 seconds; more likely, it takes a human about 20 seconds to type in the code that does it.
As for your actual question: using compiler-generated assembly as a tool for learning assembly is a horrible idea. Compiler output is not meant to make intuitive sense. Compilers make decisions that only make sense to someone who understands the intimate details of how microprocessors work and, often worse, how a specific microprocessor works (e.g. there are differences between Sandy Bridge and Haswell even though both are x86 processors). The order of instructions can have a profound effect on the pipeline, the cache, and the branch predictor, and a compiler may make decisions for those reasons that you cannot work out from its output alone.
Look at it this way: assembly is just a set of instructions for driving a car, but a compiler uses those instructions with hidden knowledge of the car's internals to get the best mileage, so it may make decisions that look convoluted to someone who only knows how to drive.
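One well-known example of such a decision: for unsigned 32-bit division by the constant 3, optimizing compilers targeting x86-64 (GCC and Clang at -O2, for instance) typically emit no div instruction at all, because division is much slower than multiplication. Instead they multiply by the "magic" constant 0xAAAAAAAB and shift right by 33. The short Python sketch below only checks that this arithmetic identity holds; it is not compiler output, and the exact instructions you actually see will depend on the compiler, flags, and target.

```python
import random

# 0xAAAAAAAB == (2**33 + 1) / 3, i.e. ceil(2**33 / 3).
MAGIC = 0xAAAAAAAB
SHIFT = 33


def div3_plain(x: int) -> int:
    """What the source code says: unsigned 32-bit division by 3."""
    return x // 3


def div3_magic(x: int) -> int:
    """What the emitted machine code computes: multiply, then shift right."""
    # The identity is only claimed for 32-bit unsigned inputs.
    assert 0 <= x < 2**32
    return (x * MAGIC) >> SHIFT


if __name__ == "__main__":
    # Edge cases plus a random sample of the 32-bit range.
    samples = [0, 1, 2, 3, 100, 12345, 2**31 - 1, 2**32 - 1]
    samples += [random.randrange(2**32) for _ in range(100_000)]
    mismatches = [x for x in samples if div3_plain(x) != div3_magic(x)]
    print("mismatches:", len(mismatches))
```

Running it should report zero mismatches. The reason is that x * 0xAAAAAAAB / 2^33 equals x/3 plus an error smaller than 1/6 for any 32-bit x, and since the fractional part of x/3 is at most 2/3, that error never pushes the result past the next integer. Nothing about this is visible in the assembly itself, which is exactly why compiler output is hard to learn from.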