On Tue, Apr 5, 2011 at 6:55 PM, Antriksh Pany <antriksh.pany@xxxxxxxxx> wrote:
> On Mon, Apr 4, 2011 at 10:33 PM, Dan McGee <dpmcgee@xxxxxxxxx> wrote:
>> ....
>> Totally agree; I should have tried to do it this way in the first
>> place. However, compiling the fixed-length 0 to 5 loop does not
>> produce fully-unrolled assembly for me with CFLAGS="-march=native
>> -mtune=native -O2 -pipe -g" on x86_64. I see two copies of the loop
>> only, and even worse is the (lack of) performance (each is the mode of
>> 3 runs). Compilers are stupid apparently.
>> ....
>
> Can you try -O3? Or an explicit '-funroll-loops'?
> gcc I think does not do aggressive speed optimizations at the cost of
> space when at O2.

Sure, both of these options show the loop being unrolled for all 5
iterations. However, that doesn't help me or the other 95% of people
who use distro packages or git-scm.com binaries, or who compile with
the default CFLAGS optimization level, which is unfortunate.

-Dan
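
For anyone following along, here is a minimal sketch in C of what the unrolling question above amounts to. It is not the code from the patch under discussion: the function names (equal5_loop, equal5_unrolled, check_equal5), the NWORDS constant, and the choice of comparing five 32-bit words are all assumptions made for illustration. The point is only that writing the five iterations out by hand does not rely on the compiler deciding to unroll, so it should not matter whether the build uses -O2, -O3, or -funroll-loops.

```c
#include <assert.h>
#include <stdint.h>

#define NWORDS 5 /* assumed trip count, matching the five iterations mentioned above */

/* Loop form: whether this gets unrolled is left to the compiler and its flags. */
static int equal5_loop(const uint32_t *a, const uint32_t *b)
{
	int i;

	for (i = 0; i < NWORDS; i++)
		if (a[i] != b[i])
			return 0;
	return 1;
}

/* Hand-unrolled form: the five comparisons are written out explicitly. */
static int equal5_unrolled(const uint32_t *a, const uint32_t *b)
{
	return a[0] == b[0] &&
	       a[1] == b[1] &&
	       a[2] == b[2] &&
	       a[3] == b[3] &&
	       a[4] == b[4];
}

/* Sanity check: both forms must agree on equal and unequal inputs. */
static void check_equal5(void)
{
	uint32_t x[NWORDS] = {1, 2, 3, 4, 5};
	uint32_t y[NWORDS] = {1, 2, 3, 4, 5};
	uint32_t z[NWORDS] = {1, 2, 3, 4, 6};

	assert(equal5_loop(x, y) == 1 && equal5_unrolled(x, y) == 1);
	assert(equal5_loop(x, z) == 0 && equal5_unrolled(x, z) == 0);
}
```

To see what a given compiler does with the loop form, one option is to compile with "gcc -O2 -S" and then again with "-O3" or "-funroll-loops" and compare the generated assembly. Results will vary with compiler version and target.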