On Mon, Apr 4, 2011 at 10:33 PM, Dan McGee <dpmcgee@xxxxxxxxx> wrote:
> ....
> Totally agree; I should have tried to do it this way in the first
> place. However, compiling the fixed-length 0 to 5 loop does not
> produce fully-unrolled assembly for me with CFLAGS="-march=native
> -mtune=native -O2 -pipe -g" on x86_64. I see two copies of the loop
> only, and even worse is the (lack of) performance (each is the mode of
> 3 runs). Compilers are stupid apparently.
> ....

Can you try -O3, or an explicit -funroll-loops? I think gcc at -O2 does
not do aggressive speed optimizations that come at the cost of code size.

Cheers
Antriksh
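The loop under discussion is not shown in this excerpt, so what follows is only a hypothetical stand-in for experimenting with the flags suggested above. It assumes a comparison of five 32-bit words with a compile-time-constant trip count, next to a hand-unrolled equivalent. The names words_equal_loop, words_equal_unrolled and NWORDS are made up for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in, not the code from the thread: compare two
 * 5-word buffers. The trip count is a compile-time constant, so the
 * compiler may unroll the loop completely. Whether it actually does
 * depends on the optimization level and its code-size heuristics. */
#define NWORDS 5

int words_equal_loop(const uint32_t *a, const uint32_t *b)
{
	int i;

	for (i = 0; i < NWORDS; i++)
		if (a[i] != b[i])
			return 0;	/* first mismatch ends the comparison */
	return 1;
}

/* The same comparison unrolled by hand. Use it as a baseline when
 * comparing generated assembly and timings against the loop above. */
int words_equal_unrolled(const uint32_t *a, const uint32_t *b)
{
	return a[0] == b[0] && a[1] == b[1] && a[2] == b[2] &&
	       a[3] == b[3] && a[4] == b[4];
}
```

To see what gcc does with the loop at each setting, you could emit assembly with, for example, `gcc -O2 -S unroll.c`, `gcc -O3 -S unroll.c` and `gcc -O2 -funroll-loops -S unroll.c` (the file name is arbitrary) and compare the resulting .s files.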