2006/1/26, Nigel Stephens <nigel@xxxxxxxx>:
>
> Then you'll have to have a look at the resulting disassembled code and
> figure what's changed. :)
>
> Thinking about this in more detail:
>
> 1) Using -march=4ksd reduces the cost of a multiply by 1 instruction
> (from 5 to 4 cycles), so a few more constant multiplications, previously
> expanded into a sequence of shifts, adds and subs, may now be replaced
> by a shorter sequence of "li" and "mul" instructions.
>

Is this really specific to the 4ksd CPU? Could this behaviour be
triggered by other options?

> 2) Enabling branch-likely may allow some instructions to be moved into a
> branch delay slot which previously couldn't be -- but usually these are
> duplicates of the code at the original branch target, so have little
> effect on overall code size.
>
> 3) Using -march=mips32r2 with -O1 and above (but not -Os) enables 64-bit
> alignment of functions and frequently-used branch targets (e.g. loop
> headers); whereas -march=4ksc will not do that. This will add some
> additional "nops" to the code.
>

I noticed your last point when staring at the disassembled code, and it
seems to be confirmed by these figures:

   text    data     bss     dec     hex filename
2099642  110784   81956 2292382  22fa9e vmlinux-4ksd
2136269  110784   81956 2329009  2389b1 vmlinux-mips32r2
1953086  110784   81956 2145826  20be22 vmlinux-4ksd-Os
1954489  110784   81956 2147229  20c39d vmlinux-mips32r2-Os

I now have to check that your first and second points don't have too bad
an impact on overall speed, although I don't know how to measure that.
If they don't, I could safely use the -march=mips32r2 -Os options.

Thanks
--
Franck
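To put the alignment cost from point 3 into numbers, here is a small sketch (Python, not part of the original thread) that parses the `size` output quoted in the message and computes how much the text segment grows when switching from -march=4ksd to -march=mips32r2. The helper names `parse_size` and `text_growth` are hypothetical, chosen only for illustration. As a sanity check, the parser also verifies that each `dec` column equals text + data + bss and that `hex` is the same value in hexadecimal.

```python
# The `size` output quoted in the message (columns: text, data, bss, dec, hex, filename).
SIZE_OUTPUT = """\
   text    data     bss     dec     hex filename
2099642  110784   81956 2292382  22fa9e vmlinux-4ksd
2136269  110784   81956 2329009  2389b1 vmlinux-mips32r2
1953086  110784   81956 2145826  20be22 vmlinux-4ksd-Os
1954489  110784   81956 2147229  20c39d vmlinux-mips32r2-Os
"""


def parse_size(output):
    """Hypothetical helper: map each filename to (text, data, bss, dec) in bytes."""
    rows = {}
    for line in output.splitlines()[1:]:  # skip the header line
        text, data, bss, dec, hex_str, name = line.split()
        text, data, bss, dec = int(text), int(data), int(bss), int(dec)
        # dec is the sum of the three segments; hex is dec in base 16.
        assert dec == text + data + bss
        assert int(hex_str, 16) == dec
        rows[name] = (text, data, bss, dec)
    return rows


def text_growth(rows, base, other):
    """Hypothetical helper: absolute (bytes) and relative (%) growth of .text."""
    a, b = rows[base][0], rows[other][0]
    return b - a, 100.0 * (b - a) / a


rows = parse_size(SIZE_OUTPUT)
for base, other in [("vmlinux-4ksd", "vmlinux-mips32r2"),
                    ("vmlinux-4ksd-Os", "vmlinux-mips32r2-Os")]:
    delta, pct = text_growth(rows, base, other)
    print(f"{other}: +{delta} bytes of text ({pct:.2f}%)")
```

Run as-is, this prints a growth of 36627 bytes (about 1.74%) without -Os and only 1403 bytes (about 0.07%) with -Os. That matches the explanation that the extra alignment padding is enabled at -O1 and above but not at -Os; the data and bss columns are identical across all four builds, so the whole difference is in the text segment.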