On Wed, 11 Nov 2009, Ralf Baechle wrote:

> 32-bit with -mlong-call:
>
>     lui     $25, %hi(foo)
>     addiu   $25, %lo(foo)
>     jalr    $25
[...]
> It's time that we get a -G optimization that works for the kernel; it would
> allow to cut down the -mlong-calls calling sequence to just:
>
>     lw/ld   $25, offset($gp)
>     jalr    $25

Actually this may be no faster than the above.  The load produces its
result late and the jump needs its data early, so unless a bypass has been
implemented in the pipeline, it may well stall for the extra cycle (that's
the reason for the load-delay slot in the original MIPS I ISA after all).

Of course there is still the benefit of a reduced cache footprint, but the
extra load may have to evict a cache line and flush the benefit down the
drain.

I don't mean it's not to be considered, but it's not at all immediately
obvious it would be a win.

  Maciej