On Tue, 12 May 2009, David Daney wrote:

> > > +		/*
> > > +		 * Find the split point.
> > > +		 */
> > > +		if (uasm_insn_has_bdelay(relocs, split - 1))
> > > +			split--;
> > > +	}
> >
> > The code itself makes sense.  Does this case actually happen much, or was
> > this just an itch?
> >
>
> For my CPU it was happening 100% of the time when I add the soon to be
> submitted hugeTLBfs support patch.  Although I have not measured it, this code
> is so hot that keeping the normal case fitting on a single cache line should
> be a big win.

 Rather than this hack, I'd suggest micro-optimising the code by shuffling it
so that, unless the handler fits entirely in 128 bytes (I'm not sure that
ever happens for the XTLB refill), the part built by
build_get_pgd_vmalloc64() is placed in the TLB refill handler slot.  That
saves an unnecessary unconditional branch there, and the problem of an
unconditional branch to ERET gets solved automagically as a side-effect.
Unless the vmalloc part does not fit in 128 bytes, that is, in which case it
would have to overflow back into the XTLB slot.

 It should be pretty straightforward to code. ;)

  Maciej
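 For illustration only, a minimal C sketch of the placement decision
described above, assuming each refill vector slot is 128 bytes (32 MIPS
instructions).  The enum, choose_layout() and find_split() are hypothetical
names, not tlbex.c code.  The "two words before the end of the slot" rule is
an assumption (room for an inserted branch plus its delay slot); the extra
back-up step mirrors the quoted uasm_insn_has_bdelay() check.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: each refill vector slot is 0x80 bytes, i.e. 32 MIPS words. */
#define SLOT_INSNS 32

/* Hypothetical names for the three placements discussed above. */
enum refill_layout {
	LAYOUT_XTLB_ONLY,        /* whole handler fits in the XTLB slot     */
	LAYOUT_VMALLOC_IN_TLB,   /* vmalloc64 part lives in the TLB slot    */
	LAYOUT_VMALLOC_OVERFLOW, /* vmalloc64 part spills back to XTLB slot */
};

/*
 * Hypothetical helper: choose a placement from the instruction counts of
 * the fast path and of the code emitted by build_get_pgd_vmalloc64().
 */
static enum refill_layout choose_layout(int fast_insns, int vmalloc_insns)
{
	if (fast_insns + vmalloc_insns <= SLOT_INSNS)
		return LAYOUT_XTLB_ONLY;
	if (vmalloc_insns <= SLOT_INSNS)
		return LAYOUT_VMALLOC_IN_TLB;
	return LAYOUT_VMALLOC_OVERFLOW;
}

/*
 * Hypothetical split search for code that does not fit in one slot:
 * stop two words before the end of the slot (assumed room for a branch
 * and its delay slot), then back up one more word if the split point
 * would otherwise land in the delay slot of the preceding instruction.
 * has_bdelay[i] is true when instruction i has a branch delay slot.
 */
static int find_split(const bool *has_bdelay, int n_insns)
{
	int split = SLOT_INSNS - 2;

	assert(n_insns > SLOT_INSNS);
	if (has_bdelay[split - 1])
		split--;
	return split;
}
```

 The point of the ordering is that the common case (everything in one slot)
is tested first, and only code that genuinely overflows pays for a split.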