Le dimanche 18 mars 2012 à 19:35 +1100, Indan Zupancic a écrit : > Yes. The main difference would be that the JIT could always generate imm8 > offsets, saving 4 bytes per long offset while also simplifying the compiler > code. The %rdi + 127 is 4 bytes, and if the rest become slightly faster > because they're imm8 then it's worth the extra instruction. > Do you understand you try to save 3 bytes in the function prolog, but your single EMIT4(0x48, 0x83, 0xc7, 127); /* addq $127,%rdi */ defeats this ? Ancillary instructions are rarely used, libpcap for example doesnt have support for them. > I first thought the +127 could be done in two bytes, but 4 bytes are > needed, so maybe it's not worth it. ... > The add 127 would be at the start, the first instruction using it would > be a couple of instructions later, so I don't think the dependency is a > problem. > > You're right about skb_copy_bits(), I did a quick search for rdi usage > but missed it was the first parameter too. It would need one extra > sub 127 or add -127 in the slow path, after the push. But it's the slow > path already, one extra instruction won't make much difference. It will, because new NIC drivers tend to provide skbs with fragments. Using libpcap filter like "udp[100]" calls the skb_copy_bits() helper in this case. There is no difference in instruction timing using offset32 or offset8, so the code you add will slow the filter anyway. Please dont obfuscate this code, I'd like to keep it maintainable. -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html