>>> On 15.12.11 at 23:52, Christoph Lameter <cl@xxxxxxxxx> wrote:
> On Thu, 15 Dec 2011, tip-bot for Jan Beulich wrote:
>> The __dummy variable was pointless (and needlessly initialized
>> in the 2x32-bit case), given that local copies of the inputs
>> already exist.
>
> Hmm... I had some failures if I did not specify that dummy in the
> inline asm. Does this work for all gcc versions?

You need to have the output go somewhere, but using the already
existing __o2 for this purpose is possible and sufficient.

>> The 2x64-bit variant forced the address of the first object into
>> %rsi, even though this is needed only for the call to the
>> emulation function. The real cmpxchg16b can operate on any
>> memory operand.
>
> Yup. Good idea to code the load into the alternative code path to avoid
> restricting the cmpxchg of the primary code path to the %si register.
>
> You dropped the padding with NOPs. Are the instructions on both paths
> always the same length?

leaq and cmpxchg16b use the same operand, so their modrm encoding
(including any SIB byte and displacement) is the same. leaq is
REX+opcode+modrm..., while cmpxchg16b is (SEG+)REX+0x0f+opcode+modrm...,
so the latter is two bytes longer than the former (one byte in UP).
With call being five bytes vs. setz %al being three bytes, lea+call end
up one byte longer than cmpxchg16b+setz in the UP case, but I think
this is tolerable. If not, the operand of lea could be made
%rip-relative for the case where the operand is a direct access
(i.e. become %1 instead of %P1).

Jan
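The byte-count argument in Jan's reply can be checked with a small C sketch that simply adds up the encoding pieces he lists: REX.W + 0x8d for leaq, an optional segment prefix + REX.W + 0x0f 0xc7 for cmpxchg16b, 0x0f 0x94 0xc0 for setz %al, and 0xe8 plus a 4-byte offset for call. The helper names are hypothetical, and treating the optional SEG prefix as the per-CPU segment override that is emitted only in SMP builds is an assumption. `mem_len` stands for the shared ModRM/SIB/displacement bytes, so it cancels out of every comparison.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helpers modelling the instruction lengths discussed in the
 * mail. mem_len is the size of the shared memory-operand encoding (ModRM
 * plus any SIB byte and displacement), identical for leaq and cmpxchg16b.
 */
static int leaq_len(int mem_len)
{
	return 1 /* REX.W */ + 1 /* 0x8d */ + mem_len;
}

static int cmpxchg16b_len(int mem_len, bool smp)
{
	/* Assumption: the segment prefix is the per-CPU override, SMP only. */
	int seg = smp ? 1 : 0;

	return seg + 1 /* REX.W */ + 2 /* 0x0f 0xc7 */ + mem_len;
}

/* Primary path: cmpxchg16b followed by setz %al (0x0f 0x94 0xc0). */
static int primary_len(int mem_len, bool smp)
{
	return cmpxchg16b_len(mem_len, smp) + 3;
}

/* Replacement path: leaq followed by call rel32 (0xe8 + 4-byte offset). */
static int replacement_len(int mem_len)
{
	return leaq_len(mem_len) + 5;
}
```

For any `mem_len`, `primary_len(m, true)` equals `replacement_len(m)`, while `replacement_len(m) - primary_len(m, false)` is 1: the one-byte UP overshoot that the reply considers tolerable.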