* Ingo Molnar <mingo@xxxxxxxxxx> wrote:

>
> * Eric Biggers <ebiggers3@xxxxxxxxx> wrote:
>
> > There may be a small overhead caused by replacing 'xchg REG, REG' with
> > the needed sequence 'mov MEM, REG; mov REG, MEM; mov REG, REG' once per
> > round. But, counterintuitively, when I tested "ctr-twofish-3way" on a
> > Haswell processor, the new version was actually about 2% faster.
> > (Perhaps 'xchg' is not as well optimized as plain moves.)
>
> XCHG has implicit LOCK semantics on all x86 CPUs, so that's not a surprising
> result I think.

Correction: I think XCHG only implies LOCK if there's a memory operand
involved - register-register XCHG should not imply any barriers. So the
result is indeed unintuitive.

Thanks,

	Ingo