On Mon, Jan 22, 2018 at 2:04 AM, Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> However, I suspect it actually has a slightly higher register
> pressure, since you'd need to have that zero register (zero being the
> "safe" value), and the only good way to get a zero value is the xor
> thing, which affects flags and thus needs to be before the cmp.
>
> In contrast, the sbb trick has no early inputs needed.

On the flipside, sbb carries a false dependency [*] on the destination
register. Imagine something like

	divq %rcx
	...
	cmpq %rdi, %rsi
	sbbq %rax,%rax

Here sbb needs to wait not only for the comparison, but also for the
earlier unrelated slow division, because divq writes its quotient to
%rax and sbb nominally reads %rax as an input.

On the other hand, zeroing with mov or xor breaks dependencies on the
destination register, making the computation independent of what came
before.

[*] Recent AMD chips are smart enough to understand the sbb r,r idiom
and ignore the value of r, but as far as I know none of the Intel
chips do.
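
To make the trade-off concrete, here is a rough sketch of the two
sequences as GCC-style inline asm. The helper names mask_sbb and
mask_xor_sbb are made up for illustration and are not taken from any
existing header; the non-x86 fallback is only there so the snippet
builds elsewhere. Both return all-ones when a < b (unsigned) and zero
otherwise.

```c
#include <stdint.h>

/*
 * Hypothetical helpers, for illustration only.
 * Both return ~0 when a < b (unsigned compare) and 0 otherwise.
 */

/* sbb alone: no early inputs, but sbb nominally reads the old value
 * of the output register, so it may wait on whatever wrote it last. */
static inline uint64_t mask_sbb(uint64_t a, uint64_t b)
{
#if defined(__x86_64__) && defined(__GNUC__)
	uint64_t mask;

	__asm__ ("cmpq %2, %1\n\t"	/* CF = (a < b) */
		 "sbbq %0, %0"		/* mask = mask - mask - CF = -CF */
		 : "=r" (mask)
		 : "r" (a), "r" (b)
		 : "cc");
	return mask;
#else
	return (uint64_t)0 - (uint64_t)(a < b);	/* portable fallback */
#endif
}

/* xor first: the zeroing idiom breaks the dependency on the old
 * register value, but it clobbers flags, so it must precede the cmp
 * and the output register is occupied earlier. */
static inline uint64_t mask_xor_sbb(uint64_t a, uint64_t b)
{
#if defined(__x86_64__) && defined(__GNUC__)
	uint64_t mask;

	__asm__ ("xorl %k0, %k0\n\t"	/* zero the register, clobbers flags */
		 "cmpq %2, %1\n\t"	/* CF = (a < b) */
		 "sbbq %0, %0"		/* mask = 0 - 0 - CF = -CF */
		 : "=&r" (mask)		/* early clobber: written before inputs are read */
		 : "r" (a), "r" (b)
		 : "cc");
	return mask;
#else
	return (uint64_t)0 - (uint64_t)(a < b);	/* portable fallback */
#endif
}
```

Both produce the same value, e.g. mask_sbb(1, 2) is all-ones and
mask_sbb(2, 1) is zero. The only difference is scheduling: the second
variant pays an extra instruction and ties up the register before the
cmp, in exchange for not depending on the register's previous
producer. Whether that is a net win would have to be measured on the
chips in question.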