On Mon, Aug 28, 2023 at 3:53 AM David Laight <David.Laight@xxxxxxxxxx> wrote:
>
> From: Linus Torvalds
> > Sent: 25 August 2023 21:43
> ....
> > Clang turns this:
> >
> >         return __ffs64(val);
> >
> > into this horror:
> >
> >         pushq   %rax
> >         movq    %rdi, (%rsp)
> >         #APP
> >         rep
> >         bsfq    (%rsp), %rax
> >         #NO_APP
> >         popq    %rcx
> >
> > which is just incredibly broken on so many levels. It *should* be a
> > single instruction, like gcc does:
> >
> >         rep; bsf %rdi,%rax      # tmp87, word
> >
> > but clang decides that it really wants to put the argument on the
> > stack, and apparently also wants to do that nonsensical stack
> > alignment thing to make things even worse.
> >
> > We use this:
> >
> >   static __always_inline unsigned long variable__ffs(unsigned long word)
> >   {
> >         asm("rep; bsf %1,%0"
> >                 : "=r" (word)
> >                 : "rm" (word));
> >         return word;
> >   }
> >
> > for the definition, and it looks like clang royally just screws up
> > here. Yes, "m" is _allowed_ in that input set, but it damn well
> > shouldn't be used for something that is already in a register, since
> > "r" is also allowed, and is the first choice.
>
> Why don't we just remove the "m" option?
>
> Pretty much the only time it will be worse is if the value
> is in memory and loading it into a register causes a spill
> to stack.
>
> While it is possible to generate code where that happens it
> is pretty unlikely.

As Linus expressed below, register exhaustion could occur. Besides,
this is a bug in clang that we acknowledge and should fix. I have a
general idea of where things are going wrong; I just don't yet have
the muscle memory (or time) to dive into the register allocator.

>
>         David
>
> -
> Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
> Registration No: 1397386 (Wales)

-- 
Thanks,
~Nick Desaulniers