From: Linus Torvalds
> Sent: 30 December 2023 20:59
>
> On Sat, 30 Dec 2023 at 12:41, Linus Torvalds
> <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> >
> > UNTESTED patch to just do the "this_cpu_write()" parts attached.
> > Again, note how we do end up doing that this_cpu_ptr conversion later
> > anyway, but at least it's off the critical path.
>
> Also note that while 'this_cpu_ptr()' doesn't exactly generate lovely
> code, it really is still better than caching a value in memory.
>
> At least the memory location that 'this_cpu_ptr()' accesses is
> slightly more likely to be hot (and is right next to the cpu number,
> iirc).

I was only going to access the 'self' field in code that required
the 'node' cache line be present.

>
> That said, I think we should fix this_cpu_ptr() to not ever generate
> that disgusting cltq just because the cpu pointer has the wrong
> signedness. I don't quite know how to do it, but this:
>
>   -#define per_cpu_offset(x) (__per_cpu_offset[x])
>   +#define per_cpu_offset(x) (__per_cpu_offset[(unsigned)(x)])
>
> at least helps a *bit*. It gets rid of the cltq, at least, but if
> somebody actually passes in an 'unsigned long' cpuid, it would cause
> an unnecessary truncation.

Doing the conversion using arithmetic might help, so:
	__per_cpu_offset[(x) + 0u]

> And gcc still generates
>
>         subl    $1, %eax        #, cpu_nr
>         addq    __per_cpu_offset(,%rax,8), %rcx
>
> instead of just doing
>
>         addq    __per_cpu_offset-8(,%rax,8), %rcx
>
> because it still needs to clear the upper 32 bits and doesn't know
> that the 'xchg()' already did that.

Not only that, you need to do the 'subl' after converting to 64 bits.
Otherwise the wrong location is read were cpu_nr to be zero.
I've tried that - but it still failed.

> Oh well. I guess even without the -1/+1 games by the OSQ code, we
> would still end up with a "movl" just to do that upper bits clearing
> that the compiler doesn't know is unnecessary.
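To make the index arithmetic discussed above concrete, here is a minimal C sketch (not code from the patch). The macro and function names are hypothetical, and uint64_t stands in for a 64-bit 'unsigned long' cpuid. It only shows the C-level values; it says nothing about what any particular gcc version emits.

```c
#include <assert.h>
#include <stdint.h>

/* Index form from the quoted patch: always truncated to 32 bits. */
#define IDX_CAST(x)  ((unsigned)(x))
/* Arithmetic form: an int becomes unsigned int, but a 64-bit
 * unsigned operand keeps its full width (usual arithmetic conversions). */
#define IDX_ARITH(x) ((x) + 0u)

/* Hypothetical helpers that return the index widened to 64 bits. */
static inline uint64_t idx_cast_from_u64(uint64_t x)  { return IDX_CAST(x); }
static inline uint64_t idx_arith_from_u64(uint64_t x) { return IDX_ARITH(x); }
static inline uint64_t idx_arith_from_int(int x)      { return IDX_ARITH(x); }

/* Subtract in 32 bits, then zero-extend: the "subl $1, %eax" followed
 * by an address that uses %rax. */
static inline uint64_t sub_then_widen(uint32_t v)
{
	return (uint64_t)(v - 1u);
}

/* Zero-extend first, then subtract in 64 bits: what folding the -1
 * into a "-8" displacement would compute. */
static inline uint64_t widen_then_sub(uint32_t v)
{
	return (uint64_t)v - 1u;
}
```

For every non-zero input the last two helpers agree. They differ only for zero, where the 32-bit subtraction wraps to 0xFFFFFFFF before widening, which is why the subtraction cannot simply be folded into the displacement while it is still done in 32 bits.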
>
> I don't think we have any reasonable way to tell the compiler that the
> register output of our xchg() inline asm has the upper 32 bits clear.

It could be done for a 32bit unsigned xchg() - just make the return type
unsigned 64bit.
But that won't work for the signed exchange - and 'atomic_t' is signed.
OTOH I'd guess this code could use 'unsigned int' instead of atomic_t?

	David
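As a rough illustration of the return-type idea above, the sketch below widens the result of a 32-bit exchange to 64 bits. The helper names are hypothetical, and the GCC/Clang builtin __atomic_exchange_n() stands in for the kernel's inline-asm xchg(). It only demonstrates why the unsigned case can promise clear upper bits while the signed case cannot; whether the compiler then drops the extra "movl" is not something this sketch shows.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: 32-bit unsigned exchange returned as 64-bit.
 * The widening is a zero-extension, so the upper 32 bits of the
 * result are always 0. */
static inline uint64_t xchg_u32_ret64(uint32_t *ptr, uint32_t new_val)
{
	uint32_t old = __atomic_exchange_n(ptr, new_val, __ATOMIC_SEQ_CST);

	return old;
}

/* Hypothetical signed counterpart: the widening is a sign-extension,
 * so a negative old value sets the upper 32 bits. */
static inline int64_t xchg_s32_ret64(int32_t *ptr, int32_t new_val)
{
	int32_t old = __atomic_exchange_n(ptr, new_val, __ATOMIC_SEQ_CST);

	return old;
}
```

With an unsigned 32-bit value the 64-bit result can be used directly as an index. With a signed value such as 'atomic_t' holds, the upper bits depend on the sign, so the same trick does not apply.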