On Mon, Jan 8, 2018 at 8:13 PM, Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote: > > On Mon, Jan 8, 2018 at 7:42 PM, Dan Williams <dan.j.williams@xxxxxxxxx> wrote: > > > > originally from Linus and tweaked by Alexei and I: > > Sadly, that tweak - while clever - is wrong. > > > unsigned long _mask = ~(long)(_m - 1 - _i) >> BITS_PER_LONG - 1;\ > > Why? > > Because "(long)(_m-1-_i)" is not negative just because "i >= m". It > can still be positive. > > Think "m = 100", "i=bignum". The subtraction will overflow and become > positive again, the shift will shift to zero, and then the mask will > become ~0. > > Now, you can fix it, but you need to be a tiny bit more clever. In > particular, just make sure that you retain the high bit of "_i", > basically making the rule be that a negative index is not ever valid. > > And then instead of "(_m - 1 - _i)", you use "(_i | (_m - 1 - _i))". > Now the sign bit is set if _i had it set, _or_ if the subtraction > turned negative, and you don't have to worry about the overflow > situation. > > But it does require that extra step to be trustworthy. Still purely > cheap arithmetic operations, although there is possibly some > additional register pressure there. > > Somebody might be able to come up with something even more minimal (or > find a fault in my fix of your tweak). I looks like there is another problem, or I'm misreading the cleverness. We want the mask to be ~0 in the ok case and 0 in the out-of-bounds case. As far as I can see we end up with ~0 in the ok case, and ~1 in the bad case. Don't we need to do something like the following, at which point are we getting out of the realm of "cheap ALU instructions"? #define __nospec_array_ptr(base, idx, sz) \ ({ \ union { typeof(&base[0]) _ptr; unsigned long _bit; } __u; \ unsigned long _i = (idx); \ unsigned long _s = (sz); \ unsigned long _v = (long)(_i | _s - 1 - _i) \ >> BITS_PER_LONG - 1; \ unsigned long _mask = _v * ~0UL; \ OPTIMIZER_HIDE_VAR(_mask); \ __u._ptr = &base[_i & _mask]; \ __u._bit &= _mask; \ __u._ptr; \ }) elem = nospec_array_ptr(array, idx, 3); 106: b8 02 00 00 00 mov $0x2,%eax 10b: 48 63 ff movslq %edi,%rdi 10e: 48 29 f8 sub %rdi,%rax 111: 48 09 f8 or %rdi,%rax 114: 48 c1 e8 3f shr $0x3f,%rax 118: 48 21 c7 and %rax,%rdi 11b: 48 8d 54 bc 04 lea 0x4(%rsp,%rdi,4),%rdx 120: 48 21 d0 and %rdx,%rax