On Tue, Jan 9, 2018 at 4:48 PM, Dan Williams <dan.j.williams@xxxxxxxxx> wrote: > On Mon, Jan 8, 2018 at 8:13 PM, Linus Torvalds > <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote: >> >> On Mon, Jan 8, 2018 at 7:42 PM, Dan Williams <dan.j.williams@xxxxxxxxx> wrote: >> > >> > originally from Linus and tweaked by Alexei and I: >> >> Sadly, that tweak - while clever - is wrong. >> >> > unsigned long _mask = ~(long)(_m - 1 - _i) >> BITS_PER_LONG - 1;\ >> >> Why? >> >> Because "(long)(_m-1-_i)" is not negative just because "i >= m". It >> can still be positive. >> >> Think "m = 100", "i=bignum". The subtraction will overflow and become >> positive again, the shift will shift to zero, and then the mask will >> become ~0. >> >> Now, you can fix it, but you need to be a tiny bit more clever. In >> particular, just make sure that you retain the high bit of "_i", >> basically making the rule be that a negative index is not ever valid. >> >> And then instead of "(_m - 1 - _i)", you use "(_i | (_m - 1 - _i))". >> Now the sign bit is set if _i had it set, _or_ if the subtraction >> turned negative, and you don't have to worry about the overflow >> situation. >> >> But it does require that extra step to be trustworthy. Still purely >> cheap arithmetic operations, although there is possibly some >> additional register pressure there. >> >> Somebody might be able to come up with something even more minimal (or >> find a fault in my fix of your tweak). > > I looks like there is another problem, or I'm misreading the > cleverness. We want the mask to be ~0 in the ok case and 0 in the > out-of-bounds case. As far as I can see we end up with ~0 in the ok > case, and ~1 in the bad case. Don't we need to do something like the > following, at which point are we getting out of the realm of "cheap > ALU instructions"? > > #define __nospec_array_ptr(base, idx, sz) \ > ({ \ > union { typeof(&base[0]) _ptr; unsigned long _bit; } __u; \ > unsigned long _i = (idx); \ > unsigned long _s = (sz); \ > unsigned long _v = (long)(_i | _s - 1 - _i) \ > >> BITS_PER_LONG - 1; \ > unsigned long _mask = _v * ~0UL; \ > OPTIMIZER_HIDE_VAR(_mask); \ > __u._ptr = &base[_i & _mask]; \ > __u._bit &= _mask; \ > __u._ptr; \ > }) Sorry, I'm slow of course ~(-1L) is 0.