On Tue, Jan 9, 2018 at 4:48 PM, Dan Williams <dan.j.williams@xxxxxxxxx> wrote:
>
> It looks like there is another problem, or I'm misreading the
> cleverness.

I think you were misreading it.

I was basically saying that this:

        unsigned long _mask = ~(long)(_m - 1 - _i) >> BITS_PER_LONG - 1;\

doesn't work, and that the "_m - 1 - _i" needs to be replaced by
"_i | _m - 1 - _i".

So you have

        unsigned long _mask = ~(long)(_i | (_m - 1 - _i)) >> BITS_PER_LONG - 1;\

which should give the right result. No?

But as mentioned, I think you can do it with two instructions if you
do an architecture-specific inline asm:

        unsigned long mask;

        asm("cmpq %1,%2; sbbq %0,%0"
                :"=r" (mask)
                :"g" (max),"r" (idx));

which is likely much faster, and has much better register usage
("max" can be a constant or loaded directly from memory, and "mask"
could be assigned the same register as idx).

But once again, I didn't really test it.

Note that the "cmpq/sbbq" version works regardless of max/idx values,
since it literally does the math in BITS_PER_LONG+1 bits.

In contrast, the "binary or with idx" version only works if the high
bit set in idx cannot be valid (put another way: 'max' must not be
bigger than MAXLONG+1).

                Linus
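
For reference, the two mask computations described above could be sketched as
standalone helpers roughly like this. The function names index_mask_generic()
and index_mask_x86() are made up for illustration and are not from the patch
under discussion. The generic version assumes that right-shifting a negative
signed long is an arithmetic shift (implementation-defined in C, but what GCC
and Clang do) and that max is no bigger than LONG_MAX+1. The x86 version adds
an explicit "cc" clobber and spells the keyword __asm__ so that it also builds
in strict ISO C modes.

```c
#include <limits.h>

#define BITS_PER_LONG ((int)(sizeof(long) * CHAR_BIT))

/*
 * Portable sketch: returns all ones when idx < max, zero otherwise.
 * Assumes max <= LONG_MAX + 1 and an arithmetic right shift on
 * signed long. If idx >= max, then either idx itself or
 * (max - 1 - idx) has the high bit set, so the OR has it set and
 * the inverted value shifts down to zero.
 */
static inline unsigned long index_mask_generic(unsigned long idx,
					       unsigned long max)
{
	return (unsigned long)(~(long)(idx | (max - 1 - idx)) >>
			       (BITS_PER_LONG - 1));
}

#if defined(__x86_64__) && defined(__GNUC__)
/*
 * x86-64 sketch: cmpq computes idx - max and sets the carry flag
 * when idx < max (unsigned); sbbq then yields 0 - carry, which is
 * all ones when the carry is set and zero when it is clear.
 */
static inline unsigned long index_mask_x86(unsigned long idx,
					   unsigned long max)
{
	unsigned long mask;

	__asm__("cmpq %1,%2; sbbq %0,%0"
		: "=r" (mask)
		: "g" (max), "r" (idx)
		: "cc");
	return mask;
}
#endif
```

A caller would presumably then do something like "idx &= index_mask_generic(idx, max);"
so that an out-of-range index collapses to 0 without a conditional branch.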