From: Peter Anvin
> Sent: 08 May 2020 18:32
> On 2020-05-08 10:21, Nick Desaulniers wrote:
> >>
> >> One last suggestion. Add the "b" modifier to the mask operand: "orb
> >> %b1, %0". That forces the compiler to use the 8-bit register name
> >> instead of trying to deduce the width from the input.
> >
> > Ah right: https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html#x86Operandmodifiers
> >
> > Looks like that works for both compilers. In that case, we can likely
> > drop the `& 0xff`, too. Let me play with that, then I'll hopefully
> > send a v3 today.
> >
>
> Good idea. I requested a while ago that they document these modifiers; they
> chose not to document them all which in some ways is good; it shows what they
> are willing to commit to indefinitely.

I thought the intention here was to explicitly do a byte access.
If the constant bit number has had a div/mod by 8 done on it, then
the address can be misaligned, so you mustn't do a non-byte-sized
locked access.

OTOH the original base address must be aligned.

Looking at some instruction timings, BTS/BTR aren't too bad if the
bit number is a constant, but they are 6 or 7 clocks slower if it is
in %cl. Given these are locked RMW bus cycles, they'll always be slow!

How about an asm multi-part alternative that uses a byte offset
and byte constant if the compiler thinks the mask is constant,
or a 4-byte offset and 32-bit mask if it doesn't?

The other alternative is to just use BTS/BTR and (maybe) rely on the
assembler to add in the word offset to the base address.

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
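For illustration, here is a minimal sketch of the byte-wide locked OR being discussed, assuming GCC or Clang on x86 and a little-endian bitmap. The helper name set_bit_byte is hypothetical and is not the kernel's set_bit(). The point it shows is that the "%b1" modifier makes the compiler print the 8-bit name of whatever register it picks for the mask (or the immediate), so no "& 0xff" is needed to get a valid "orb". Non-x86 builds fall back to a GCC/Clang __atomic builtin.

```c
/* Hypothetical helper, not the kernel's set_bit(): atomically set bit
 * `nr` of a little-endian bitmap with one byte-wide locked OR.
 * Byte address = base + nr / 8, mask = 1 << (nr % 8), so the access
 * can land on any byte, which is why it must stay byte-sized. */
static inline void set_bit_byte(unsigned long nr, unsigned long *addr)
{
    unsigned char *byte = (unsigned char *)addr + (nr >> 3);
    unsigned int mask = 1u << (nr & 7);      /* 1..128, no "& 0xff" needed */

#if defined(__x86_64__) || defined(__i386__)
    /* %b1 prints the 8-bit register name (%al, %cl, ...) if the compiler
     * picks a register for the mask, or the immediate if "i" is chosen. */
    __asm__ volatile("lock orb %b1, %0"
                     : "+m"(*byte)
                     : "iq"(mask)
                     : "memory");
#else
    /* Portable fallback for non-x86 builds (GCC/Clang builtin). */
    __atomic_fetch_or(byte, (unsigned char)mask, __ATOMIC_SEQ_CST);
#endif
}
```

Setting bits 3, 12 and 31 of a zeroed bitmap touches bytes 0, 1 and 3 individually, and on x86 the first word ends up as 0x80001008.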
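The "multi-part alternative" suggested in the mail could be sketched in plain C roughly as follows. This assumes GCC or Clang for __builtin_constant_p and the __atomic builtins. It also assumes a little-endian bitmap, where both paths address the same bit. The name set_bit_split is hypothetical, and the sketch uses compiler builtins in place of hand-written asm alternatives.

```c
#include <stdint.h>

/* Hypothetical sketch: pick the access width based on whether the
 * compiler can see a constant bit number. Both paths address the same
 * bit only on a little-endian layout such as x86. */
static inline void set_bit_split(unsigned long nr, uint32_t *base)
{
    if (__builtin_constant_p(nr)) {
        /* Constant bit number: byte offset plus a byte-sized mask,
         * so the (possibly unaligned) byte is accessed on its own. */
        unsigned char *p = (unsigned char *)base + (nr >> 3);
        __atomic_fetch_or(p, (unsigned char)(1u << (nr & 7)),
                          __ATOMIC_SEQ_CST);
    } else {
        /* Variable bit number: aligned 4-byte offset plus a 32-bit mask. */
        uint32_t *p = base + (nr >> 5);
        __atomic_fetch_or(p, UINT32_C(1) << (nr & 31), __ATOMIC_SEQ_CST);
    }
}
```

Which branch is taken depends on inlining and optimisation level. The resulting bitmap is the same either way, which keeps the choice purely a code-generation matter.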
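For the BTS route with a constant bit number, one possible shape is sketched below, assuming GCC or Clang on x86. Here it is the compiler, through the memory operand's displacement, that adds the 4-byte word offset; whether the assembler could do that on its own is left open ("maybe") in the mail. The bit within the word is passed as an immediate, the form the mail describes as not too bad. SET_BIT_CONST is a hypothetical name, and it only works with a compile-time-constant bit number because of the "I" constraint.

```c
/* Hypothetical macro for a compile-time-constant bit number `nr` only.
 * The 4-byte word offset ((nr) >> 5) is folded into the memory operand,
 * and the bit within that word ((nr) & 31) is an immediate ("I" allows
 * 0..31), so BTS uses its immediate form rather than a register offset. */
#if defined(__x86_64__) || defined(__i386__)
#define SET_BIT_CONST(nr, addr)                                        \
    do {                                                               \
        __asm__ volatile("lock btsl %1, %0"                            \
                         : "+m"(((unsigned int *)(addr))[(nr) >> 5])   \
                         : "I"((nr) & 31)                              \
                         : "memory");                                  \
    } while (0)
#else
/* Portable fallback for non-x86 builds (GCC/Clang builtin). */
#define SET_BIT_CONST(nr, addr)                                        \
    do {                                                               \
        (void)__atomic_fetch_or(&((unsigned int *)(addr))[(nr) >> 5],  \
                                1u << ((nr) & 31), __ATOMIC_SEQ_CST);  \
    } while (0)
#endif
```

For example, SET_BIT_CONST(40, w) should assemble to a single locked btsl with immediate $8 against the word at offset 4 from w.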