On May 10, 2020 4:59:17 AM PDT, David Laight <David.Laight@xxxxxxxxxx> wrote:
>From: Peter Anvin
>> Sent: 08 May 2020 18:32
>> On 2020-05-08 10:21, Nick Desaulniers wrote:
>> >>
>> >> One last suggestion. Add the "b" modifier to the mask operand:
>> >> "orb %b1, %0". That forces the compiler to use the 8-bit register name
>> >> instead of trying to deduce the width from the input.
>> >
>> > Ah right: https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html#x86Operandmodifiers
>> >
>> > Looks like that works for both compilers. In that case, we can likely
>> > drop the `& 0xff`, too. Let me play with that, then I'll hopefully
>> > send a v3 today.
>> >
>>
>> Good idea. I requested a while ago that they document these modifiers;
>> they chose not to document them all which in some ways is good; it shows
>> what they are willing to commit to indefinitely.
>
>I thought the intention here was to explicitly do a byte access.
>If the constant bit number has had a div/mod by 8 done on it then
>the address can be misaligned - so you mustn't do a non-byte sized
>locked access.
>
>OTOH the original base address must be aligned.
>
>Looking at some instruction timing, BTS/BTR aren't too bad if the
>bit number is a constant. But are 6 or 7 clocks slower if it is in %cl.
>Given these are locked RMW bus cycles they'll always be slow!
>
>How about an asm multi-part alternative that uses a byte offset
>and byte constant if the compiler thinks the mask is constant
>or a 4-byte offset and 32bit mask if it doesn't.
>
>The other alternative is to just use BTS/BTS and (maybe) rely on the
>assembler to add in the word offset to the base address.
>
>	David
>
>-
>Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes,
>MK1 1PT, UK
>Registration No: 1397386 (Wales)

I don't understand what you are getting at here.

The intent is to do a byte access. The "multi-part asm" you are talking
about is also already there...
--
Sent from my Android device with K-9 Mail.
Please excuse my brevity.
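For readers following along, a minimal sketch of the byte-access idea discussed above, assuming an x86 target and a GCC- or Clang-compatible compiler. Bit `nr` is taken to live in byte `nr >> 3` of an aligned base address, under mask `1 << (nr & 7)`; on little-endian x86 this matches bit `nr` of an `unsigned long` bitmap. The mask is deliberately an `unsigned int`, and the `b` modifier in `%b1` makes the compiler print the 8-bit register name (e.g. `%al` rather than `%eax`) so the `orb` operands agree. Because the memory operand is a single byte, the locked read-modify-write never becomes a wider, possibly misaligned access. The helper name `set_bit_byte()` is hypothetical; this is not the kernel's actual `set_bit()`.

```c
#include <assert.h>
#include <limits.h>

/* The nr >> 3 / nr & 7 arithmetic below assumes 8-bit bytes. */
static_assert(CHAR_BIT == 8, "byte-addressed bitmap assumes 8-bit bytes");

/* Hypothetical helper: atomically set bit nr using a one-byte locked OR. */
static inline void set_bit_byte(unsigned int nr, void *addr)
{
    /* Byte containing bit nr, relative to the (aligned) base address. */
    unsigned char *p = (unsigned char *)addr + (nr >> 3);
    /* Mask within that byte; kept as an int-sized value on purpose. */
    unsigned int mask = 1u << (nr & 7);

#if defined(__x86_64__) || defined(__i386__)
    /*
     * "iq": an immediate, or a register that has an 8-bit name.
     * %b1 forces that 8-bit name (e.g. %al) instead of letting the
     * compiler derive %eax from the int-sized operand, so "orb" gets
     * byte-sized operands. "+m"(*p) is a one-byte memory operand.
     */
    __asm__ volatile("lock orb %b1, %0"
                     : "+m"(*p)
                     : "iq"(mask)
                     : "memory");
#else
    /* Portable fallback for non-x86 targets (not part of the thread). */
    __atomic_fetch_or(p, (unsigned char)mask, __ATOMIC_SEQ_CST);
#endif
}
```

Calling `set_bit_byte(13, buf)` on a zeroed buffer sets `buf[1]` to `0x20`, since bit 13 is bit 5 of byte 1.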