On Thu, Apr 04, 2019 at 04:24:15PM +0000, David Laight wrote:
> From: David Laight
> > Sent: 04 April 2019 15:45
> >
> > From: Fenghua Yu
> > > Sent: 03 April 2019 22:22
> > > set_cpu_cap() calls locked BTS and clear_cpu_cap() calls locked BTR to
> > > operate on the bitmap defined in x86_capability.
> > >
> > > Locked BTS/BTR accesses a single unsigned long location. In 64-bit mode,
> > > the location is at:
> > > base address of x86_capability + (bit offset in x86_capability / 64) * 8
> > >
> > > Since the base address of x86_capability may not be aligned to unsigned long,
> > > the single unsigned long location may cross two cache lines and
> > > accessing the location by locked BTS/BTR instructions will trigger #AC.
> >
> > That is not true.
> > The BTS/BTR instructions access the memory word that contains the
> > expected bit.
> > The 'operand size' only affects the size of the register used for the
> > bit offset.
> > If the 'operand size' is 16 bits wide (+/- 32k bit offset) the cpu might
> > do an aligned 16bit memory access, otherwise (32 or 64bit bit offset) it
> > might do an aligned 32bit access.
> > It should never do a 64bit access and never a misaligned one (even if
> > the base address is misaligned).
>
> Hmmm... I may have misread things slightly.
> The accessed address is 'Effective Address + (4 * (BitOffset DIV 32))'.
> However nothing suggests that it ever does 64bit accesses.
>
> If it does do 64bit accesses when the operand size is 64 bits then the
> asm stubs ought to be changed to only specify 32bit operand size.

Heh, we had this discussion before[1], the op size dictates the size of
the memory access and can generate unaligned accesses.

[1] https://lkml.kernel.org/r/20181127195153.GE27075@xxxxxxxxxxxxxxx