On Sat, Oct 21, 2023 at 12:55:45AM +0100, Andrew Cooper wrote:
> On 20/10/2023 9:44 pm, Pawan Gupta wrote:
> > +#define EXEC_VERW \
> > +	__EXEC_VERW(551f); \
> > +	/* nopl __KERNEL_DS(%rax) */ \
> > +	.byte 0x0f, 0x1f, 0x80, 0x00, 0x00; \
> > +551:	.word __KERNEL_DS; \
>
> Is this actually wise from a perf point of view?
>
> You're causing a data access to the instruction stream, and not only
> that, the immediate next instruction. Some parts don't take kindly to
> snoops hitting L1I.

I suspected the same and asked CPU architects; they did not anticipate
reads being interpreted as part of self modifying code. The perf numbers
do not indicate a problem, but they don't speak for all the parts. It
could be an issue with some parts.

> A better option would be to simply have
>
> .section .text.entry
> .align CACHELINE
> mds_verw_sel:
>     .word __KERNEL_DS
>     int3
> .align CACHELINE
>
>
> And then just have EXEC_VERW be
>
>     verw   mds_verw_sel(%rip)
>
> in the fastpaths. That keeps the memory operand in .text.entry it works
> on Meltdown-vulnerable CPUs, but creates effectively a data cacheline
> that isn't mixed into anywhere in the frontend, which also gets far
> better locality of reference rather than having it duplicated in 9
> different places.

> Also it avoids playing games with hiding data inside an instruction.
> It's a neat trick, but the neater trick is avoid it whenever possible.

Thanks for the pointers. I think verw in 32-bit mode won't be able to
address the operand outside of 4GB range. Maybe this is fine or could it
be a problem addressing from e.g. KVM module?
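
To put a number on the reach question: in 64-bit mode,
"verw mds_verw_sel(%rip)" encodes a signed 32-bit displacement relative
to the end of the instruction, so the operand has to sit within roughly
+/-2GiB of each call site. A minimal sketch of that check in C follows.
The helper name fits_rip_rel32() is hypothetical, and it says nothing
about where a given kernel actually places its modules.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, for illustration only.
 *
 * Returns true if an operand at 'target' can be reached from an
 * instruction that ends at 'insn_end' using a RIP-relative operand,
 * i.e. if (target - insn_end) fits in a signed 32-bit displacement.
 */
bool fits_rip_rel32(uint64_t insn_end, uint64_t target)
{
	/*
	 * Unsigned subtraction wraps modulo 2^64; reinterpreting the
	 * result as signed gives the two's-complement distance. This
	 * conversion is implementation-defined in ISO C, but behaves as
	 * expected with GCC and Clang.
	 */
	int64_t disp = (int64_t)(target - insn_end);

	return disp >= INT32_MIN && disp <= INT32_MAX;
}
```

With made-up addresses, a call site at 0xffffffffc0000000 referencing an
operand at 0xffffffff81e00000 passes the check, while a call site at
0x00007f0000000000 referencing the same operand does not. Whether the
KVM module text lands inside that window depends on the kernel's module
mapping, which I have not verified here.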