From: riel@xxxxxxxxxxx <riel@xxxxxxxxxxx>
Sent: Sunday, January 12, 2025 7:54 AM
>
> Add invlpgb.h with the helper functions and definitions needed to use
> broadcast TLB invalidation on AMD EPYC 3 and newer CPUs.
>
> Signed-off-by: Rik van Riel <riel@xxxxxxxxxxx>
> ---
>  arch/x86/include/asm/invlpgb.h  | 95 +++++++++++++++++++++++++++++++++
>  arch/x86/include/asm/tlbflush.h |  1 +
>  2 files changed, 96 insertions(+)
>  create mode 100644 arch/x86/include/asm/invlpgb.h
>
> diff --git a/arch/x86/include/asm/invlpgb.h b/arch/x86/include/asm/invlpgb.h
> new file mode 100644
> index 000000000000..d62e3733a1ab
> --- /dev/null
> +++ b/arch/x86/include/asm/invlpgb.h
> @@ -0,0 +1,95 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _ASM_X86_INVLPGB
> +#define _ASM_X86_INVLPGB
> +
> +#include <vdso/bits.h>
> +
> +/*
> + * INVLPGB does broadcast TLB invalidation across all the CPUs in the system.
> + *
> + * The INVLPGB instruction is weakly ordered, and a batch of invalidations can
> + * be done in a parallel fashion.
> + *
> + * TLBSYNC is used to ensure that pending INVLPGB invalidations initiated from
> + * this CPU have completed.
> + */
> +static inline void __invlpgb(unsigned long asid, unsigned long pcid,
> +			     unsigned long addr, int extra_count,
> +			     bool pmd_stride, unsigned long flags)
> +{
> +	u32 edx = (pcid << 16) | asid;
> +	u32 ecx = (pmd_stride << 31);
> +	u64 rax = addr | flags;
> +
> +	/* Protect against negative numbers. */
> +	extra_count = max(extra_count, 0);
> +	ecx |= extra_count;
> +
> +	asm volatile("invlpgb" : : "a" (rax), "c" (ecx), "d" (edx));

The above needs to be:

	asm volatile(".byte 0x0f, 0x01, 0xfe" : : "a" (rax), "c" (ecx), "d" (edx));

plus an explanatory comment. As Boris Petkov previously noted[1], the
"invlpgb" instruction name requires binutils version 2.36, but the current
Linux kernel minimum binutils version is 2.25 (in scripts/min-tool-version.sh).
For example, I'm using binutils 2.34, and your asm statement doesn't compile.
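Put together, the fixed helper could look something like the sketch below.
This is only an illustration of the suggested change, not a tested patch; the
kernel types (u32/u64) and the rest of the header are assumed unchanged, and
the 0F 01 FE encoding is the one implied by the ".byte" replacement above:

```c
static inline void __invlpgb(unsigned long asid, unsigned long pcid,
			     unsigned long addr, int extra_count,
			     bool pmd_stride, unsigned long flags)
{
	u32 edx = (pcid << 16) | asid;
	u32 ecx = (pmd_stride << 31);
	u64 rax = addr | flags;

	/* Protect against negative numbers. */
	extra_count = max(extra_count, 0);
	ecx |= extra_count;

	/*
	 * INVLPGB is encoded as 0F 01 FE. Emit the raw opcode bytes
	 * because the "invlpgb" mnemonic is only understood by
	 * binutils >= 2.36, which is newer than the kernel's minimum
	 * supported binutils version.
	 */
	asm volatile(".byte 0x0f, 0x01, 0xfe"
		     : : "a" (rax), "c" (ecx), "d" (edx));
}
```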
> +}
> +
> +/* Wait for INVLPGB originated by this CPU to complete. */
> +static inline void tlbsync(void)
> +{
> +	asm volatile("tlbsync");

Same as above for "tlbsync".

Michael

[1] https://lore.kernel.org/lkml/20250102124247.GPZ3aJx8JTJa6PcaOW@fat_crate.local/
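P.S. For completeness, a byte-encoded tlbsync() could be sketched like this.
Again only an illustration, not a tested patch: the 0F 01 FF encoding for
TLBSYNC is taken from the AMD APM, and the surrounding kernel context is
assumed unchanged:

```c
/* Wait for INVLPGB originated by this CPU to complete. */
static inline void tlbsync(void)
{
	/*
	 * TLBSYNC is encoded as 0F 01 FF. Emit the raw opcode bytes
	 * because the "tlbsync" mnemonic also requires binutils >= 2.36.
	 */
	asm volatile(".byte 0x0f, 0x01, 0xff");
}
```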