I believe this fixes a bug introduced in the following KAISER patch:

	x86/mm/kaiser: Use PCID feature to make user and kernel switches faster

It's only been lightly tested.  I'm sharing so that folks who might be
running into it have a fix to test.

--

From: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>

There have been a series of weird warnings and boot problems when
running the KAISER PCID patches.  I believe many of them can be
tracked down to this problem.  One example:

	http://lkml.kernel.org/r/5a1aaa36.CWNgvwmmRFzeAlPc%fengguang.wu@xxxxxxxxx

The issue arises when we are relatively early in boot, have the low 12
bits of CR3 clear, and are thus running with PCID (aka ASID) 0.
cpu_tlbstate.loaded_mm_asid contains a 0.  *But* PCID 0 is not ASID 0:
the ASIDs are biased up by one so as not to conflict with the somewhat
special hardware PCID 0.

Upon entering __native_flush_tlb_single(), we read loaded_mm_asid=0.
We then calculate kern_asid(), biasing up by 1, get 1, and pass *that*
to INVPCID.  Thus, we have PCID 0 loaded in CR3 but are flushing
PCID 1 with INVPCID.  That obviously does not work.

To fix this, mark cpu_tlbstate.loaded_mm_asid as invalid, then detect
that state in __native_flush_tlb_single(), falling back to INVLPG.
Also add a VM_WARN_ON() to help find these cases in the future.
Signed-off-by: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>
Reported-by: fengguang.wu@xxxxxxxxx
Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
Cc: Richard Fellner <richard.fellner@xxxxxxxxxxxxxxxxx>
Cc: Moritz Lipp <moritz.lipp@xxxxxxxxxxxxxx>
Cc: Daniel Gruss <daniel.gruss@xxxxxxxxxxxxxx>
Cc: Michael Schwarz <michael.schwarz@xxxxxxxxxxxxxx>
Cc: Andy Lutomirski <luto@xxxxxxxxxx>
Cc: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
Cc: Kees Cook <keescook@xxxxxxxxxx>
Cc: Hugh Dickins <hughd@xxxxxxxxxx>
Cc: Borislav Petkov <bp@xxxxxxxxx>
Cc: x86@xxxxxxxxxx
---

 b/arch/x86/include/asm/tlbflush.h |   36 +++++++++++++++++++++++++++++++-----
 b/arch/x86/mm/init.c              |    1 +
 2 files changed, 32 insertions(+), 5 deletions(-)

diff -puN arch/x86/include/asm/tlbflush.h~kaiser-fix-wrong-asid-flush arch/x86/include/asm/tlbflush.h
--- a/arch/x86/include/asm/tlbflush.h~kaiser-fix-wrong-asid-flush	2017-11-28 01:43:05.180452966 -0800
+++ b/arch/x86/include/asm/tlbflush.h	2017-11-28 01:43:05.190452966 -0800
@@ -77,6 +77,8 @@ static inline u64 inc_mm_tlb_gen(struct
 
 /* There are 12 bits of space for ASIDS in CR3 */
 #define CR3_HW_ASID_BITS 12
+#define CR3_NR_HW_ASIDS (1<<CR3_HW_ASID_BITS)
+#define INVALID_HW_ASID (CR3_NR_HW_ASIDS+1)
 /* When enabled, KAISER consumes a single bit for user/kernel switches */
 #ifdef CONFIG_KAISER
 #define X86_CR3_KAISER_SWITCH_BIT 11
@@ -425,19 +427,40 @@ static inline void __native_flush_tlb_gl
 	raw_local_irq_restore(flags);
 }
 
+static inline void __invlpg(unsigned long addr)
+{
+	asm volatile("invlpg (%0)" ::"r" (addr) : "memory");
+}
+
+static inline u16 cr3_asid(void)
+{
+	return __read_cr3() & ((1<<CR3_HW_ASID_BITS)-1);
+}
+
 static inline void __native_flush_tlb_single(unsigned long addr)
 {
-	u32 loaded_mm_asid = this_cpu_read(cpu_tlbstate.loaded_mm_asid);
+	u16 loaded_mm_asid = this_cpu_read(cpu_tlbstate.loaded_mm_asid);
 
 	/*
-	 * Some platforms #GP if we call invpcid(type=1/2) before
-	 * CR4.PCIDE=1. Just call invpcid in the case we are called
-	 * early.
+	 * Handle systems that do not support PCIDs.  This will also
+	 * get used in cases where this is called before PCID detection
+	 * is done.
 	 */
 	if (!this_cpu_has(X86_FEATURE_INVPCID_SINGLE)) {
-		asm volatile("invlpg (%0)" ::"r" (addr) : "memory");
+		__invlpg(addr);
 		return;
 	}
+
+	/*
+	 * An "invalid" loaded_mm_asid means that we have not
+	 * initialized 'cpu_tlbstate' and are not using PCIDs.
+	 * Just flush the TLB as if PCIDs were not present.
+	 */
+	if (loaded_mm_asid == INVALID_HW_ASID) {
+		__invlpg(addr);
+		return;
+	}
+
 	/* Flush the address out of both PCIDs. */
 	/*
 	 * An optimization here might be to determine addresses
@@ -451,6 +474,9 @@ static inline void __native_flush_tlb_si
 	if (kern_asid(loaded_mm_asid) != user_asid(loaded_mm_asid))
 		invpcid_flush_one(user_asid(loaded_mm_asid), addr);
 	invpcid_flush_one(kern_asid(loaded_mm_asid), addr);
+
+	/* Check that we are flushing the active ASID: */
+	VM_WARN_ON_ONCE(kern_asid(loaded_mm_asid) != cr3_asid());
 }
 
 static inline void __flush_tlb_all(void)
diff -puN arch/x86/mm/init.c~kaiser-fix-wrong-asid-flush arch/x86/mm/init.c
--- a/arch/x86/mm/init.c~kaiser-fix-wrong-asid-flush	2017-11-28 01:43:05.186452966 -0800
+++ b/arch/x86/mm/init.c	2017-11-28 01:43:05.190452966 -0800
@@ -882,6 +882,7 @@ void __init zone_sizes_init(void)
 
 DEFINE_PER_CPU_SHARED_ALIGNED(struct tlb_state, cpu_tlbstate) = {
 	.loaded_mm = &init_mm,
+	.loaded_mm_asid = INVALID_HW_ASID, /* We are not doing ASID management yet */
 	.next_asid = 1,
 	.cr4 = ~0UL, /* fail hard if we screw up cr4 shadow initialization */
 };
_
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .