On Thu, 25 Mar 2021, Borislav Petkov wrote: > Ok, > > I tried to be as specific as possible in the commit message so that we > don't forget. Please lemme know if I've missed something. > > Babu, Jim, I'd appreciate it if you ran this to confirm. > > Thx. > > --- > From: Borislav Petkov <bp@xxxxxxx> > Date: Thu, 25 Mar 2021 11:02:31 +0100 > > Jim Mattson reported that Debian 9 guests using a 4.9-stable kernel > are exploding during alternatives patching: > > kernel BUG at /build/linux-dqnRSc/linux-4.9.228/arch/x86/kernel/alternative.c:709! > invalid opcode: 0000 [#1] SMP > Modules linked in: > CPU: 1 PID: 1 Comm: swapper/0 Not tainted 4.9.0-13-amd64 #1 Debian 4.9.228-1 > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 > Call Trace: > swap_entry_free > swap_entry_free > text_poke_bp > swap_entry_free > arch_jump_label_transform > set_debug_rodata > __jump_label_update > static_key_slow_inc > frontswap_register_ops > init_zswap > init_frontswap > do_one_initcall > set_debug_rodata > kernel_init_freeable > rest_init > kernel_init > ret_from_fork > > triggering the BUG_ON in text_poke() which verifies whether patched > instruction bytes have actually landed at the destination. > > Further debugging showed that the TLB flush before that check is > insufficient because there could be global mappings left in the TLB, > leading to a stale mapping getting used. > > I say "global mappings" because the hardware configuration is a new one: > machine is an AMD, which means, KAISER/PTI doesn't need to be enabled > there, which also means there's no user/kernel pagetables split and > therefore the TLB can have global mappings. > > And the configuration is new one for a second reason: because that AMD > machine supports PCID and INVPCID, which leads the CPU detection code to > set the synthetic X86_FEATURE_INVPCID_SINGLE flag. > > Now, __native_flush_tlb_single() does invalidate global mappings when > X86_FEATURE_INVPCID_SINGLE is *not* set and returns. > > When X86_FEATURE_INVPCID_SINGLE is set, however, it invalidates the > requested address from both PCIDs in the KAISER-enabled case. But if > KAISER is not enabled and the machine has global mappings in the TLB, > then those global mappings do not get invalidated, which would lead to > the above mismatch from using a stale TLB entry. > > So make sure to flush those global mappings in the KAISER disabled case. > > Co-debugged by Babu Moger <babu.moger@xxxxxxx>. > > Reported-by: Jim Mattson <jmattson@xxxxxxxxxx> > Signed-off-by: Borislav Petkov <bp@xxxxxxx> > Link: https://lkml.kernel.org/r/CALMp9eRDSW66%2BXvbHVF4ohL7XhThoPoT0BrB0TcS0cgk=dkcBg@xxxxxxxxxxxxxx Acked-by: Hugh Dickins <hughd@xxxxxxxxxx> Great write-up too: many thanks. > --- > arch/x86/include/asm/tlbflush.h | 11 +++++++---- > 1 file changed, 7 insertions(+), 4 deletions(-) > > diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h > index f5ca15622dc9..2bfa4deb8cae 100644 > --- a/arch/x86/include/asm/tlbflush.h > +++ b/arch/x86/include/asm/tlbflush.h > @@ -245,12 +245,15 @@ static inline void __native_flush_tlb_single(unsigned long addr) > * ASID. But, userspace flushes are probably much more > * important performance-wise. > * > - * Make sure to do only a single invpcid when KAISER is > - * disabled and we have only a single ASID. > + * In the KAISER disabled case, do an INVLPG to make sure > + * the mapping is flushed in case it is a global one. > */ > - if (kaiser_enabled) > + if (kaiser_enabled) { > invpcid_flush_one(X86_CR3_PCID_ASID_USER, addr); > - invpcid_flush_one(X86_CR3_PCID_ASID_KERN, addr); > + invpcid_flush_one(X86_CR3_PCID_ASID_KERN, addr); > + } else { > + asm volatile("invlpg (%0)" ::"r" (addr) : "memory"); > + } > } > > static inline void __flush_tlb_all(void) > -- > 2.29.2 > > -- > Regards/Gruss, > Boris. > > https://people.kernel.org/tglx/notes-about-netiquette