On 3/4/25 03:52, Borislav Petkov wrote:
> On Mon, Mar 03, 2025 at 01:47:42PM -0800, Dave Hansen wrote:
...
> IOW, this:
>
> /* Flush all mappings for a given PCID, not including globals. */
> static inline void __invlpgb_flush_single_pcid_nosync(unsigned long pcid)
> {
> 	__invlpgb(0, pcid, 0, 1, 0, INVLPGB_PCID);
> 	cpu_set_tlbsync(true);
> }
>
> Right?

Yep, that works.

Optimizing out the writes like the old code did is certainly a good
thought. But I suspect the cacheline is hot the majority of the time.

>>>  static void broadcast_tlb_flush(struct flush_tlb_info *info)
>>>  {
>>>  	bool pmd = info->stride_shift == PMD_SHIFT;
>>> @@ -790,6 +821,8 @@ void switch_mm_irqs_off(struct mm_struct *unused, struct mm_struct *next,
>>>  	if (IS_ENABLED(CONFIG_PROVE_LOCKING))
>>>  		WARN_ON_ONCE(!irqs_disabled());
>>>
>>> +	tlbsync();
>>
>> This one is in dire need of comments.
>
> Maybe this:
>
> diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
> index 08672350536f..b97249ffff1f 100644
> --- a/arch/x86/mm/tlb.c
> +++ b/arch/x86/mm/tlb.c
> @@ -822,6 +822,9 @@ void switch_mm_irqs_off(struct mm_struct *unused, struct mm_struct *next,
>  	if (IS_ENABLED(CONFIG_PROVE_LOCKING))
>  		WARN_ON_ONCE(!irqs_disabled());
>
> +	/*
> +	 * Finish any remote TLB flushes pending from this CPU:
> +	 */
>  	tlbsync();

That's a prototypical "what" comment and not "why", though.

It makes a lot of sense that any flushes that the old task did should
complete before a new one gets activated. But I honestly can't think of
a _specific_ problem that omitting it would cause. I don't doubt that
this does _some_ good, but I just don't know what good it does. ;)
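
For the record, the write elision I was referring to would just mean
guarding the per-CPU store, roughly like this. This is only a sketch: I'm
assuming the flag behind cpu_set_tlbsync() is a per-CPU bool in
cpu_tlbstate named 'need_tlbsync', which may not match the exact names in
this series:

	/*
	 * Sketch only: avoid dirtying the cacheline when the flag already
	 * has the right value.  The 'need_tlbsync' field name is assumed
	 * here for illustration.
	 */
	static inline void cpu_set_tlbsync(bool state)
	{
		if (this_cpu_read(cpu_tlbstate.need_tlbsync) != state)
			this_cpu_write(cpu_tlbstate.need_tlbsync, state);
	}

Whether the extra read is worth it comes back to the point above: if that
cacheline is hot on the context-switch path anyway, the unconditional
write is probably a wash.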