On Mon, Mar 03, 2025 at 01:47:42PM -0800, Dave Hansen wrote:
> > +static inline void invlpgb_flush_addr_nosync(unsigned long addr, u16 nr)
> > +{
> > +	__invlpgb_flush_addr_nosync(addr, nr);
> > +	if (!cpu_need_tlbsync())
> > +		cpu_set_tlbsync(true);
> > +}
> 
> One thought on these:
> 
> Instead of having three functions:
> 
> 1. A raw __invlpgb_*_nosync()
> 2. A wrapper invlpgb_*_nosync() that flips cpu_set_tlbsync()
> 3. A wrapper invlpgb_*()
> 
> Could we get away with just two?  For instance, what if we had *ALL*
> __invlpgb()'s do cpu_set_tlbsync()?  Then we'd universally call tlbsync().
> 
> static inline void invlpgb_flush_all_nonglobals(void)
> {
> 	guard(preempt)();
> 	__invlpgb(0, 0, 0, 1, NO_STRIDE, INVLPGB_MODE_ALL_NONGLOBALS);
> 	tlbsync();
> }
> 
> Then we wouldn't need any of those three new wrappers.  The only downside
> is that we'd end up with paths that logically do:
> 
> 	__invlpgb()
> 	cpu_set_tlbsync(true);
> 	if (cpu_need_tlbsync()) { // always true
> 		__tlbsync();
> 		cpu_set_tlbsync(false);
> 	}
> 
> In other words, a possibly superfluous set/check/clear of the
> "need_tlbsync" state.  But I'd expect that to be a pittance compared to
> the actual cost of INVLPGB/TLBSYNC.

Lemme see whether I can grasp properly what you mean:

What you really want is for the _nosync() variants to set need_tlbsync,
right?

Because looking at all the call sites which do set tlbsync, the flow is
this:

	__invlpgb_flush_user_nr_nosync(pcid, addr, nr, pmd_stride, freed_tables);
	if (!cpu_need_tlbsync())
		cpu_set_tlbsync(true);

So we should move that cpu_set_tlbsync(true) into the
__invlpgb_*_nosync() variant.

And in order to make it even simpler, we should drop the testing too.
IOW, this:

	/* Flush all mappings for a given PCID, not including globals. */
	static inline void __invlpgb_flush_single_pcid_nosync(unsigned long pcid)
	{
		__invlpgb(0, pcid, 0, 1, 0, INVLPGB_PCID);
		cpu_set_tlbsync(true);
	}

Right?
> > static void broadcast_tlb_flush(struct flush_tlb_info *info)
> > {
> > 	bool pmd = info->stride_shift == PMD_SHIFT;
> > @@ -790,6 +821,8 @@ void switch_mm_irqs_off(struct mm_struct *unused, struct mm_struct *next,
> > 	if (IS_ENABLED(CONFIG_PROVE_LOCKING))
> > 		WARN_ON_ONCE(!irqs_disabled());
> > 
> > +	tlbsync();
> 
> This one is in dire need of comments.

Maybe this:

diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 08672350536f..b97249ffff1f 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -822,6 +822,9 @@ void switch_mm_irqs_off(struct mm_struct *unused, struct mm_struct *next,
 	if (IS_ENABLED(CONFIG_PROVE_LOCKING))
 		WARN_ON_ONCE(!irqs_disabled());
 
+	/*
+	 * Finish any remote TLB flushes pending from this CPU:
+	 */
 	tlbsync();
 
 	/*

> Ditto, *especially* if this hits the init_mm state.  There really
> shouldn't be any deferred flushes for the init_mm.

Basically what you said but as a comment. :-P

Merged in the rest.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette