On Mon, Mar 03, 2025 at 01:47:42PM -0800, Dave Hansen wrote:
> > +static inline void invlpgb_flush_addr_nosync(unsigned long addr, u16 nr)
> > +{
> > +	__invlpgb_flush_addr_nosync(addr, nr);
> > +	if (!cpu_need_tlbsync())
> > +		cpu_set_tlbsync(true);
> > +}
> 
> One thought on these:
> 
> Instead of having three functions:
> 
> 1. A raw __invlpgb_*_nosync()
> 2. A wrapper invlpgb_*_nosync() that flips cpu_set_tlbsync()
> 3. A wrapper invlpgb_*()
> 
> Could we get away with just two?  For instance, what if we had *ALL*
> __invlpgb()'s do cpu_set_tlbsync()?  Then we'd universally call tlbsync().
> 
> static inline void invlpgb_flush_all_nonglobals(void)
> {
> 	guard(preempt)();
> 	__invlpgb(0, 0, 0, 1, NO_STRIDE, INVLPGB_MODE_ALL_NONGLOBALS);
> 	tlbsync();
> }
> 
> Then we wouldn't need any of those three new wrappers.  The only downside
> is that we'd end up with paths that logically do:
> 
> 	__invlpgb()
> 	cpu_set_tlbsync(true);
> 	if (cpu_need_tlbsync()) { // always true
> 		__tlbsync();
> 		cpu_set_tlbsync(false);
> 	}
> 
> In other words, a possibly superfluous set/check/clear of the
> "need_tlbsync" state.  But I'd expect that to be a pittance compared to
> the actual cost of INVLPGB/TLBSYNC.

Lemme see whether I can grasp properly what you mean:

What you really want is for the _nosync() variants to set need_tlbsync,
right?

Because looking at all the call sites which do set tlbsync, the flow is
this:

	__invlpgb_flush_user_nr_nosync(pcid, addr, nr, pmd_stride, freed_tables);
	if (!cpu_need_tlbsync())
		cpu_set_tlbsync(true);

So we should move that cpu_set_tlbsync(true) into the
__invlpgb_*_nosync() variant.

And in order to make it even simpler, we should drop the testing too.
IOW, this:

	/* Flush all mappings for a given PCID, not including globals. */
	static inline void __invlpgb_flush_single_pcid_nosync(unsigned long pcid)
	{
		__invlpgb(0, pcid, 0, 1, 0, INVLPGB_PCID);
		cpu_set_tlbsync(true);
	}

Right?
> > static void broadcast_tlb_flush(struct flush_tlb_info *info)
> > {
> > 	bool pmd = info->stride_shift == PMD_SHIFT;
> > @@ -790,6 +821,8 @@ void switch_mm_irqs_off(struct mm_struct *unused, struct mm_struct *next,
> > 	if (IS_ENABLED(CONFIG_PROVE_LOCKING))
> > 		WARN_ON_ONCE(!irqs_disabled());
> > 
> > +	tlbsync();
> 
> This one is in dire need of comments.

Maybe this:

diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 08672350536f..b97249ffff1f 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -822,6 +822,9 @@ void switch_mm_irqs_off(struct mm_struct *unused, struct mm_struct *next,
 	if (IS_ENABLED(CONFIG_PROVE_LOCKING))
 		WARN_ON_ONCE(!irqs_disabled());
 
+	/*
+	 * Finish any remote TLB flushes pending from this CPU:
+	 */
 	tlbsync();
 
 	/*

> Ditto, *especially* if this hits the init_mm state.  There really
> shouldn't be any deferred flushes for the init_mm.

Basically what you said but as a comment. :-P

Merged in the rest.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette