Re: [PATCH 4.19] x86/mm/cpa: Unconditionally avoid WBINDV when we can

Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx> · Tue, 5 Jul 2022 06:39:04 +0200

On Tue, Jul 05, 2022 at 11:45:29AM +0800, Wen Yang wrote:
> 
> 
> On 2022/7/4 at 11:59 PM, Greg Kroah-Hartman wrote:
> > On Mon, Jul 04, 2022 at 11:45:08PM +0800, Wen Yang wrote:
> > > From: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> > > 
> > > commit ddd07b750382adc2b78fdfbec47af8a6e0d8ef37 upstream.
> > > 
> > > CAT has happened, WBINDV is bad (even before CAT blowing away the
> > > entire cache on a multi-core platform wasn't nice), try not to use it
> > > ever.
> > > 
> > > Signed-off-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
> > > Signed-off-by: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> > > Reviewed-by: Dave Hansen <dave.hansen@xxxxxxxxx>
> > > Cc: Bin Yang <bin.yang@xxxxxxxxx>
> > > Cc: Mark Gross <mark.gross@xxxxxxxxx>
> > > Link: https://lkml.kernel.org/r/20180919085947.933674526@xxxxxxxxxxxxx
> > > Cc: <stable@xxxxxxxxxxxxxxx> # 4.19.x
> > > Signed-off-by: Wen Yang <wenyang@xxxxxxxxxxxxxxxxx>
> > > ---
> > >   arch/x86/mm/pageattr.c | 18 ++----------------
> > >   1 file changed, 2 insertions(+), 16 deletions(-)
> > > 
> > > diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
> > > index 101f3ad0d6ad..ab87da7a6043 100644
> > > --- a/arch/x86/mm/pageattr.c
> > > +++ b/arch/x86/mm/pageattr.c
> > > @@ -239,26 +239,12 @@ static void cpa_flush_array(unsigned long *start, int numpages, int cache,
> > >   			    int in_flags, struct page **pages)
> > >   {
> > >   	unsigned int i, level;
> > > -#ifdef CONFIG_PREEMPT
> > > -	/*
> > > -	 * Avoid wbinvd() because it causes latencies on all CPUs,
> > > -	 * regardless of any CPU isolation that may be in effect.
> > > -	 *
> > > -	 * This should be extended for CAT enabled systems independent of
> > > -	 * PREEMPT because wbinvd() does not respect the CAT partitions and
> > > -	 * this is exposed to unpriviledged users through the graphics
> > > -	 * subsystem.
> > > -	 */
> > > -	unsigned long do_wbinvd = 0;
> > > -#else
> > > -	unsigned long do_wbinvd = cache && numpages >= 1024; /* 4M threshold */
> > > -#endif
> > >   	BUG_ON(irqs_disabled() && !early_boot_irqs_disabled);
> > > -	on_each_cpu(__cpa_flush_all, (void *) do_wbinvd, 1);
> > > +	flush_tlb_all();
> > > -	if (!cache || do_wbinvd)
> > > +	if (!cache)
> > >   		return;
> > >   	/*
> > > -- 
> > > 2.19.1.6.gb485710b
> > > 
> > 
> > Why is this needed on 4.19.y?  What problem does it solve?  It looks
> > like only an optimization, not a bugfix.
> > 
> > And if it's a bugfix, why only 4.19.y, why not older kernels too?
> > 
> > We need more information here please.
> > 
> 
> On a 128-core Intel(R) Xeon(R) Platinum 8369B CPU @ 2.90GHz server, when a
> user program frequently calls nv_alloc_system_pages to allocate large
> amounts of memory, it often stalls the entire system for about 200
> milliseconds. As a result, other latency-sensitive tasks on the system are
> heavily impacted, which also causes stability issues in large-scale clusters.
> 
> nv_alloc_system_pages
> -> _set_memory_array
> -> change_page_attr_set_clr
> -> cpa_flush_array
> -> on_each_cpu(__cpa_flush_all, (void *) do_wbinvd, 1);
> 
> 
> This patch can be merged directly into the 4.19 kernel to solve this
> problem, and most of the machines in our production environment run 4.19
> kernels.

Ah.  So what has changed since last year, when I rejected this:
	https://lore.kernel.org/all/9c415df9-9575-8217-03e9-a6bbf20a491a@xxxxxxxxxxxxxxxxx/T/#m06369d080fa97eda3dd6a8eaf54a8ca2d430b3ab

Please do not try to resubmit previously-rejected patches; that is very
disingenuous.

greg k-h
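
For readers following the quoted diff, the do_wbinvd decision it removes can be sketched as a small userspace model. This is an illustration only, not kernel code: the function names and the WBINVD_THRESHOLD_PAGES macro are hypothetical, and the CONFIG_PREEMPT build option is modeled as a runtime flag. Only the "cache && numpages >= 1024" condition and the CONFIG_PREEMPT special case are taken from the patch itself.

```c
#include <stdbool.h>

/*
 * Hypothetical userspace model of the do_wbinvd choice in the quoted
 * cpa_flush_array() diff. None of these names are kernel symbols.
 */

/* The "4M threshold" from the removed line: 1024 pages of 4 KiB each. */
#define WBINVD_THRESHOLD_PAGES 1024

/*
 * Before the patch: CONFIG_PREEMPT kernels never chose wbinvd; other
 * kernels chose it for cache-flushing requests at or above the threshold.
 */
static bool wbinvd_before_patch(bool config_preempt, bool cache, int numpages)
{
	if (config_preempt)
		return false;
	return cache && numpages >= WBINVD_THRESHOLD_PAGES;
}

/*
 * After the patch: the variable is gone, so this path never chooses
 * wbinvd, whatever the configuration or request size.
 */
static bool wbinvd_after_patch(bool config_preempt, bool cache, int numpages)
{
	(void)config_preempt;
	(void)cache;
	(void)numpages;
	return false;
}
```

Under this model, a non-PREEMPT kernel handling a cache-flushing request of 1024 or more pages is the only case in which the old code picked the system-wide flush, which matches the call chain reported above.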