Re: [PATCH] KVM:SVM: Flush cache only on CPUs running SEV guest

Sean Christopherson <seanjc@xxxxxxxxxx> · Tue, 5 Mar 2024 16:19:46 -0800

On Tue, Mar 05, 2024, Tom Lendacky wrote:
> On 3/4/24 11:55, Sean Christopherson wrote:
> > +Tom
> > 
> > "KVM: SVM:" for the shortlog scope.
> > 
> > On Fri, Mar 01, 2024, Zheyun Shen wrote:
> > > On AMD CPUs without ensuring cache consistency, each memory page reclamation in
> > > an SEV guest triggers a call to wbinvd_on_all_cpus, thereby affecting the
> > > performance of other programs on the host.
> > > 
> > > Typically, an AMD server may have 128 cores or more, while the SEV guest might only
> > > utilize 8 of these cores. Meanwhile, host can use qemu-affinity to bind these 8 vCPUs
> > > to specific physical CPUs.
> > > 
> > > Therefore, keeping a record of the physical core numbers each time a vCPU runs
> > > can help avoid flushing the cache for all CPUs every time.
> > 
> > This needs an unequivocal statement from AMD that flushing caches only on CPUs
> > that do VMRUN is sufficient.  That sounds like it should be obviously correct,
> > as I don't see how else a cache line can be dirtied for the encrypted PA, but
> > this entire non-coherent caches mess makes me more than a bit paranoid.
> 
> As long as the wbinvd_on_all_cpus() related to the ASID flushing isn't
> changed, this should be ok. And the code currently flushes the source pages
> when doing LAUNCH_UPDATE commands and adding encrypted regions, so should be
> good there.

Nice, thanks!

> Would it make sense to make this configurable, with the current behavior the
> default, until testing looks good for a while?

I don't hate the idea, but I'm inclined to hit the "I'm feeling lucky" button.
I would rather we put in effort to all but guarantee we can do a clean revert in
the future, at which point a kill switch doesn't add all that much value.  E.g.
it would allow for a non-disruptive fix, and maybe a slightly faster confirmation
of a bug, but that's about it.

And since the fallout from this would be host data corruption, _not_ rebooting
hosts that may have been corrupted is probably a bad idea, i.e. the whole
non-disruptive fix benefit is quite dubious.

The other issue is that it'd be extremely difficult to know when we could/should
remove the kill switch.  It might be months or even years before anyone starts
running high volume of SEV/SEV-ES VMs with this optimization.