Re: [RFC PATCH 0/4] KVM: SEV: Limit cache flush operations in sev guest memory reclaim events

On 12/1/2023 3:58 PM, Mingwei Zhang wrote:
On Fri, Dec 1, 2023 at 1:30 PM Kalra, Ashish <ashish.kalra@xxxxxxx> wrote:

On 12/1/2023 1:02 PM, Mingwei Zhang wrote:
On Fri, Dec 1, 2023 at 10:05 AM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:

On Fri, Nov 10, 2023, Jacky Li wrote:
The cache flush operation in SEV guest memory reclaim events was
originally introduced to prevent security issues due to cache
incoherence and an untrusted VMM. However, when this operation gets
triggered, it causes performance degradation for the whole machine.

This cache flush is performed in the mmu_notifiers, specifically in
mmu_notifier_invalidate_range_start(), unconditionally on all guest
memory regions. Although the intention was to flush cache lines only
when guest memory was deallocated, the notifier fires in many other
cases where the flush is unnecessary.
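
For context, the flush in question boils down to roughly the
following (a paraphrase of the SEV reclaim helper in
arch/x86/kvm/svm/sev.c; exact code differs by kernel version):

/* Paraphrase of the current SEV reclaim path: any MMU invalidation on
 * a SEV/SEV-ES VM triggers a cache flush on every CPU, regardless of
 * what the invalidation was actually for.
 */
void sev_guest_memory_reclaimed(struct kvm *kvm)
{
        if (!sev_guest(kvm))
                return;

        wbinvd_on_all_cpus();
}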

This RFC proposes using the mmu_notifier event to determine whether a
cache flush is needed. Specifically, only do the cache flush when the
address range is unmapped, cleared, released or migrated. A bitmap
module param is also introduced to provide flexibility for hardware
platforms that need the flush on more events, or on none at all.
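
A minimal sketch of the idea (illustrative only, not the actual
patch; the param name, default mask and helper below are made up):
gate the flush on the mmu_notifier event that triggered the
invalidation.

/* Illustrative sketch: bitmap of mmu_notifier events that still
 * trigger the flush; the default covers unmap/clear/release/migrate
 * as described above and can be overridden via the module param.
 */
static unsigned long sev_flush_on_events =
        BIT(MMU_NOTIFY_UNMAP) | BIT(MMU_NOTIFY_CLEAR) |
        BIT(MMU_NOTIFY_RELEASE) | BIT(MMU_NOTIFY_MIGRATE);
module_param(sev_flush_on_events, ulong, 0444);

/* Called from the invalidate_range_start() path with range->event. */
static void sev_maybe_flush_on_reclaim(struct kvm *kvm,
                                       enum mmu_notifier_event event)
{
        if (!sev_guest(kvm))
                return;

        if (sev_flush_on_events & BIT(event))
                wbinvd_on_all_cpus();
}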

I'm still not at all convinced that this is worth doing.  We have clear line of
sight to cleanly and optimally handling SNP and beyond.  If there is an actual
use case that wants to run SEV and/or SEV-ES VMs, which can't support page
migration, on the same host as traditional VMs, _and_ for some reason their
userspace is incapable of providing reasonable NUMA locality, then the owners of
that use case can speak up and provide justification for taking on this extra
complexity in KVM.

Hi Sean,

Jacky and I were looking at some cases like mmu_notifier calls
triggered by the overloaded reason "MMU_NOTIFY_CLEAR". Even if we
turn off page migration etc., PMD splits may still happen at some
point under this reason, and we will never be able to turn them off
by tweaking kernel CONFIG options. So, I think this is the line of
sight for this series.
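
For reference, the event values available to filter on
(approximately, from include/linux/mmu_notifier.h in recent kernels);
MMU_NOTIFY_CLEAR is reused for several distinct operations, including
PMD splits, which is the overload referred to above:

/* Approximate copy from include/linux/mmu_notifier.h; check the exact
 * kernel version being targeted.
 */
enum mmu_notifier_event {
        MMU_NOTIFY_UNMAP = 0,
        MMU_NOTIFY_CLEAR,
        MMU_NOTIFY_PROTECTION_VMA,
        MMU_NOTIFY_PROTECTION_PAGE,
        MMU_NOTIFY_SOFT_DIRTY,
        MMU_NOTIFY_RELEASE,
        MMU_NOTIFY_MIGRATE,
        MMU_NOTIFY_EXCLUSIVE,
};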

Handling SNP could be done separately, since SNP has per-page
properties that let KVM know which individual pages to flush.

For SNP + gmem, where the HVA ranges covered by the MMU notifiers do
not act on encrypted pages, we are ignoring MMU invalidation
notifiers for SNP guests as part of the SNP host patches being posted
upstream, and instead relying on gmem's own invalidation machinery to
clean them up on a per-folio basis.
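
Purely as an illustration of the per-folio idea (the real hooks live
in the SNP host / guest_memfd patches; the callback name below is
hypothetical), the cleanup would look roughly like a guest_memfd
free-folio hook that scrubs only the folio being released, instead of
a VM-wide WBINVD from the MMU notifier path:

/* Hypothetical sketch, not the real SNP/gmem API: clean up one folio
 * as guest_memfd releases it, so only that folio's cache lines are
 * flushed.
 */
static void example_gmem_free_folio(struct folio *folio)
{
        void *vaddr = folio_address(folio);

        /* Flush just this folio's cache lines before it is reused. */
        clflush_cache_range(vaddr, folio_size(folio));
}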

Oh, I have no question about that. This series only applies to
SEV/SEV-ES VMs.

For SNP + guest_memfd, I haven't seen the implementation details, but
I doubt you can ignore mmu_notifiers if the request does cover some
encrypted memory in error cases or corner cases.

I believe that all page state transitions from private->shared will
invoke gmem's own invalidation machinery, which should cover such
corner cases.

Mike Roth can provide more specific details about that.

Does SNP enforce the use of guest_memfd?

Again, I believe the SNP implementation is based solely on, and uses,
guest_memfd support.

Thanks,
Ashish

How do we prevent exceptional cases? I am sure you guys have already
figured out the answers, so I don't plan to dig deeper until the SNP
host patches are accepted.

Clearly, for SEV/SEV-ES, there is no such guarantee as with
guest_memfd. Applying guest_memfd to SEV/SEV-ES might require changes
to the SEV API, I suspect, so I think that's equally non-trivial and
may not be worth pursuing.

Thanks.
-Mingwei




