On Fri, Dec 1, 2023 at 1:30 PM Kalra, Ashish <ashish.kalra@xxxxxxx> wrote:
>
> On 12/1/2023 1:02 PM, Mingwei Zhang wrote:
> > On Fri, Dec 1, 2023 at 10:05 AM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
> >>
> >> On Fri, Nov 10, 2023, Jacky Li wrote:
> >>> The cache flush operation in SEV guest memory reclaim events was
> >>> originally introduced to prevent security issues due to cache
> >>> incoherence and an untrusted VMM. However, when this operation gets
> >>> triggered, it causes performance degradation to the whole machine.
> >>>
> >>> This cache flush operation is performed in mmu_notifiers, in particular,
> >>> in the mmu_notifier_invalidate_range_start() function, unconditionally
> >>> on all guest memory regions. Although the intention was to flush
> >>> cache lines only when guest memory was deallocated, the excessive
> >>> invocations include many other cases where this flush is unnecessary.
> >>>
> >>> This RFC proposes using the mmu notifier event to determine whether a
> >>> cache flush is needed. Specifically, only do the cache flush when the
> >>> address range is unmapped, cleared, released or migrated. A bitmap
> >>> module param is also introduced to provide flexibility when a flush is
> >>> needed in more events, or no flush is needed, depending on the hardware
> >>> platform.
> >>
> >> I'm still not at all convinced that this is worth doing. We have a clear line of
> >> sight to cleanly and optimally handling SNP and beyond. If there is an actual
> >> use case that wants to run SEV and/or SEV-ES VMs, which can't support page
> >> migration, on the same host as traditional VMs, _and_ for some reason their
> >> userspace is incapable of providing reasonable NUMA locality, then the owners of
> >> that use case can speak up and provide justification for taking on this extra
> >> complexity in KVM.
> >
> > Hi Sean,
> >
> > Jacky and I were looking at some cases like mmu_notifier calls
> > triggered by the overloaded reason "MMU_NOTIFY_CLEAR". Even if we turn
> > off page migration etc., splitting a PMD may still happen at some point
> > under this reason, and we will never be able to turn it off by
> > tweaking kernel CONFIG options. So, I think this is the line of sight
> > for this series.
> >
> > Handling SNP could be separate, since in SNP we have per-page
> > properties, which allow KVM to know which page to flush individually.
> >
>
> For SNP + gmem, where the HVA ranges covered by the MMU notifiers are
> not acting on encrypted pages, we are ignoring MMU invalidation
> notifiers for SNP guests as part of the SNP host patches being posted
> upstream, and instead relying on gmem's own invalidation handling to clean
> them up on a per-folio basis.
>
> Thanks,
> Ashish

Oh, I have no question about that. This series only applies to SEV/SEV-ES
types of VMs.

For SNP + guest_memfd, I haven't seen the implementation details, but I
doubt you can ignore mmu_notifiers if a request does cover some encrypted
memory in error cases or corner cases. Does SNP enforce the use of
guest_memfd? How do we prevent exceptional cases? I am sure you guys have
already figured out the answers, so I don't plan to dig deeper until the
SNP host patches are accepted.

Clearly, for SEV/SEV-ES, there is no such guarantee as with guest_memfd.
Applying guest_memfd to SEV/SEV-ES might require SEV API changes, I
suspect, so I think that's equally non-trivial and thus may not be worth
pursuing.

Thanks.
-Mingwei
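
[Editor's note: a minimal sketch of the event-gated flush idea described in
the quoted RFC cover letter, for illustration only. The parameter and helper
names (sev_flush_on_events, sev_event_needs_flush()) are hypothetical and not
taken from the actual patches; only the mmu_notifier event values and
struct mmu_notifier_range come from the kernel.]

#include <linux/bits.h>
#include <linux/module.h>
#include <linux/mmu_notifier.h>

/*
 * Bitmap of mmu_notifier events that should trigger a cache flush for
 * SEV/SEV-ES guests.  Defaults to the events named in the cover letter:
 * unmap, clear, release and migrate.
 */
static unsigned long sev_flush_on_events __read_mostly =
	BIT(MMU_NOTIFY_UNMAP)   |
	BIT(MMU_NOTIFY_CLEAR)   |
	BIT(MMU_NOTIFY_RELEASE) |
	BIT(MMU_NOTIFY_MIGRATE);
module_param(sev_flush_on_events, ulong, 0444);

/*
 * Would be called from the invalidation path that today flushes
 * unconditionally; the flush (e.g. wbinvd_on_all_cpus()) is only issued
 * when the notifier event is set in the bitmap above.
 */
static bool sev_event_needs_flush(const struct mmu_notifier_range *range)
{
	return test_bit(range->event, &sev_flush_on_events);
}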