Re: [RFC PATCH v3 00/10] Add support for shared PTEs across processes

Jann Horn <jannh@xxxxxxxxxx> · Mon, 14 Oct 2024 22:07:52 +0200

On Wed, Sep 4, 2024 at 1:22 AM Anthony Yznaga <anthony.yznaga@xxxxxxxxxx> wrote:
> One major issue to address for this series to function correctly
> is how to ensure proper TLB flushing when a page in a shared
> region is unmapped. For example, since the rmaps for pages in a
> shared region map back to host vmas which point to a host mm, TLB
> flushes won't be directed to the CPUs the sharing processes have
> run on. I am by no means an expert in this area. One idea is to
> install a mmu_notifier on the host mm that can gather the necessary
> data and do flushes similar to the batch flushing.

The mmu_notifier API has two ways you can use it:

First, there is the classic mode, where before you start modifying
PTEs in some range, you remove mirrored PTEs from some other context,
and until you're done with your PTE modification, you don't allow
creation of new mirrored PTEs. This is intended for cases where
individual PTE entries are copied over to some other context (such as
EPT tables for virtualization). When I last looked at that code, it
looked fine, and this is what KVM uses. But it probably doesn't match
your usecase, since you wouldn't want removal of a single page to
cause the entire page table containing it to be temporarily unmapped
from the processes that use it?

Second, there is a newer mode for IOMMUv2 stuff (using the
mmu_notifier_ops::invalidate_range callback), where the idea is that
you have secondary MMUs that share the normal page tables, and so you
basically send them invalidations at the same time you invalidate the
primary MMU for the process. I think that's the right fit for this
usecase; however, last I looked, this code was extremely broken (see
https://lore.kernel.org/lkml/CAG48ez2NQKVbv=yG_fq_jtZjf8Q=+Wy54FxcFrK_OujFg5BwSQ@xxxxxxxxxxxxxx/
for context). Unless that's changed in the meantime, I think someone
would have to fix that code before it can be relied on for new
usecases.