On Wed, Sep 4, 2024 at 1:22 AM Anthony Yznaga <anthony.yznaga@xxxxxxxxxx> wrote:
> One major issue to address for this series to function correctly
> is how to ensure proper TLB flushing when a page in a shared
> region is unmapped. For example, since the rmaps for pages in a
> shared region map back to host vmas which point to a host mm, TLB
> flushes won't be directed to the CPUs the sharing processes have
> run on. I am by no means an expert in this area. One idea is to
> install a mmu_notifier on the host mm that can gather the necessary
> data and do flushes similar to the batch flushing.

The mmu_notifier API has two ways you can use it:

First, there is the classic mode, where before you start modifying PTEs in some range, you remove mirrored PTEs from some other context, and until you're done with your PTE modification, you don't allow creation of new mirrored PTEs. This is intended for cases where individual PTE entries are copied over to some other context (such as EPT tables for virtualization). When I last looked at that code, it looked fine, and this is what KVM uses. But it probably doesn't match your usecase, since you wouldn't want removal of a single page to cause the entire page table containing it to be temporarily unmapped from the processes that use it?

Second, there is a newer mode for IOMMUv2 stuff (using the mmu_notifier_ops::invalidate_range callback), where the idea is that you have secondary MMUs that share the normal page tables, and so you basically send them invalidations at the same time you invalidate the primary MMU for the process. I think that's the right fit for this usecase; however, last I looked, this code was extremely broken (see https://lore.kernel.org/lkml/CAG48ez2NQKVbv=yG_fq_jtZjf8Q=+Wy54FxcFrK_OujFg5BwSQ@xxxxxxxxxxxxxx/ for context). Unless that's changed in the meantime, I think someone would have to fix that code before it can be relied on for new usecases.
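To make the second mode concrete for the shared-region case in the quoted message, here is a minimal userspace toy model in C. It is not the kernel's mmu_notifier API: every name in it (struct toy_mm, struct toy_notifier, unmap_page(), sharer_invalidate_range() and so on) is hypothetical, and it ignores locking, batching and ordering against concurrent page-table walks. It only sketches the shape of the idea, assuming each sharing process registers a notifier on the host mm that records the CPUs it has run on: unmapping a page clears the shared PTE, flushes the host's CPUs, and in the same step forwards the invalidation to every registered notifier.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NPAGES 8        /* pages in the toy shared region */
#define MAX_NOTIFIERS 4 /* max registered secondary "MMUs" */

/* Toy per-CPU TLB: which page translations this CPU has cached. */
struct toy_tlb {
    bool cached[NPAGES];
};

/* Hypothetical stand-in for an invalidate_range-style notifier that
 * forwards invalidations to the CPUs a sharing process has run on. */
struct toy_notifier {
    void (*invalidate_range)(struct toy_notifier *n, size_t start, size_t end);
    struct toy_tlb *cpus;
    size_t ncpus;
};

/* Toy "host mm": owns the page table that sharing processes also walk. */
struct toy_mm {
    unsigned long pte[NPAGES]; /* 0 means not present */
    struct toy_tlb *cpus;      /* CPUs the host mm itself has run on */
    size_t ncpus;
    struct toy_notifier *notifiers[MAX_NOTIFIERS];
    size_t nnotifiers;
};

/* Drop cached translations for pages [start, end) on the given CPUs. */
static void tlb_flush_range(struct toy_tlb *cpus, size_t ncpus,
                            size_t start, size_t end)
{
    for (size_t c = 0; c < ncpus; c++)
        for (size_t i = start; i < end; i++)
            cpus[c].cached[i] = false;
}

/* Notifier callback: flush the sharing process's CPUs, not the host's. */
static void sharer_invalidate_range(struct toy_notifier *n,
                                    size_t start, size_t end)
{
    tlb_flush_range(n->cpus, n->ncpus, start, end);
}

static void register_notifier(struct toy_mm *mm, struct toy_notifier *n)
{
    assert(mm->nnotifiers < MAX_NOTIFIERS);
    mm->notifiers[mm->nnotifiers++] = n;
}

/* A CPU walks the shared page table; caches the translation if present. */
static bool tlb_fill(struct toy_tlb *cpu, const struct toy_mm *mm, size_t idx)
{
    if (mm->pte[idx] == 0)
        return false;
    cpu->cached[idx] = true;
    return true;
}

/* Unmap one page: clear the shared PTE, flush the primary MMU's CPUs,
 * and in the same step send the invalidation to every secondary. */
static void unmap_page(struct toy_mm *mm, size_t idx)
{
    mm->pte[idx] = 0;
    tlb_flush_range(mm->cpus, mm->ncpus, idx, idx + 1);
    for (size_t k = 0; k < mm->nnotifiers; k++)
        mm->notifiers[k]->invalidate_range(mm->notifiers[k], idx, idx + 1);
}
```

For example, if a sharing process's CPU caches page 3 and the host then calls unmap_page(&mm, 3), the registered notifier clears that cached entry while leaving other pages alone. Without the register_notifier() call, unmap_page() would only flush the host's CPUs and the sharer would keep a stale translation, which is the problem the quoted message describes. Whether the real invalidate_range path can be trusted to provide this is the open question raised in the reply.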