On Fri, Jun 24, 2022 at 02:50:53PM -0700, Peter Collingbourne wrote: > On Fri, Jun 24, 2022 at 10:05 AM Catalin Marinas > <catalin.marinas@xxxxxxx> wrote: > > + Steven as he added the KVM and swap support for MTE. > > > > On Thu, Jun 23, 2022 at 04:49:44PM -0700, Peter Collingbourne wrote: > > > Certain VMMs such as crosvm have features (e.g. sandboxing, pmem) that > > > depend on being able to map guest memory as MAP_SHARED. The current > > > restriction on sharing MAP_SHARED pages with the guest is preventing > > > the use of those features with MTE. Therefore, remove this restriction. > > > > We already have some corner cases where the PG_mte_tagged logic fails > > even for MAP_PRIVATE (but page shared with CoW). Adding this on top for > > KVM MAP_SHARED will potentially make things worse (or hard to reason > > about; for example the VMM sets PROT_MTE as well). I'm more inclined to > > get rid of PG_mte_tagged altogether, always zero (or restore) the tags > > on user page allocation, copy them on write. For swap we can scan and if > > all tags are 0 and just skip saving them. > > A problem with this approach is that it would conflict with any > potential future changes that we might make that would require the > kernel to avoid modifying the tags for non-PROT_MTE pages. Not if in all those cases we check VM_MTE_ALLOWED. We seem to have the vma available where it matters. We can keep PG_mte_tagged around but always set it on page allocation (e.g. when zeroing or CoW) and check VM_MTE_ALLOWED rather than VM_MTE. I'm not sure how Linux can deal with pages that do not support MTE. Currently we only handle this at the vm_flags level. Assuming that we somehow manage to, we can still use PG_mte_tagged to mark the pages that supported tags on allocation (and they have been zeroed or copied). I guess if you want to move a page around, you'd need to go through something like try_to_unmap() (or set all mappings to PROT_NONE like in NUMA migration). Then you can either check whether the page is PROT_MTE anywhere and maybe read the tags to see whether all are zero after unmapping. Deferring tag zeroing/restoring to set_pte_at() can be racy without a lock (or your approach with another flag) but I'm not sure it's worth the extra logic if zeroing or copying the tags doesn't have a significant overhead for non-PROT_MTE pages. Another issue with set_pte_at() is that it can write tags on mprotect() even if the page is mapped read-only. So far I couldn't find a problem with this but it adds to the complexity. > Thinking about this some more, another idea that I had was to only > allow MAP_SHARED mappings in a guest with MTE enabled if the mapping > is PROT_MTE and there are no non-PROT_MTE aliases. For anonymous > mappings I don't think it's possible to create a non-PROT_MTE alias in > another mm (since you can't turn off PROT_MTE with mprotect), and for > memfd maybe we could introduce a flag that requires PROT_MTE on all > mappings. That way, we are guaranteed that either the page has been > tagged prior to fault or we have exclusive access to it so it can be > tagged on demand without racing. Let me see what effect that has on > crosvm. You could still have all initial shared mappings as !PROT_MTE and some mprotect() afterwards setting PG_mte_tagged and clearing the tags and this can race. AFAICT, the easiest way to avoid the race is to set PG_mte_tagged on allocation before it ends up in set_pte_at(). -- Catalin