On Thu, Jun 03, 2021 at 05:00:31PM +0100, Catalin Marinas wrote:
> On Mon, May 24, 2021 at 11:45:09AM +0100, Steven Price wrote:
> > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> > index c5d1f3c87dbd..226035cf7d6c 100644
> > --- a/arch/arm64/kvm/mmu.c
> > +++ b/arch/arm64/kvm/mmu.c
> > @@ -822,6 +822,42 @@ transparent_hugepage_adjust(struct kvm_memory_slot *memslot,
> >  	return PAGE_SIZE;
> >  }
> >
> > +static int sanitise_mte_tags(struct kvm *kvm, kvm_pfn_t pfn,
> > +			     unsigned long size)
> > +{
> > +	if (kvm_has_mte(kvm)) {
> > +		/*
> > +		 * The page will be mapped in stage 2 as Normal Cacheable, so
> > +		 * the VM will be able to see the page's tags and therefore
> > +		 * they must be initialised first. If PG_mte_tagged is set,
> > +		 * tags have already been initialised.
> > +		 * pfn_to_online_page() is used to reject ZONE_DEVICE pages
> > +		 * that may not support tags.
> > +		 */
> > +		unsigned long i, nr_pages = size >> PAGE_SHIFT;
> > +		struct page *page = pfn_to_online_page(pfn);
> > +
> > +		if (!page)
> > +			return -EFAULT;
> > +
> > +		for (i = 0; i < nr_pages; i++, page++) {
> > +			/*
> > +			 * There is a potential (but very unlikely) race
> > +			 * between two VMs which are sharing a physical page
> > +			 * entering this at the same time. However by splitting
> > +			 * the test/set the only risk is tags being overwritten
> > +			 * by the mte_clear_page_tags() call.
> > +			 */
>
> And I think the real risk here is when the page is writable by at least
> one of the VMs sharing the page. This excludes KSM, so it only leaves
> the MAP_SHARED mappings.
>
> > +			if (!test_bit(PG_mte_tagged, &page->flags)) {
> > +				mte_clear_page_tags(page_address(page));
> > +				set_bit(PG_mte_tagged, &page->flags);
> > +			}
> > +		}
>
> If we want to cover this race (I'd say in a separate patch), we can call
> mte_sync_page_tags(page, __pte(0), false, true) directly (hopefully I
> got the arguments right).
> We can avoid the big lock in most cases if
> kvm_arch_prepare_memory_region() sets a VM_MTE_RESET (tag clear etc.)
> and __alloc_zeroed_user_highpage() clears the tags on allocation (as we
> do for VM_MTE but the new flag would not affect the stage 1 VMM page
> attributes).

Another idea: if VM_SHARED is found for any vma within a region in
kvm_arch_prepare_memory_region(), we either prevent the enabling of MTE
for the guest or reject the memory slot if MTE was already enabled.

An alternative here would be to clear VM_MTE_ALLOWED so that any
subsequent mprotect(PROT_MTE) in the VMM would fail in
arch_validate_flags(). MTE would still be allowed in the guest but not
in the VMM for the guest memory regions. We can probably do this
irrespective of VM_SHARED. Of course, the VMM can still mmap() the
memory initially with PROT_MTE but that's not an issue IIRC, only the
concurrent mprotect().

-- 
Catalin
_______________________________________________
kvmarm mailing list
kvmarm@xxxxxxxxxxxxxxxxxxxxx
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
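
To make the VM_SHARED idea above a bit more concrete, here is a rough
userspace model of the decision only. It is not kernel code and not a
proposed patch: struct model_vma, struct model_vm, MODEL_VM_SHARED and
model_prepare_memory_region() are all hypothetical stand-ins invented for
illustration, and the real check would have to walk the vmas under the
appropriate locking inside kvm_arch_prepare_memory_region().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's VM_SHARED bit (value is illustrative). */
#define MODEL_VM_SHARED 0x1UL

/* Hypothetical, simplified vma: covers [start, end) with some flags. */
struct model_vma {
	unsigned long start;
	unsigned long end;
	unsigned long flags;
};

/* Hypothetical per-guest state relevant to the decision. */
struct model_vm {
	bool mte_enabled;	/* MTE has already been enabled for the guest */
	bool mte_blocked;	/* MTE must not be enabled later for the guest */
};

/* Return true if any vma overlapping [start, end) has the shared flag set. */
static bool region_has_shared_vma(const struct model_vma *vmas, size_t n,
				  unsigned long start, unsigned long end)
{
	size_t i;

	assert(vmas != NULL || n == 0);

	for (i = 0; i < n; i++) {
		/* Skip vmas that do not overlap the region at all. */
		if (vmas[i].end <= start || vmas[i].start >= end)
			continue;
		if (vmas[i].flags & MODEL_VM_SHARED)
			return true;
	}
	return false;
}

/*
 * Model of the two outcomes described in the text: with a shared vma in
 * the region, either reject the slot (MTE already enabled) or remember
 * that MTE may no longer be enabled for this guest.
 */
static int model_prepare_memory_region(struct model_vm *vm,
				       const struct model_vma *vmas, size_t n,
				       unsigned long start, unsigned long end)
{
	if (!region_has_shared_vma(vmas, n, start, end))
		return 0;

	if (vm->mte_enabled)
		return -EINVAL;

	vm->mte_blocked = true;
	return 0;
}
```

In this model a slot that only touches private mappings is always
accepted; a slot overlapping a shared mapping is rejected with -EINVAL
once MTE is on, and otherwise just latches mte_blocked so a later
attempt to enable MTE could be refused.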