On Fri, Jan 17, 2025 at 10:00:50AM -0400, Jason Gunthorpe wrote: > On Thu, Jan 16, 2025 at 10:28:48PM +0000, Catalin Marinas wrote: > > with FEAT_MTE_PERM (patches from Aneesh on the list). Or, a bigger > > happen, disable MTE in guests (well, not that big, not many platforms > > supporting MTE, especially in the enterprise space). > > As above, it seems we already effectively disable MTE in guests to use > VFIO. That's fine. Once the NoTagAccess feature gets in, we could allow MTE here as well. > > A second problem, similar to relaxing to Normal NC we merged last year, > > we can't tell what allowing Stage 2 cacheable means (SError etc). > > That was a very different argument. On that series KVM was upgrading a > VM with pgprot noncached to Normal NC, that upgrade was what triggered > the discussions about SError. > > For this case the VMA is already pgprot cache. KVM is not changing > anything. The KVM S2 will have the same Normal NC memory type as the > VMA has in the S1. Thus KVM has no additional responsibility for > safety here. I agree this is safe. My point was more generic about not allowing cacheable mappings without some sanity check. Ankit's patch relies on the pgprot used on the S1 mapping to make this decision. Presumably the pgprot is set by the host driver. > > information. Checking vm_page_prot instead of a VM_* flag may work if > > it's mapped in user space but this might not always be the case. > > For this series it is only about mapping VMAs. Some future FD based > mapping for CC is going to also need similar metadata.. I have another > thread about that :) How soon would you need that and if you come up with a different mechanism, shouldn't we unify them early rather than having two methods? > > I don't see how VM_PFNMAP alone can tell us anything about the > > access properties supported by a device address range. Either way, > > it's the driver setting vm_page_prot or some VM_* flag. KVM has no > > clue, it's just a memory slot. > > I think David's point about VM_PFNMAP was to avoid some of the > pfn_valid() logic. If we get VM_PFNMAP we just assume it is non-struct > page and follow the VMA's pgprot. Ah, ok, thanks for the clarification. > > A third aspect, more of a simplification when reasoning about this, was > > to use FWB at Stage 2 to force cacheability and not care about cache > > maintenance, especially when such range might be mapped both in user > > space and in the guest. > > Yes, I thought we needed this anyhow as KVM can't cache invalidate > non-struct page memory.. Looks good. I think Ankit should post a new series factoring out the exec handling in a separate patch, dropping some of the pfn_valid() assumptions and we take it from there. I also think some sanity check should be done early in kvm_arch_prepare_memory_region() like rejecting the slot if it's cacheable and we don't have FWB. But I'll leave this decision to the KVM maintainers. We are trying to relax some stuff here as well (see the NoTagAccess thread). -- Catalin