On Wed, Nov 20, 2019 at 07:28:07AM -0800, Sean Christopherson wrote: > On Wed, Nov 20, 2019 at 12:52:16PM +0100, Christoffer Dall wrote: > > On Tue, Nov 19, 2019 at 07:44:48PM -0800, Sean Christopherson wrote: > > > On Fri, Nov 08, 2019 at 12:19:20PM +0100, Christoffer Dall wrote: > > > > First, what prevents user space from messing around with the VMAs after > > > > kvm_arch_prepare_memory_region() completes? If nothing, then what is > > > > the value of the cheks we perform wrt. to VMAs? > > > > > > Arm's prepare_memory_region() holds mmap_sem and mmu_lock while processing > > > the VMAs and populating the stage 2 page tables. Holding mmap_sem prevents > > > the VMAs from being invalidated while the stage 2 tables are populated, > > > e.g. prevents racing with the mmu notifier. The VMAs could be modified > > > after prepare_memory_region(), but the mmu notifier will ensure they are > > > unmapped from stage2 prior the the host change taking effect. So I think > > > you're safe (famous last words). > > > > > > > So we for example check: > > > > writeable = !(memslot->falgs & KVM_MEM_READONLY); > > if (writeable && !(vma->vm_flags & VM_WRITE)) > > return -EPERM; > > > > And yes, user space can then unmap the VMAs and MMU notifiers will > > unmap the stage 2 entries, but user space can then create a new > > read-only VMA covering the area of the memslot and the fault-handling > > path will have to deal with this same check later.Only, the fault > > handling path, via gfn_to_pfn_prot(), returns an address based on an > > entirely different set of mechanics, than our prepare_memory_region, > > which I think indicates we are doing something wrong somewhere, and we > > should have a common path for faulting things in, for I/O, both if we do > > this up-front or if we do this at fault time. > > Unconditionally interpreting vm_pgoff as a physical address does not seem > correct. There are cases where that might be correct, e.g. if the backing > (virtual) file is a flat representation of the address space, which appears > to be the case on some architectures, e.g. for PCI handling. But even then > there should be some confirmation that the VMA is actually associated with > such a file, otherwise KVM is at the mercy of userspace to do the right > thing (unless there are other guarantees on arm I am unaware of). > > > > > Second, why would arm/arm64 need special handling for I/O mappings > > > > compared to other architectures, and how is this dealt with for > > > > x86/s390/power/... ? > > > > > > As Ard mentioned, it looks like an optimization. The "passthrough" > > > part from the changelog implies that VM_PFNMAP memory regions are exclusive > > > to the guest. Mapping the entire thing would be a nice boot optimization > > > as it would save taking page faults on every page of the MMIO region. > > > > > > As for how this is different from other archs... at least on x86, VM_PFNMAP > > > isn't guaranteed to be passthrough or even MMIO, e.g. prefaulting the > > > pages may actually trigger allocation, and remapping the addresses could be > > > flat out wrong. > > > > What does VM_PFNMAP mean on x86? I didn't think we were relying on > > anything architecture specific in their meaning in the arm code, and I > > thought the VM_PFNMAP was a generic mm flag with generic mm meaning, > > but I could be wrong here? > > No, you're correct, VM_PFNMAP is a generic flag that state the VMA doesn't > have an associated struct page and is being managed directly by something > other than the core mmu. > > But not having a struct page doesn't guarantee that the PFN is backed by > MMIO, or that it is exclusive to the guest (although in practice this is > probably the case 99.9999% of the time). E.g. x86 supports having guest > memory backed by regular ram that is hidden from the host kernel via > 'mem=', which will show up as VM_PFNMAP. > > > Is there any valid semantics for creating a memslot backed by a > > VM_PFNMAP on x86, and if so, what are those? > > > > Similarly, if you do map a device region straight to the guest on x86, > > how is that handled? (A pointer to the right place in the myriad of EPT > > and shadow code in x86 would be much appreciated.) > > There is no special handling in x86 for VM_PFNMAP memory, we rely on KVM's > generic __gfn_to_pfn_memslot() to retrieve the PFN on demand, and use > mmu_notifier_seq to ensure the stale PFNs (invalidated in the host) aren't > inserted into the guest page tables. Effectively the same thing arm does, > sans the prepare_memory_region() shenanigans. Thanks Sean, I'll have a look at reworking said shenanigans ;) Christoffer _______________________________________________ kvmarm mailing list kvmarm@xxxxxxxxxxxxxxxxxxxxx https://lists.cs.columbia.edu/mailman/listinfo/kvmarm