On Fri, Aug 25, 2023 at 12:15 AM Sean Christopherson <seanjc@xxxxxxxxxx> wrote: > > On Thu, Aug 24, 2023, David Stevens wrote: > > On Wed, Jul 5, 2023 at 7:25 PM Yu Zhang <yu.c.zhang@xxxxxxxxxxxxxxx> wrote: > > > > > > On Tue, Jul 04, 2023 at 04:50:50PM +0900, David Stevens wrote: > > > > @@ -4529,7 +4540,8 @@ static int kvm_tdp_mmu_page_fault(struct kvm_vcpu *vcpu, > > > > > > > > out_unlock: > > > > read_unlock(&vcpu->kvm->mmu_lock); > > > > - kvm_release_pfn_clean(fault->pfn); > > > > > > Yet kvm_release_pfn() can still be triggered for the kvm_vcpu_maped gfns. > > > What if guest uses a non-referenced page(e.g., as a vmcs12)? Although I > > > believe this is not gonna happen in real world... > > > > kvm_vcpu_map still uses gfn_to_pfn, which eventually passes FOLL_GET > > to __kvm_follow_pfn. So if a guest tries to use a non-refcounted page > > like that, then kvm_vcpu_map will fail and the guest will probably > > crash. It won't trigger any bugs in the host, though. > > > > It is unfortunate that the guest will be able to use certain types of > > memory for some purposes but not for others. However, while it is > > theoretically fixable, it's an unreasonable amount of work for > > something that, as you say, nobody really cares about in practice [1]. > > > > [1] https://lore.kernel.org/all/ZBEEQtmtNPaEqU1i@xxxxxxxxxx/ > > There are use cases that care, which is why I suggested allow_unsafe_kmap. > Specifically, AWS manages their pool of guest memory in userspace and maps it all > via /dev/mem. Without that module param to let userspace opt-in, this series will > break such setups. It still arguably is a breaking change since it requires > userspace to opt-in, but allowing such behavior by default is simply not a viable > option, and I don't have much sympathy since so much of this mess has its origins > in commit e45adf665a53 ("KVM: Introduce a new guest mapping API"). > > The use cases that no one cares about (AFAIK) is allowing _untrusted_ userspace > to back guest RAM with arbitrary memory. In other words, I want KVM to allow > (by default) mapping device memory into the guest for things like vGPUs, without > having to do the massive and invasive overhaul needed to safely allow backing guest > RAM with completely arbitrary memory. Do you specifically want the allow_unsafe_kmap breaking change? v7 of this series should have supported everything that is currently supported by KVM, but you're right that the v8 version of hva_to_pfn_remapped doesn't support mapping !kvm_pfn_to_refcounted_page() pages. That could be supported explicitly with allow_unsafe_kmap as you suggested, or it could be supported implicitly with this implementation: @@ -2774,8 +2771,14 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma, * would then underflow the refcount when the caller does the * required put_page. Don't allow those pages here. */ - if (!kvm_try_get_pfn(pfn)) - r = -EFAULT; + page = kvm_pfn_to_refcounted_page(pfn); + if (page) { + if (get_page_unless_zero(page)) + WARN_ON_ONCE(kvm_follow_refcounted_pfn(foll, page) != pfn); + + if (!foll->is_refcounted_page && !foll->guarded_by_mmu_notifier) + r = -EFAULT; + } -David