On Thu, Jun 23, 2022, Peter Xu wrote:
> On Thu, Jun 23, 2022 at 02:49:47PM +0000, Sean Christopherson wrote:
> > On Wed, Jun 22, 2022, Peter Xu wrote:
> > > Merge two boolean parameters into a bitmask flag called kvm_gtp_flag_t
> > > for __gfn_to_pfn_memslot().  This cleans up the parameter list, and
> > > also prepares for a new boolean to be added to __gfn_to_pfn_memslot().

...

> > > +/* gfn_to_pfn (gtp) flags */
> > > +typedef unsigned int __bitwise kvm_gtp_flag_t;
> > > +
> > > +#define KVM_GTP_WRITE  ((__force kvm_gtp_flag_t) BIT(0))
> > > +#define KVM_GTP_ATOMIC ((__force kvm_gtp_flag_t) BIT(1))
> > > +
> > >  kvm_pfn_t __gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn,
> > > -			       bool atomic, bool *async, bool write_fault,
> > > +			       kvm_gtp_flag_t gtp_flags, bool *async,
> > >  			       bool *writable, hva_t *hva);
> > 
> > I completely agree the list of booleans is a mess, but I don't love the
> > result of adding @flags.  I wonder if we can do something similar to x86's
> > struct kvm_page_fault and add an internal struct to pass params.
> 
> Yep we can.  It's just that it'll be another goal irrelevant to this series

But it's not irrelevant.  By introducing KVM_GTP_*, you opened the topic of
cleaning up the parameters.  Don't get me wrong, I like that you proposed
cleaning up the mess, but if we're going to churn then let's get the API right.

> but it could be a standalone cleanup patchset for the gfn->hpa conversion
> paths.  Say, the new struct can also be done on top, containing the new
> flag, IMHO.

No, because if we go to a struct, then I'd much rather have bools and not
flags.

> This reminded me of an interesting topic that Nadav used to mention: when
> Matthew changed some Linux function parameters into a structure, the .obj
> actually grew a bit due to the strong stack protector that Linux uses.  If
> I do such a change I'd guess I need to dig a bit into that first, but
> hopefully I don't need to for this series alone.
> 
> Sorry to be off-topic: I think it's a matter of whether you think it's
> okay to merge the flags first, even if we want to go with a struct pointer
> eventually.

Either take a dependency on doing a full cleanup, or just add yet another
bool and leave _all_ cleanup to a separate series.  Resolving conflicts with
a new param is fairly straightforward, whereas resolving divergent cleanups
gets painful.

As gross as it is, I think my preference would be to just add another bool
in this series.  Then we can get more aggressive with a cleanup without
having to worry about unnecessarily pushing this series out a release or
three.

> > And then add e.g. gfn_to_pfn_interruptible() to wrap that logic.
> 
> That helper sounds good, it's just that the major user I'm modifying here
> doesn't really use gfn_to_pfn() at all but __gfn_to_pfn_memslot()
> underneath.  I'll remember to add that when I plan to convert some
> gfn_to_pfn() call sites.

Ah, right.  That can be remedied more easily if @async goes away.  Then we
can have gfn_to_pfn_memslot_nowait() and gfn_to_pfn_memslot_interruptible(),
and those are mutually exclusive, i.e. recognize generic signals if and only
if gup is allowed to wait.  But that can be left to the cleanup series.

> > I suspect we could also clean up the @async behavior at the same time, as
> > its interaction with FOLL_NOWAIT is confusing.
> 
> Yeah, I don't like that either.  Let me think about that when proposing a
> new version.  Logically that's a separate idea from this series too, but if
> you think it'd be nice to have altogether then I can give it a shot.

This is what I came up with for splitting @async into a pure input (no_wait)
and a return value (KVM_PFN_ERR_NEEDS_IO).
---
 arch/arm64/kvm/mmu.c                   |  2 +-
 arch/powerpc/kvm/book3s_64_mmu_hv.c    |  4 +-
 arch/powerpc/kvm/book3s_64_mmu_radix.c |  2 +-
 arch/x86/kvm/mmu/mmu.c                 |  8 ++--
 include/linux/kvm_host.h               |  3 +-
 virt/kvm/kvm_main.c                    | 57 ++++++++++++++------------
 virt/kvm/kvm_mm.h                      |  2 +-
 virt/kvm/pfncache.c                    |  2 +-
 8 files changed, 41 insertions(+), 39 deletions(-)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 87f1cd0df36e..a87f01edef8e 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1204,7 +1204,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	 */
 	smp_rmb();
 
-	pfn = __gfn_to_pfn_memslot(memslot, gfn, false, NULL,
+	pfn = __gfn_to_pfn_memslot(memslot, gfn, false, false,
 				   write_fault, &writable, NULL);
 	if (pfn == KVM_PFN_ERR_HWPOISON) {
 		kvm_send_hwpoison_signal(hva, vma_shift);
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 514fd45c1994..32f4b56ca315 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -590,7 +590,7 @@ int kvmppc_book3s_hv_page_fault(struct kvm_vcpu *vcpu,
 
 	/*
 	 * Do a fast check first, since __gfn_to_pfn_memslot doesn't
-	 * do it with !atomic && !async, which is how we call it.
+	 * do it with !atomic && !nowait, which is how we call it.
 	 * We always ask for write permission since the common case
 	 * is that the page is writable.
 	 */
@@ -598,7 +598,7 @@ int kvmppc_book3s_hv_page_fault(struct kvm_vcpu *vcpu,
 		write_ok = true;
 	} else {
 		/* Call KVM generic code to do the slow-path check */
-		pfn = __gfn_to_pfn_memslot(memslot, gfn, false, NULL,
+		pfn = __gfn_to_pfn_memslot(memslot, gfn, false, false,
 					   writing, &write_ok, NULL);
 		if (is_error_noslot_pfn(pfn))
 			return -EFAULT;
diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c b/arch/powerpc/kvm/book3s_64_mmu_radix.c
index 42851c32ff3b..4338affe295e 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_radix.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c
@@ -845,7 +845,7 @@ int kvmppc_book3s_instantiate_page(struct kvm_vcpu *vcpu,
 		unsigned long pfn;
 
 		/* Call KVM generic code to do the slow-path check */
-		pfn = __gfn_to_pfn_memslot(memslot, gfn, false, NULL,
+		pfn = __gfn_to_pfn_memslot(memslot, gfn, false, false,
 					   writing, upgrade_p, NULL);
 		if (is_error_noslot_pfn(pfn))
 			return -EFAULT;
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 79c6a821ea0d..35b364589fa4 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4102,7 +4102,6 @@ void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu, struct kvm_async_pf *work)
 static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 {
 	struct kvm_memory_slot *slot = fault->slot;
-	bool async;
 
 	/*
 	 * Retry the page fault if the gfn hit a memslot that is being deleted
@@ -4131,11 +4130,10 @@ static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 		return RET_PF_EMULATE;
 	}
 
-	async = false;
-	fault->pfn = __gfn_to_pfn_memslot(slot, fault->gfn, false, &async,
+	fault->pfn = __gfn_to_pfn_memslot(slot, fault->gfn, false, true,
 					  fault->write, &fault->map_writable,
 					  &fault->hva);
-	if (!async)
+	if (fault->pfn != KVM_PFN_ERR_NEEDS_IO)
 		return RET_PF_CONTINUE; /* *pfn has correct page already */
 
 	if (!fault->prefetch && kvm_can_do_async_pf(vcpu)) {
@@ -4149,7 +4147,7 @@ static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 		}
 	}
 
-	fault->pfn = __gfn_to_pfn_memslot(slot, fault->gfn, false, NULL,
+	fault->pfn = __gfn_to_pfn_memslot(slot, fault->gfn, false, false,
 					  fault->write, &fault->map_writable,
 					  &fault->hva);
 	return RET_PF_CONTINUE;
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 3554e48406e4..ecd5f686d33a 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -96,6 +96,7 @@
 #define KVM_PFN_ERR_FAULT	(KVM_PFN_ERR_MASK)
 #define KVM_PFN_ERR_HWPOISON	(KVM_PFN_ERR_MASK + 1)
 #define KVM_PFN_ERR_RO_FAULT	(KVM_PFN_ERR_MASK + 2)
+#define KVM_PFN_ERR_NEEDS_IO	(KVM_PFN_ERR_MASK + 3)
 
 /*
  * error pfns indicate that the gfn is in slot but faild to
@@ -1146,7 +1147,7 @@ kvm_pfn_t gfn_to_pfn_prot(struct kvm *kvm, gfn_t gfn, bool write_fault,
 kvm_pfn_t gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn);
 kvm_pfn_t gfn_to_pfn_memslot_atomic(const struct kvm_memory_slot *slot, gfn_t gfn);
 kvm_pfn_t __gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn,
-			       bool atomic, bool *async, bool write_fault,
+			       bool atomic, bool no_wait, bool write_fault,
 			       bool *writable, hva_t *hva);
 
 void kvm_release_pfn_clean(kvm_pfn_t pfn);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 45188d11812c..6b63aa5fa5ed 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2497,7 +2497,7 @@ static bool hva_to_pfn_fast(unsigned long addr, bool write_fault,
  * The slow path to get the pfn of the specified host virtual address,
  * 1 indicates success, -errno is returned if error is detected.
  */
-static int hva_to_pfn_slow(unsigned long addr, bool *async, bool write_fault,
+static int hva_to_pfn_slow(unsigned long addr, bool no_wait, bool write_fault,
 			   bool *writable, kvm_pfn_t *pfn)
 {
 	unsigned int flags = FOLL_HWPOISON;
@@ -2511,7 +2511,7 @@ static int hva_to_pfn_slow(unsigned long addr, bool *async, bool write_fault,
 
 	if (write_fault)
 		flags |= FOLL_WRITE;
-	if (async)
+	if (no_wait)
 		flags |= FOLL_NOWAIT;
 
 	npages = get_user_pages_unlocked(addr, 1, &page, flags);
@@ -2619,28 +2619,31 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma,
 }
 
 /*
- * Pin guest page in memory and return its pfn.
+ * Get the host pfn for a given host virtual address.  If a pfn is found and is
+ * backed by a refcounted struct page, the caller is responsible for putting
+ * the reference, i.e. this returns with an elevated refcount.
+ *
  * @addr: host virtual address which maps memory to the guest
- * @atomic: whether this function can sleep
- * @async: whether this function need to wait IO complete if the
- *         host page is not in the memory
- * @write_fault: whether we should get a writable host page
- * @writable: whether it allows to map a writable host page for !@write_fault
- *
- * The function will map a writable host page for these two cases:
- * 1): @write_fault = true
- * 2): @write_fault = false && @writable, @writable will tell the caller
- *     whether the mapping is writable.
+ * @atomic: if true, do not sleep (effectively means "fast gup only")
+ * @no_wait: if true, do not wait for IO to complete if the host page is not in
+ *	     memory, e.g. is swapped out or not yet transferred during post-copy
+ * @write_fault: if true, a writable mapping is _required_
+ * @writable: if non-NULL, a writable mapping is _allowed_, but not required;
+ *	      set to %true (if non-NULL) if a writable host page was retrieved
  */
-kvm_pfn_t hva_to_pfn(unsigned long addr, bool atomic, bool *async,
+kvm_pfn_t hva_to_pfn(unsigned long addr, bool atomic, bool no_wait,
 		     bool write_fault, bool *writable)
 {
 	struct vm_area_struct *vma;
 	kvm_pfn_t pfn;
 	int npages, r;
 
-	/* we can do it either atomically or asynchronously, not both */
-	BUG_ON(atomic && async);
+	/*
+	 * Waiting requires sleeping, and so is mutually exclusive with atomic
+	 * lookups which are not allowed to sleep.
+	 */
+	if (WARN_ON_ONCE(atomic && !no_wait))
+		return KVM_PFN_ERR_FAULT;
 
 	if (hva_to_pfn_fast(addr, write_fault, writable, &pfn))
 		return pfn;
@@ -2648,13 +2651,13 @@ kvm_pfn_t hva_to_pfn(unsigned long addr, bool atomic, bool *async,
 	if (atomic)
 		return KVM_PFN_ERR_FAULT;
 
-	npages = hva_to_pfn_slow(addr, async, write_fault, writable, &pfn);
+	npages = hva_to_pfn_slow(addr, no_wait, write_fault, writable, &pfn);
 	if (npages == 1)
 		return pfn;
 
 	mmap_read_lock(current->mm);
 	if (npages == -EHWPOISON ||
-	    (!async && check_user_page_hwpoison(addr))) {
+	    (!no_wait && check_user_page_hwpoison(addr))) {
 		pfn = KVM_PFN_ERR_HWPOISON;
 		goto exit;
 	}
@@ -2671,9 +2674,10 @@ kvm_pfn_t hva_to_pfn(unsigned long addr, bool atomic, bool *async,
 		if (r < 0)
 			pfn = KVM_PFN_ERR_FAULT;
 	} else {
-		if (async && vma_is_valid(vma, write_fault))
-			*async = true;
-		pfn = KVM_PFN_ERR_FAULT;
+		if (no_wait && vma_is_valid(vma, write_fault))
+			pfn = KVM_PFN_ERR_NEEDS_IO;
+		else
+			pfn = KVM_PFN_ERR_FAULT;
 	}
 exit:
 	mmap_read_unlock(current->mm);
@@ -2681,7 +2685,7 @@ kvm_pfn_t hva_to_pfn(unsigned long addr, bool atomic, bool *async,
 }
 
 kvm_pfn_t __gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn,
-			       bool atomic, bool *async, bool write_fault,
+			       bool atomic, bool no_wait, bool write_fault,
 			       bool *writable, hva_t *hva)
 {
 	unsigned long addr = __gfn_to_hva_many(slot, gfn, NULL, write_fault);
@@ -2707,28 +2711,27 @@ kvm_pfn_t __gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn,
 		writable = NULL;
 	}
 
-	return hva_to_pfn(addr, atomic, async, write_fault,
-			  writable);
+	return hva_to_pfn(addr, atomic, no_wait, write_fault, writable);
 }
 EXPORT_SYMBOL_GPL(__gfn_to_pfn_memslot);
 
 kvm_pfn_t gfn_to_pfn_prot(struct kvm *kvm, gfn_t gfn, bool write_fault,
 			  bool *writable)
 {
-	return __gfn_to_pfn_memslot(gfn_to_memslot(kvm, gfn), gfn, false, NULL,
+	return __gfn_to_pfn_memslot(gfn_to_memslot(kvm, gfn), gfn, false, false,
 				    write_fault, writable, NULL);
 }
 EXPORT_SYMBOL_GPL(gfn_to_pfn_prot);
 
 kvm_pfn_t gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn)
 {
-	return __gfn_to_pfn_memslot(slot, gfn, false, NULL, true, NULL, NULL);
+	return __gfn_to_pfn_memslot(slot, gfn, false, false, true, NULL, NULL);
 }
 EXPORT_SYMBOL_GPL(gfn_to_pfn_memslot);
 
 kvm_pfn_t gfn_to_pfn_memslot_atomic(const struct kvm_memory_slot *slot, gfn_t gfn)
 {
-	return __gfn_to_pfn_memslot(slot, gfn, true, NULL, true, NULL, NULL);
+	return __gfn_to_pfn_memslot(slot, gfn, true, false, true, NULL, NULL);
 }
 EXPORT_SYMBOL_GPL(gfn_to_pfn_memslot_atomic);
 
diff --git a/virt/kvm/kvm_mm.h b/virt/kvm/kvm_mm.h
index 41da467d99c9..40e87b4b4629 100644
--- a/virt/kvm/kvm_mm.h
+++ b/virt/kvm/kvm_mm.h
@@ -24,7 +24,7 @@
 #define KVM_MMU_READ_UNLOCK(kvm)	spin_unlock(&(kvm)->mmu_lock)
 #endif /* KVM_HAVE_MMU_RWLOCK */
 
-kvm_pfn_t hva_to_pfn(unsigned long addr, bool atomic, bool *async,
+kvm_pfn_t hva_to_pfn(unsigned long addr, bool atomic, bool no_wait,
 		     bool write_fault, bool *writable);
 
 #ifdef CONFIG_HAVE_KVM_PFNCACHE
diff --git a/virt/kvm/pfncache.c b/virt/kvm/pfncache.c
index ab519f72f2cd..6a05d6d0fbe9 100644
--- a/virt/kvm/pfncache.c
+++ b/virt/kvm/pfncache.c
@@ -181,7 +181,7 @@ static kvm_pfn_t hva_to_pfn_retry(struct kvm *kvm, struct gfn_to_pfn_cache *gpc)
 	}
 
 	/* We always request a writeable mapping */
-	new_pfn = hva_to_pfn(gpc->uhva, false, NULL, true, NULL);
+	new_pfn = hva_to_pfn(gpc->uhva, false, false, true, NULL);
 	if (is_error_noslot_pfn(new_pfn))
 		goto out_error;

base-commit: 4284f0063c48fc3734b0bedb023702c4d606732f
--