On Mon, 2021-11-15 at 19:50 +0100, Paolo Bonzini wrote:
> On 11/15/21 17:47, David Woodhouse wrote:
> > I need to move it *back* to invalidate_range_start() where I had it
> > before, if I want to let it wait for vCPUs to exit. Which means...
> > that the cache 'refresh' call must wait until the mmu_notifier_count
> > reaches zero? Am I allowed to do that, and make the "There can be
> > only one waiter" comment in kvm_mmu_notifier_invalidate_range_end()
> > no longer true?
>
> You can also update the cache while taking the mmu_lock for read, and
> retry if mmu_notifier_retry_hva tells you to do so. Looking at the
> scenario from commit e649b3f0188 you would have:
>
>       (Invalidator) kvm_mmu_notifier_invalidate_range_start()
>       (Invalidator) write_lock(mmu_lock)
>       (Invalidator) increment mmu_notifier_count
>       (Invalidator) write_unlock(mmu_lock)
>       (Invalidator) request KVM_REQ_APIC_PAGE_RELOAD
>       (KVM VCPU) vcpu_enter_guest()
>       (KVM VCPU) kvm_vcpu_reload_apic_access_page()
> +     (KVM VCPU) read_lock(mmu_lock)
> +     (KVM VCPU) mmu_notifier_retry_hva()
> +     (KVM VCPU) read_unlock(mmu_lock)
> +     (KVM VCPU) retry! (mmu_notifier_count>1)
>       (Invalidator) actually unmap page
> +     (Invalidator) kvm_mmu_notifier_invalidate_range_end()
> +     (Invalidator) write_lock(mmu_lock)
> +     (Invalidator) decrement mmu_notifier_count
> +     (Invalidator) write_unlock(mmu_lock)
> +     (KVM VCPU) vcpu_enter_guest()
> +     (KVM VCPU) kvm_vcpu_reload_apic_access_page()
> +     (KVM VCPU) mmu_notifier_retry_hva()
>
> Changing mn_memslots_update_rcuwait to a waitq (and renaming it to
> mn_invalidate_waitq) is of course also a possibility.

I do think I'll go for a waitq, but let's start *really* simple to make
sure I've got the basics right.... Does the patch below look vaguely
sensible? It returns -EAGAIN and lets the caller retry; I started with a
'goto' but didn't have a sane exit condition. In fact, I *still* don't
have a sane exit condition for callers like evtchn_set_fn().

I'm actually tempted to split the caches into two lists
(kvm->guest_gpc_list, kvm->atomic_gpc_list) and invalidate only the
*former* from invalidate_range_start(), with these -EAGAIN semantics.
The atomic ones can stay precisely as they were in the series I already
sent, since there's never any need for them to spin/wait as long as
they're invalidated in the invalidate_range() MMU notifier. A rough
sketch of that split follows, ahead of the patch itself.
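Very loosely, something like this — the two list names are the ones
above, but the field layout and the extra list argument to
gfn_to_pfn_cache_invalidate() are purely illustrative, not something
I've actually written yet:

	/* Illustrative sketch only: two separate pfn-cache lists in struct kvm. */
	struct kvm {
		...
		spinlock_t gpc_lock;
		struct list_head guest_gpc_list;	/* mapped into the guest: -EAGAIN/wait semantics */
		struct list_head atomic_gpc_list;	/* host atomic users only: invalidate_range() */
		...
	};

	static int kvm_mmu_notifier_invalidate_range_start(struct mmu_notifier *mn,
					const struct mmu_notifier_range *range)
	{
		...
		/* Guest-mapped caches only; this may block and kick vCPUs out. */
		gfn_to_pfn_cache_invalidate(kvm, &kvm->guest_gpc_list,
					    range->start, range->end,
					    hva_range.may_block);
		...
	}

	static void kvm_mmu_notifier_invalidate_range(struct mmu_notifier *mn,
					struct mm_struct *mm,
					unsigned long start, unsigned long end)
	{
		...
		/* Atomic caches, exactly as in the series already posted. */
		gfn_to_pfn_cache_invalidate(kvm, &kvm->atomic_gpc_list, start, end, false);
		...
	}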
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index d1187b051203..2d76c09e460c 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -151,6 +151,7 @@ static inline bool is_error_page(struct page *page)
 #define KVM_REQ_UNBLOCK           2
 #define KVM_REQ_UNHALT            3
 #define KVM_REQ_VM_DEAD           (4 | KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
+#define KVM_REQ_GPC_INVALIDATE    (5 | KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
 #define KVM_REQUEST_ARCH_BASE     8
 
 #define KVM_ARCH_REQ_FLAGS(nr, flags) ({ \
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 7382aa45d5e8..9bc3162ba650 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -468,8 +468,6 @@ static void kvm_mmu_notifier_invalidate_range(struct mmu_notifier *mn,
 	struct kvm *kvm = mmu_notifier_to_kvm(mn);
 	int idx;
 
-	gfn_to_pfn_cache_invalidate(kvm, start, end, false);
-
 	idx = srcu_read_lock(&kvm->srcu);
 	kvm_arch_mmu_notifier_invalidate_range(kvm, start, end);
 	srcu_read_unlock(&kvm->srcu, idx);
@@ -689,6 +687,8 @@ static int kvm_mmu_notifier_invalidate_range_start(struct mmu_notifier *mn,
 	kvm->mn_active_invalidate_count++;
 	spin_unlock(&kvm->mn_invalidate_lock);
 
+	gfn_to_pfn_cache_invalidate(kvm, range->start, range->end, hva_range.may_block);
+
 	__kvm_handle_hva_range(kvm, &hva_range);
 
 	return 0;
@@ -2674,7 +2674,6 @@ static void gfn_to_pfn_cache_invalidate(struct kvm *kvm, unsigned long start,
 	}
 	spin_unlock(&kvm->gpc_lock);
 
-#if 0
 	unsigned int req = KVM_REQ_GPC_INVALIDATE;
 
 	/*
@@ -2690,7 +2689,6 @@ static void gfn_to_pfn_cache_invalidate(struct kvm *kvm, unsigned long start,
 	} else if (wake_vcpus) {
 		called = kvm_make_vcpus_request_mask(kvm, req, vcpu_bitmap);
 	}
-#endif
 
 	WARN_ON_ONCE(called && !may_block);
 }
@@ -2767,6 +2765,8 @@ int kvm_gfn_to_pfn_cache_refresh(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
 	if (!old_valid || old_uhva != gpc->uhva) {
 		unsigned long uhva = gpc->uhva;
 		void *new_khva = NULL;
+		unsigned long mmu_seq;
+		int retry;
 
 		/* Placeholders for "hva is valid but not yet mapped" */
 		gpc->pfn = KVM_PFN_ERR_FAULT;
@@ -2775,10 +2775,20 @@ int kvm_gfn_to_pfn_cache_refresh(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
 
 		write_unlock_irq(&gpc->lock);
 
+		mmu_seq = kvm->mmu_notifier_seq;
+		smp_rmb();
+
 		new_pfn = hva_to_pfn(uhva, false, NULL, true, NULL);
 		if (is_error_noslot_pfn(new_pfn))
 			ret = -EFAULT;
-		else if (gpc->kernel_map) {
+		else {
+			read_lock(&kvm->mmu_lock);
+			retry = mmu_notifier_retry_hva(kvm, mmu_seq, uhva);
+			read_unlock(&kvm->mmu_lock);
+			if (retry)
+				ret = -EAGAIN; // or goto the mmu_seq setting bit to retry?
+		}
+		if (!ret && gpc->kernel_map) {
 			if (new_pfn == old_pfn) {
 				new_khva = (void *)((unsigned long)old_khva - page_offset);
 				old_pfn = KVM_PFN_ERR_FAULT;
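For completeness, this is roughly how I picture a guest-mode caller
living with the -EAGAIN. The argument list of
kvm_gfn_to_pfn_cache_refresh() is elided here, and the bare loop is
exactly the missing-exit-condition problem for callers like
evtchn_set_fn():

	int ret;

	do {
		/*
		 * Revalidate the mapping; -EAGAIN means an invalidation
		 * raced with us, so just try again.
		 */
		ret = kvm_gfn_to_pfn_cache_refresh(kvm, gpc, ...);
	} while (ret == -EAGAIN);

	if (ret)
		return ret;

Which is why I still think the waitq is the right end state: the
refresh path could then wait for the invalidation to finish instead of
making every caller spin like that.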