Re: [PATCH v3] KVM: x86: Fix recording of guest steal time / preempted status

On 11/12/21 12:29, David Woodhouse wrote:
> We *could* use the rwlock thing for steal time reporting, but I still
> don't really think it's worth doing so. Again, if it was truly going to
> be a generic mechanism that would solve lots of other problems, I'd be
> up for it. But if steal time would be the *only* other user of a
> generic version of the rwlock thing, that just seems like
> overengineering. I'm still mostly inclined to stand by my original
> observation that it has a perfectly serviceable HVA that it can use
> instead.

Well yeah, I'd have to see the code to decide.  But maybe this is where
we disagree: what I want from generic KVM is a nice, easy-to-use API.
I only care to a certain extent how messy it is inside, because a nice
generic API means not reinventing the wheel across architectures.

That said, I think that your patch is much more complicated than it
should be, because it hooks in the wrong place.  There are two cases:

1) for kmap/kunmap, the invalidate_range() notifier is the right place
to remove any references taken after invalidate_range_start().  For the
APIC access page it needs a kvm_make_all_cpus_request(), but for the
shinfo page it can be a simple back-to-back write_lock/write_unlock
(a super-poor-man's RCU, if you wish).  And it can be extended to a
per-cache rwlock.

2) for memremap/memunmap, all you really care about is reacting to
changes in the memslots, so there is nothing for the MMU notifier
integration to do.  You still need to call the same hook as
kvm_mmu_notifier_invalidate_range() when the memslots change, so that
the update is done outside atomic context, as in the sketch right
below.
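
To make that concrete, this is roughly what I have in mind on the
memslot side; gfn_to_pfn_cache_invalidate_range() is a name I am
making up here, and a fuller sketch of it follows the next paragraph:

	/*
	 * Sketch only: called from kvm_arch_commit_memory_region() or an
	 * equivalent memslot-change path, i.e. outside atomic context,
	 * mirroring what kvm_mmu_notifier_invalidate_range() does for
	 * HVA ranges.  "old" is the memslot being moved or deleted.
	 */
	gfn_to_pfn_cache_invalidate_range(kvm, old->userspace_addr,
					  old->userspace_addr +
					  (old->npages << PAGE_SHIFT));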

So, as far as short-term uses of the cache are concerned, all it
takes (if I'm right...) is a list_for_each_entry in
kvm_mmu_notifier_invalidate_range, visiting the list of
gfn-to-pfn caches and doing a write_lock/write_unlock on each
cache's rwlock, plus a way to inform the code of memslot changes
before any atomic code tries to use an invalid cache.
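
If I have this right, the walk would look vaguely like the below;
gpc_list, gpc_lock and the gfn_to_pfn_cache fields are invented
names for the sketch, not anything from your patch:

/* Sketch only. */
static void gfn_to_pfn_cache_invalidate_range(struct kvm *kvm,
					      unsigned long start,
					      unsigned long end)
{
	struct gfn_to_pfn_cache *gpc;

	spin_lock(&kvm->gpc_lock);
	list_for_each_entry(gpc, &kvm->gpc_list, list) {
		write_lock(&gpc->lock);

		/* The cache records the userspace HVA of its mapping. */
		if (gpc->uhva < end && start < gpc->uhva + PAGE_SIZE)
			gpc->valid = false;

		/*
		 * Dropping the write lock is the "super-poor-man RCU"
		 * part: after write_unlock, no atomic reader that took
		 * read_lock before the invalidation can still be in its
		 * read-side section.
		 */
		write_unlock(&gpc->lock);
	}
	spin_unlock(&kvm->gpc_lock);
}

The atomic reader side is then just read_lock, a check of gpc->valid,
use of the mapping, read_unlock; a reader that finds the cache invalid
bails out and refreshes it outside atomic context.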

> It looks like they want their own way of handling it; if the GPA being
> invalidated matches posted_intr_desc_addr or virtual_apic_page_addr
> then the MMU notifier just needs to call kvm_make_all_cpus_request()
> with some suitable checking/WARN magic around the "we will never need
> to sleep when we shouldn't" assertion that you made above.

I was thinking of an extra flag to decide whether (in addition
to the write_lock/write_unlock) the MMU notifier also needs to do
the kvm_make_all_cpus_request:

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index b213ca966d41..f134db24b973 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -309,8 +309,8 @@ static void free_nested(struct kvm_vcpu *vcpu)
 		kvm_release_page_clean(vmx->nested.apic_access_page);
 		vmx->nested.apic_access_page = NULL;
 	}
-	kvm_vcpu_unmap(vcpu, &vmx->nested.virtual_apic_map, true);
-	kvm_vcpu_unmap(vcpu, &vmx->nested.pi_desc_map, true);
+	vmx->nested.virtual_apic_map.guest_uses_pa = false;
+	vmx->nested.pi_desc_map.guest_uses_pa = false;
 	vmx->nested.pi_desc = NULL;

 	kvm_mmu_free_roots(vcpu, &vcpu->arch.guest_mmu, KVM_MMU_ROOTS_ALL);
@@ -3183,6 +3184,10 @@ static bool nested_get_vmcs12_pages(struct kvm_vcpu *vcpu)
 			 */
 			vmcs_write64(VIRTUAL_APIC_PAGE_ADDR, INVALID_GPA);
 		}
+		// on invalidation, this causes kvm_make_all_cpus_request
+		// and also dirties the page
+		map->guest_uses_pa = true;
+		kvm_vcpu_unmap(vcpu, map);
 	}

 	if (nested_cpu_has_posted_intr(vmcs12)) {
@@ -3204,6 +3207,8 @@ static bool nested_get_vmcs12_pages(struct kvm_vcpu *vcpu)
 			vmx->nested.pi_desc = NULL;
 			pin_controls_clearbit(vmx, PIN_BASED_POSTED_INTR);
 		}
+		map->guest_uses_pa = true;
+		kvm_vcpu_unmap(vcpu, map);
 	}
 	if (nested_vmx_prepare_msr_bitmap(vcpu, vmcs12))
 		exec_controls_setbit(vmx, CPU_BASED_USE_MSR_BITMAPS);
@@ -4559,8 +4564,8 @@ void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 vm_exit_reason,
 		kvm_release_page_clean(vmx->nested.apic_access_page);
 		vmx->nested.apic_access_page = NULL;
 	}
-	kvm_vcpu_unmap(vcpu, &vmx->nested.virtual_apic_map, true);
-	kvm_vcpu_unmap(vcpu, &vmx->nested.pi_desc_map, true);
+	vmx->nested.virtual_apic_map.guest_uses_pa = false;
+	vmx->nested.pi_desc_map.guest_uses_pa = false;
 	vmx->nested.pi_desc = NULL;

 	if (vmx->nested.reload_vmcs01_apic_access_page) {
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 3b09ac93c86e..342f12321df7 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -6185,6 +6185,8 @@ static void vmx_set_apic_access_page_addr(struct kvm_vcpu *vcpu)
 	/* Defer reload until vmcs01 is the current VMCS. */
 	if (is_guest_mode(vcpu)) {
+		// TODO...
+		nested_vmx_update_vmcs02_phys_addrs(vcpu);
 		to_vmx(vcpu)->nested.reload_vmcs01_apic_access_page = true;
 		return;
 	}
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 91f723f37b22..6d0b7d2f1465 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9670,9 +9670,6 @@ void kvm_arch_mmu_notifier_invalidate_range(struct kvm *kvm,

 void kvm_vcpu_reload_apic_access_page(struct kvm_vcpu *vcpu)
 {
-	if (!lapic_in_kernel(vcpu))
-		return;
-
 	if (!kvm_x86_ops.set_apic_access_page_addr)
 		return;

With this infrastructure, the APIC access page can be changed to a
gfn_to_pfn cache along the same lines (one whose guest_uses_pa is
always true).
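
On the notifier side the flag would then amount to something like
this, inside the invalidation walk sketched earlier; the request
number is whatever the cache's user picks, KVM_REQ_GPC_INVALIDATE is
a made-up name:

	write_lock(&gpc->lock);
	if (gpc->uhva < end && start < gpc->uhva + PAGE_SIZE) {
		gpc->valid = false;
		if (gpc->guest_uses_pa) {
			/*
			 * The guest reaches the page by physical address,
			 * so kick every vCPU to remap it before reentry,
			 * and dirty the old page (KVM_REQ_GPC_INVALIDATE
			 * is a placeholder, not an existing request).
			 */
			kvm_make_all_cpus_request(kvm, KVM_REQ_GPC_INVALIDATE);
			mark_page_dirty(kvm, gpa_to_gfn(gpc->gpa));
		}
	}
	write_unlock(&gpc->lock);

As far as I can tell kvm_make_all_cpus_request() does not sleep, so
calling it under the write lock from the notifier should be fine.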

Paolo
