Re: [PATCH 3/4] KVM: x86: correctly merge pending and injected exception

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 01/04/21 16:38, Maxim Levitsky wrote:
+static int kvm_do_deliver_pending_exception(struct kvm_vcpu *vcpu)
+{
+	int class1, class2, ret;
+
+	/* try to deliver current pending exception as VM exit */
+	if (is_guest_mode(vcpu)) {
+		ret = kvm_x86_ops.nested_ops->deliver_exception_as_vmexit(vcpu);
+		if (ret || !vcpu->arch.pending_exception.valid)
+			return ret;
+	}
+
+	/* No injected exception, so just deliver the payload and inject it */
+	if (!vcpu->arch.injected_exception.valid) {
+		trace_kvm_inj_exception(vcpu->arch.pending_exception.nr,
+					vcpu->arch.pending_exception.has_error_code,
+					vcpu->arch.pending_exception.error_code);
+queue:

If you move the queue label to the top of the function, you can "goto queue" for #DF as well and you don't need to call kvm_do_deliver_pending_exception again.  In fact you can merge this function and kvm_deliver_pending_exception completely:


static int kvm_deliver_pending_exception_as_vmexit(struct kvm_vcpu *vcpu)
{
	WARN_ON(!vcpu->arch.pending_exception.valid);
	if (is_guest_mode(vcpu))
		return kvm_x86_ops.nested_ops->deliver_exception_as_vmexit(vcpu);
	else
		return 0;
}

static int kvm_merge_injected_exception(struct kvm_vcpu *vcpu)
{
	/*
	 * First check if the pending exception takes precedence
	 * over the injected one, which will be reported in the
	 * vmexit info.
	 */
	ret = kvm_deliver_pending_exception_as_vmexit(vcpu);
	if (ret || !vcpu->arch.pending_exception.valid)
		return ret;

	if (vcpu->arch.injected_exception.nr == DF_VECTOR) {
		...
		return 0;
	}
	...
	if ((class1 == EXCPT_CONTRIBUTORY && class2 == EXCPT_CONTRIBUTORY)
	    || (class1 == EXCPT_PF && class2 != EXCPT_BENIGN)) {
		...
	}
	vcpu->arch.injected_exception.valid = false;
}

static int kvm_deliver_pending_exception(struct kvm_vcpu *vcpu)
{
	if (!vcpu->arch.pending_exception.valid)
		return 0;

	if (vcpu->arch.injected_exception.valid)
		kvm_merge_injected_exception(vcpu);

	ret = kvm_deliver_pending_exception_as_vmexit(vcpu));
	if (ret || !vcpu->arch.pending_exception.valid)
		return ret;

	trace_kvm_inj_exception(vcpu->arch.pending_exception.nr,
				vcpu->arch.pending_exception.has_error_code,
				vcpu->arch.pending_exception.error_code);
	...
}

Note that if the pending exception is a page fault, its payload
must be delivered to CR2 before converting it to a double fault.

Going forward to vmx.c:


 	if (mtf_pending) {
 		if (block_nested_events)
 			return -EBUSY;
+
 		nested_vmx_update_pending_dbg(vcpu);

Should this instead "WARN_ON(vmx_pending_dbg_trap(vcpu));" since
the pending-#DB-plus-MTF combination is handled in
vmx_deliver_exception_as_vmexit?...


+
+	if (vmx->nested.mtf_pending && vmx_pending_dbg_trap(vcpu)) {
+		/*
+		 * A pending monitor trap takes precedence over pending
+		 * debug exception which is 'stashed' into
+		 * 'GUEST_PENDING_DBG_EXCEPTIONS'
+		 */
+
+		nested_vmx_update_pending_dbg(vcpu);
+		vmx->nested.mtf_pending = false;
+		nested_vmx_vmexit(vcpu, EXIT_REASON_MONITOR_TRAP_FLAG, 0, 0);
+		return 0;
+	}

... though this is quite ugly, even more so if you add the case of an
INIT with a pending #DB.  The problem is that INIT and MTF have higher
priority than debug exceptions.

The good thing is that an INIT or MTF plus #DB cannot happen with
nested_run_pending == 1, so it will always be inject right away.

There is precedent with KVM_GET_* modifying the CPU state; in
particular, KVM_GET_MPSTATE can modify CS and RIP and even cause a
vmexit via kvm_apic_accept_events.  And in fact, because
kvm_apic_accept_events calls kvm_check_nested_events, calling it
from KVM_GET_VCPU_EVENTS would fix the problem: the injected exception
would go into the IDT-vectored exit field, while the pending exception
would go into GUEST_PENDING_DBG_EXCEPTION and disappear.

However, you cannot do kvm_apic_accept_events twice because there would
be a race with INIT: a #DB exception could be first stored by
KVM_GET_VCPU_EVENTS, then disappear when kvm_apic_accept_events
KVM_GET_MPSTATE is called.

Fortunately, the correct order for KVM_GET_* events is first
KVM_GET_VCPU_EVENTS and then KVM_GET_MPSTATE.  So perhaps
instead of calling kvm_deliver_pending_exception on vmexit,
KVM_GET_VCPU_EVENTS could call kvm_apic_accept_events (and thus
kvm_check_nested_events) instead of KVM_GET_MPSTATE.  In addition:
nested_ops.check_events would be split in two, with high-priority
events (triple fault, now in kvm_check_nested_events; INIT; MTF)
remaining in the first and interrupts in the second, tentatively
named check_interrupts).

I'll take a look at cleaning up the current mess we have around
kvm_apic_accept_events, where most of the calls do not have to
handle guest mode at all.

Thanks for this work and for showing that it's possible to fix the
underlying mess with exception vmexit.  It may be a bit more
complicated than this, but it's a great start.

Paolo




[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux