On Sat, 2020-05-23 at 09:09 +0200, Paolo Bonzini wrote: > On 23/05/20 02:40, Maxim Levitsky wrote: > > On Fri, 2020-05-22 at 20:34 +0200, Paolo Bonzini wrote: > > > The #DE is caused by INTERCEPT_VINTR being clear and V_IRQ being set. > > > > > > I'm not really sure how that can happen, I tried to put some code in the > > > branch that would avoid that, and who knows maybe it even made some > > > progress on Hyper-V. But, I might have missed some case. This thing is > > > growing fast... > > I''ll try to see if I can find any clues to this tomorrow. > > One thing for sure, it all started with > > > > 'KVM: nSVM: implement check_nested_events for interrupts' > > b518ba9fa691a3066ee935f6f317f827295453f0 > > Yes, but that is a bugfix. The IRQ tests in svm.flat were all broken > before. > > Paolo > I managed to get to the very bottom of this rabbit hole. I attach the debug traces of host, and debug changes I did to the host and guest. Both host and guest run on mainline git then kvm/queue and kvm/nested-svm merged You can find the exact kernel used on my gitlab account https://gitlab.com/maximlevitsky/linux.git, branches kernel-starship-5.7 and kernel-starship-vm-5.7 accordingly I captured and analyzed trace, together with some help from a debug code I inserted (you can find this in these trees as well, plus I attached it, together with annotated trace on the host) I think that some of these debug hacks can be turned to something useful, like tracing the state of interrupt window, printing the int_cfg, etc. I'll send patches about this soon. Some of the things I guessed, and I will note when I say something I don't have a significant proof of, since I don't know all of KVM yet. With all this said, this is what is happening: 1. The host sets interrupt window. It needs interrupts window because (I guess) that guest disabled interrupts and it waits till interrupts are enabled to inject the interrupt. To be honest this is VMX limitation, and in SVM (I think according to the spec) you can inject interrupts whenever you want and if the guest can't accept then interrupt, the vector written to EVENTINJ field, will be moved to V_INTR_VECTOR and V_INTR flag will be set, untill the guest re-enables interrupts, and then the interrupt will be injected. I don't have a proof of this from SVM spec but it looks like that, so to some extent this is a guess as well. 2. Since SVM doesn't really have a concept of interrupt window intercept, this is faked by setting V_INTR, as if injected (or as they call it virtual) interrupt is pending, together with intercept of virtual interrupts, which happens when the virtual interrupt is about to be injected, which is equavalent of intercepting the moment, the guest re-enabled interrupts. 3. Now we setup our interrupt window while we about to enter a nested guest (while we deal with VMRUN intercept), because we have an interrupt pending injection and the guest is indeed with disabled interrupts (it will re-enable them when it returns to the nested guest). 4. After we enter the nested guest, we eventually get an VMexit due to unrelated reason and we sync the V_INTR that *we* set to the nested vmcs, since in theory that flag could have beeing set by the CPU itself, if the guest itself used EVENTINJ to inject an event to its nested guest while the nested guest didn't have interrupts enabled (KVM doesn't do this, but we can't assume that) 5. Now we enter the guest and it re-enables interrupts as part of its VMexit handling, and we intercept that with INTERCEPT_VINTR, disable VINTR intercept, and inject the interrupt we always wanted to the guest. In other words, our interrupt window is finally signaled, and our abuse of VINTR is ended. 5. At that point the bomb is ticking. Once the guest ends dealing with the nested guest VMexit, and executes VMRUN, we enter the nested guest with V_INTR enabled. V_INTER intercept is disabled since we disabled our interrupt window long ago, guest is also currently doesn't enable any interrupt window, so we basically injecting to the guest whatever is there in V_INTR_VECTOR in the nested guest's VMCB. Usually there is the value that was when the guest itself used interrupt window on its nested guest and that is zero. So we inject interrupt 0 to the nested guest which wasn't there ever. Checkmate. Now that I am thinking about this the issue is deeper that I thought and it stems from the abuse of the V_INTR on AMD. IMHO the best solution is to avoid interrupt windows on AMD unless really needed (like with level-triggered interrupts or so) If to try to explain the root cause of this, what happens is this: 1. The host sets up the interrupt window by setting VINTR intercept and VINTR pending flag 2. Once we exit the nested guest, we are supposed to sync the VINTR pending flag to the nested guest VMCB since this flag could have beeing set by the CPU itself, and not by us abusing it. 3. If the flag is set by use, it ends up in the nested guest VMCB and since the nested guest doesn't nessesarly intercept the VINTR, it might end up injecting an interrupt to the guest. Now the problem is that it is next to impossible to know the source of the VINTR pending flag. Even if we remember that host is currently setup an interrupt window, the guest afterwards could have used EVENTINJ + interrupt disabled nested guest, to raise that flag as well, and might need to know about it. I have an idea on how to fix this, which is about 99% correct and will only fail if the guest attempt something that is undefined anyway. Lets set the vector of the fake VINTR to some reserved exception value, rather that 0 (which the guest is not supposed to inject ever to the nested guest), so then we will know if the VINTR is from our fake injection or from the guest itself. If it is our VINTR then we will not sync it to the guest. In theory it can be even 0, since exceptions should never be injected as interrupts anyway, this is also reserved operation. What do you think? This is the patch that fixes the issue for me that implements the above, just copy&pasted to my email client, so it probably won't apply. I'll send this from git if you think that this is the right idea: diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c index 134a5cf2b7d3..b8b2238af5e4 100644 --- a/arch/x86/kvm/svm/nested.c +++ b/arch/x86/kvm/svm/nested.c @@ -273,6 +273,13 @@ void sync_nested_vmcb_control(struct vcpu_svm *svm) WARN_ON(1); } + + if (svm->vmcb->control.int_vector == 0x1F) { + /* This is our fake V_IRQ, which is meant for interrupt + * window. No need to sync this to the guest*/ + return; + } + svm->nested.ctl.int_ctl &= ~mask; svm->nested.ctl.int_ctl |= svm->vmcb->control.int_ctl & mask; } diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c index a12af9467114..90616317575a 100644 --- a/arch/x86/kvm/svm/svm.c +++ b/arch/x86/kvm/svm/svm.c @@ -1363,7 +1363,7 @@ static void svm_set_vintr(struct vcpu_svm *svm) * Actual injection of virtual interrupts happens through EVENTINJ. */ control = &svm->vmcb->control; - control->int_vector = 0x0; + control->int_vector = 0x1F; control->int_ctl &= ~V_INTR_PRIO_MASK; control->int_ctl |= V_IRQ_MASK | ((/*control->int_vector >> 4*/ 0xf) << V_INTR_PRIO_SHIFT); Best regards, Maxim Levitsky
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c index b3406379697c8..487f24c478e79 100644 --- a/arch/x86/kvm/svm/svm.c +++ b/arch/x86/kvm/svm/svm.c @@ -1363,7 +1363,7 @@ static void svm_set_vintr(struct vcpu_svm *svm) * Actual injection of virtual interrupts happens through EVENTINJ. */ control = &svm->vmcb->control; - control->int_vector = 0x0; + control->int_vector = 0x4; control->int_ctl &= ~V_INTR_PRIO_MASK; control->int_ctl |= V_IRQ_MASK | ((/*control->int_vector >> 4*/ 0xf) << V_INTR_PRIO_SHIFT); @@ -2663,6 +2663,8 @@ static int interrupt_window_interception(struct vcpu_svm *svm) kvm_make_request(KVM_REQ_EVENT, &svm->vcpu); svm_clear_vintr(svm); + trace_kvm_enable_window(0); + /* * For AVIC, the only reason to end up here is ExtINTs. * In this case AVIC was temporarily disabled for @@ -3152,7 +3154,11 @@ static void enable_irq_window(struct kvm_vcpu *vcpu) */ svm_toggle_avic_for_irq_window(vcpu, false); svm_set_vintr(svm); + + trace_kvm_enable_window(2); } + + trace_kvm_enable_window(1); } static void enable_nmi_window(struct kvm_vcpu *vcpu) diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h index 54a10c98d7466..c69a5f682e2c3 100644 --- a/arch/x86/kvm/trace.h +++ b/arch/x86/kvm/trace.h @@ -29,6 +29,22 @@ TRACE_EVENT(kvm_entry, TP_printk("vcpu %u", __entry->vcpu_id) ); + +TRACE_EVENT(kvm_enable_window, + TP_PROTO(unsigned int enabled), + TP_ARGS(enabled), + + TP_STRUCT__entry( + __field( unsigned int, enabled ) + ), + + TP_fast_assign( + __entry->enabled = enabled; + ), + + TP_printk("enabled %u", __entry->enabled) +); + /* * Tracepoint for hypercall. */ diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 8494fd519bda0..14525a0c39423 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -10643,6 +10643,7 @@ u64 kvm_spec_ctrl_valid_bits(struct kvm_vcpu *vcpu) EXPORT_SYMBOL_GPL(kvm_spec_ctrl_valid_bits); EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_exit); +EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_enable_window); EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_fast_mmio); EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_inj_virq); EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_page_fault);
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c index 16ef8cfefb5f..134a5cf2b7d3 100644 --- a/arch/x86/kvm/svm/nested.c +++ b/arch/x86/kvm/svm/nested.c @@ -142,6 +142,8 @@ void recalc_intercepts(struct vcpu_svm *svm) c->intercept_dr |= g->intercept_dr; c->intercept_exceptions |= g->intercept_exceptions; c->intercept |= g->intercept; + + //c->intercept |= (1ULL << INTERCEPT_VINTR); } static void copy_vmcb_control_area(struct vmcb_control_area *dst, @@ -268,6 +270,8 @@ void sync_nested_vmcb_control(struct vcpu_svm *svm) * restores int_ctl. We can just leave it aside. */ mask &= ~V_IRQ_MASK; + + WARN_ON(1); } svm->nested.ctl.int_ctl &= ~mask; svm->nested.ctl.int_ctl |= svm->vmcb->control.int_ctl & mask; @@ -393,6 +397,7 @@ int nested_svm_vmrun(struct vcpu_svm *svm) struct vmcb *vmcb = svm->vmcb; struct kvm_host_map map; u64 vmcb_gpa; + bool was_like_that = false; if (is_smm(&svm->vcpu)) { kvm_queue_exception(&svm->vcpu, UD_VECTOR); @@ -412,6 +417,11 @@ int nested_svm_vmrun(struct vcpu_svm *svm) nested_vmcb = map.hva; + if (!(nested_vmcb->control.intercept & (1ULL << INTERCEPT_VINTR)) && (nested_vmcb->control.int_ctl & V_IRQ_MASK) && nested_vmcb->control.int_vector == 4) { + was_like_that = true; + } + + if (!nested_vmcb_checks(nested_vmcb)) { nested_vmcb->control.exit_code = SVM_EXIT_ERR; nested_vmcb->control.exit_code_hi = 0; @@ -420,6 +430,9 @@ int nested_svm_vmrun(struct vcpu_svm *svm) goto out; } + + + trace_kvm_nested_vmrun(svm->vmcb->save.rip, vmcb_gpa, nested_vmcb->save.rip, nested_vmcb->control.int_ctl, @@ -462,6 +475,35 @@ int nested_svm_vmrun(struct vcpu_svm *svm) svm->nested.nested_run_pending = 1; enter_svm_guest_mode(svm, vmcb_gpa, nested_vmcb); + // + // vmcb->control.intercept = host.intercept | guest.intercept + // guest at that stage probably doesn't intercept VINTR, host probably also doesn't + // + // vmcb->control.int_vector comes from svm->nested.ctl.int_vector which comes from nested_vmcb + // probably didn't get updated since last interrupt window in the L1 guest + // + // + // vmcb->control.int_ctl & V_IRQ_MASK + // this seems also to be taken from the guest + // + // + + if(!(vmcb->control.intercept & (1ULL << INTERCEPT_VINTR)) && (vmcb->control.int_ctl & V_IRQ_MASK) && vmcb->control.int_vector == 4) { + + printk(KERN_ERR "vcpu %d IS ABOUT to receive wrong VINTR\n", svm->vcpu.vcpu_id); + + if (was_like_that) { + printk(KERN_ERR "vcpu %d The VINTR settings come from the nested guest itsel\n", svm->vcpu.vcpu_id); + } + + svm->vmcb->control.exit_code = SVM_EXIT_ERR; + svm->vmcb->control.exit_code_hi = 0; + svm->vmcb->control.exit_info_1 = 420; + svm->vmcb->control.exit_info_2 = 0; + nested_svm_vmexit(svm); + return ret; + } + if (!nested_svm_vmrun_msrpm(svm)) { svm->vmcb->control.exit_code = SVM_EXIT_ERR; svm->vmcb->control.exit_code_hi = 0; diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c index b3406379697c..a12af9467114 100644 --- a/arch/x86/kvm/svm/svm.c +++ b/arch/x86/kvm/svm/svm.c @@ -1368,9 +1368,11 @@ static void svm_set_vintr(struct vcpu_svm *svm) control->int_ctl |= V_IRQ_MASK | ((/*control->int_vector >> 4*/ 0xf) << V_INTR_PRIO_SHIFT); mark_dirty(svm->vmcb, VMCB_INTR); + + trace_kvm_interrupt_window(1); } -static void svm_clear_vintr(struct vcpu_svm *svm) +static void svm_clear_vintr(struct vcpu_svm *svm, int from) { clr_intercept(svm, INTERCEPT_VINTR); @@ -1380,6 +1382,8 @@ static void svm_clear_vintr(struct vcpu_svm *svm) svm->vmcb->control.int_ctl &= ~V_IRQ_MASK; mark_dirty(svm->vmcb, VMCB_INTR); + + trace_kvm_interrupt_window(0); } static struct vmcb_seg *svm_seg(struct kvm_vcpu *vcpu, int seg) @@ -2003,8 +2007,9 @@ void svm_set_gif(struct vcpu_svm *svm, bool value) disable_gif(svm); /* After a CLGI no interrupts should come */ - if (!kvm_vcpu_apicv_active(&svm->vcpu)) - svm_clear_vintr(svm); + if (!kvm_vcpu_apicv_active(&svm->vcpu)) { + svm_clear_vintr(svm, 1); + } } } @@ -2661,7 +2666,7 @@ static int msr_interception(struct vcpu_svm *svm) static int interrupt_window_interception(struct vcpu_svm *svm) { kvm_make_request(KVM_REQ_EVENT, &svm->vcpu); - svm_clear_vintr(svm); + svm_clear_vintr(svm, 2); /* * For AVIC, the only reason to end up here is ExtINTs. @@ -2891,8 +2896,8 @@ static void svm_get_exit_info(struct kvm_vcpu *vcpu, u64 *info1, u64 *info2) { struct vmcb_control_area *control = &to_svm(vcpu)->vmcb->control; - *info1 = control->exit_info_1; - *info2 = control->exit_info_2; + *info1 = control->int_ctl; + *info2 = control->int_vector; } static int handle_exit(struct kvm_vcpu *vcpu, fastpath_t exit_fastpath) @@ -2914,8 +2919,8 @@ static int handle_exit(struct kvm_vcpu *vcpu, fastpath_t exit_fastpath) int vmexit; trace_kvm_nested_vmexit(svm->vmcb->save.rip, exit_code, - svm->vmcb->control.exit_info_1, - svm->vmcb->control.exit_info_2, + svm->vmcb->control.int_ctl, + svm->vmcb->control.int_vector, svm->vmcb->control.exit_int_info, svm->vmcb->control.exit_int_info_err, KVM_ISA_SVM); diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h index 54a10c98d746..845e2128ee72 100644 --- a/arch/x86/kvm/trace.h +++ b/arch/x86/kvm/trace.h @@ -11,6 +11,22 @@ #undef TRACE_SYSTEM #define TRACE_SYSTEM kvm +TRACE_EVENT(kvm_interrupt_window, + TP_PROTO(unsigned int value), + TP_ARGS(value), + + TP_STRUCT__entry( + __field( unsigned int, value ) + ), + + TP_fast_assign( + __entry->value = value; + ), + + TP_printk("value %u", __entry->value) +); + + /* * Tracepoint for guest mode entry. */ diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 4b2855ab46d4..40b8828a5e7b 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -8226,6 +8226,9 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu) bool req_int_win = dm_request_for_irq_injection(vcpu) && kvm_cpu_accept_dm_intr(vcpu); + + WARN_ON(req_int_win); + fastpath_t exit_fastpath; bool req_immediate_exit = false; @@ -10660,3 +10663,4 @@ EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_avic_unaccelerated_access); EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_avic_incomplete_ipi); EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_avic_ga_log); EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_apicv_update_request); +EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_interrupt_window);
Here we enter a nested guest. We setup the interrupt window for a pending injection of interrupt 49 <...>-602596 [016] 96904.010990: kvm_exit: reason EXIT_VMRUN rip 0xffffffffa04f722a info 30f0001 0 <...>-602596 [016] 96904.010991: kvm_nested_vmrun: rip: 0xffffffffa04f722a vmcb: 0x000000081a6d2000 nrip: 0x000000007fb5d347 int_ctl: 0x010f0000 event_inj: 0x00000000 npt: on <...>-602596 [016] 96904.010991: kvm_nested_intercepts: cr_read: 0010 cr_write: 0010 excp: 00060042 intercept: 00006e7fbd4c8027 <...>-602537 [048] 96904.010992: kvm_msi_set_irq: dst 0 vec 49 (Fixed|physical|edge) <...>-602537 [048] 96904.010993: kvm_apic_accept_irq: apicid 0 vec 49 (Fixed|edge) <...>-602596 [016] 96904.010994: kvm_interrupt_window: value 1 <...>-602596 [016] 96904.010995: kvm_entry: vcpu 0 ===================================================================================================================================================================================== * We exit due to a real interrupt, vector not known yet I think * VINTR is still intercepted, since interrupt window is on,we toggle it on/off but keep it on the entry. * We forward the vmexit to the L1 guest and exit the nesting. <!!!!!!!!!!!!here, slightly before the kvm_exit, at the end of svm_vcpu_run, we sync the current int_ctl to svm->nested.ctl, which pollutes it with VINTR!!!!!!!> sync_nested_vmcb_control() <...>-602596 [016] 96904.010997: kvm_exit: reason EXIT_INTR rip 0x7fb5d347 info 10f0100 0 <!!!! and here that bit propogates to nested VMCB !!!!> nested_vmcb->control.int_ctl = svm->nested.ctl.int_ctl; <...>-602596 [016] 96904.010997: kvm_nested_vmexit: rip 7fb5d347 reason EXIT_INTR info1 10f0100 info2 0 int_info 0 int_info_err 0 <...>-602596 [016] 96904.010998: kvm_nested_intr_vmexit: rip: 0x000000007fb5d347 <----- guest mode ended ------> <...>-602596 [016] 96904.010998: kvm_interrupt_window: value 0 # this is most likely the result of svm_set_gif(false), because on vmexit gif is set to 0 <...>-602596 [016] 96904.010999: kvm_nested_vmexit_inject: reason EXIT_INTR info1 0 info2 0 int_info 0 int_info_err 0 <...>-602596 [016] 96904.011000: kvm_interrupt_window: value 1 # now interrupt window is setup again since we still have that pending injection <...>-602596 [016] 96904.011000: kvm_entry: vcpu 0 Here L1 apparently re-enables interrupts to get hit with the real interrupt it got. Most likeky it does stgi But since we catch VINTR (we have interrupt window ON), we inject it the interrupt that interrupt 49 now <...>-602596 [016] 96904.011003: kvm_exit: reason EXIT_VINTR rip 0xffffffffa04224b0 info 30f0301 0 <...>-602596 [016] 96904.011003: kvm_interrupt_window: value 0 <...>-602596 [016] 96904.011004: kvm_inj_virq: irq 49 <...>-602596 [016] 96904.011004: kvm_entry: vcpu 0 Here the L1 guest ends dealing with the injected interrupt, and prepares to enter the nested guest <...>-602596 [016] 96904.011014: kvm_eoi: apicid 0 vector 49 <...>-602596 [016] 96904.011015: kvm_pv_eoi: apicid 0 vector 49 <...>-602596 [016] 96904.011015: kvm_exit: reason EXIT_MSR rip 0xffffffffa04f3d7d info 30f0201 0 <...>-602596 [016] 96904.011015: kvm_msr: msr_write 175 = 0xfffffe0000002200 <...>-602596 [016] 96904.011016: kvm_entry: vcpu 0 <...>-602596 [016] 96904.011017: kvm_exit: reason EXIT_MSR rip 0xffffffffa04f3d7d info 30f0201 0 <...>-602596 [016] 96904.011017: kvm_msr: msr_write 176 = 0xffffffff81a012d0 <...>-602596 [016] 96904.011017: kvm_entry: vcpu 0 <...>-602596 [016] 96904.011018: kvm_exit: reason EXIT_MSR rip 0xffffffffa04f3d7d info 30f0201 0 <...>-602596 [016] 96904.011018: kvm_msr: msr_write c0000103 = 0x0 <...>-602596 [016] 96904.011019: kvm_entry: vcpu 0 <...>-602596 [016] 96904.011019: kvm_exit: reason EXIT_WRITE_DR6 rip 0xffffffffa041c21e info 30f0201 0 <...>-602596 [016] 96904.011020: kvm_entry: vcpu 0 <...>-602596 [016] 96904.011023: kvm_exit: reason EXIT_NPF rip 0xffffffff814cea99 info 30f0201 0 <...>-602596 [016] 96904.011023: kvm_page_fault: address 1000003008 error_code f <...>-602596 [016] 96904.011024: kvm_emulate_insn: 0:ffffffff814cea99: 66 89 3e <...>-602596 [016] 96904.011025: vcpu_match_mmio: gva 0xffffc9000024d008 gpa 0x1000003008 Write GPA <...>-602596 [016] 96904.011026: kvm_mmio: mmio write len 2 gpa 0x1000003008 val 0x2 <...>-602596 [016] 96904.011028: kvm_entry: vcpu 0 <...>-602596 [016] 96904.011031: kvm_exit: reason EXIT_MSR rip 0xffffffffa04f4818 info 30f0201 0 <...>-602596 [016] 96904.011031: kvm_msr: msr_read 175 = 0xfffffe0000002200 <...>-602596 [016] 96904.011032: kvm_entry: vcpu 0 <...>-602596 [016] 96904.011033: kvm_exit: reason EXIT_MSR rip 0xffffffffa04f4818 info 30f0201 0 <...>-602596 [016] 96904.011033: kvm_msr: msr_read 176 = 0xffffffff81a012d0 <...>-602596 [016] 96904.011033: kvm_entry: vcpu 0 <...>-602596 [016] 96904.011034: kvm_exit: reason EXIT_MSR rip 0xffffffffa04f4818 info 30f0201 0 <...>-602596 [016] 96904.011035: kvm_msr: msr_read c0000103 = 0x0 <...>-602596 [016] 96904.011035: kvm_entry: vcpu 0 <...>-602596 [016] 96904.011036: kvm_exit: reason EXIT_MSR rip 0xffffffffa04f4870 info 30f0201 0 <...>-602596 [016] 96904.011036: kvm_msr: msr_write c0000103 = 0x0 <...>-602596 [016] 96904.011037: kvm_entry: vcpu 0 ===================================================================================================================================================================================== Here the L1 guest tries to enter the nested guest again. <...>-602596 [016] 96904.011039: kvm_exit: reason EXIT_VMRUN rip 0xffffffffa04f722a info 30f0001 0 <...>-602596 [016] 96904.011040: kvm_nested_vmrun: rip: 0xffffffffa04f722a vmcb: 0x000000081a6d2000 nrip: 0x000000007fb5d347 int_ctl: 0x010f0100 event_inj: 0x00000000 npt: on <...>-602596 [016] 96904.011040: kvm_nested_intercepts: cr_read: 0010 cr_write: 0010 excp: 00060042 intercept: 00006e7fbd4c8027 <...>-602537 [048] 96904.011061: kvm_msi_set_irq: dst 0 vec 49 (Fixed|physical|edge) <...>-602537 [048] 96904.011062: kvm_apic_accept_irq: apicid 0 vec 49 (Fixed|edge) <...>-602596 [016] 96904.012588: kvm_interrupt_window: value 0 <!!! this is the error I planted when error condition is about to happen !!!> <...>-602596 [016] 96904.012589: kvm_nested_vmexit_inject: reason EXIT_ERR info1 1a4 info2 0 int_info 0 int_info_err 0