On Mon, Mar 30, 2015 at 9:56 PM, Radim Krčmář <rkrcmar@xxxxxxxxxx> wrote:
> 2015-03-27 13:16+0300, Andrey Korolyov:
>> On Fri, Mar 27, 2015 at 12:03 AM, Bandan Das <bsd@xxxxxxxxxx> wrote:
>> > Radim Krčmář <rkrcmar@xxxxxxxxxx> writes:
>> >> I second Bandan -- checking that it reproduces on other machine would be
>> >> great for sanity :) (Although a bug in our APICv is far more likely.)
>> >
>> > If it's APICv related, a run without apicv enabled could give more hints.
>> >
>> > Your "devices not getting reset" hypothesis makes the most sense to me,
>> > maybe the timer vector in the error message is just one part of
>> > the whole story. Another misbehaving interrupt from the dark comes in at the
>> > same time and leads to a double fault.
>>
>> Default trace (APICv enabled, first reboot introduced the issue):
>> http://xdel.ru/downloads/kvm-e5v2-issue/hanged-reboot-apic-on.dat.gz
>
> The relevant part is here,
> prefixed with "qemu-system-x86-4180 [002] 697.111550:"
>
>   kvm_exit: reason CR_ACCESS rip 0xd272 info 0 0
>   kvm_cr: cr_write 0 = 0x10
>   kvm_mmu_get_page: existing sp gfn 0 0/4 q0 direct --- !pge !nxe root 0 sync
>   kvm_entry: vcpu 0
>   kvm_emulate_insn: f0000:d275: ea 7a d2 00 f0
>   kvm_emulate_insn: f0000:d27a: 2e 0f 01 1e f0 6c
>   kvm_emulate_insn: f0000:d280: 31 c0
>   kvm_emulate_insn: f0000:d282: 8e e0
>   kvm_emulate_insn: f0000:d284: 8e e8
>   kvm_emulate_insn: f0000:d286: 8e c0
>   kvm_emulate_insn: f0000:d288: 8e d8
>   kvm_emulate_insn: f0000:d28a: 8e d0
>   kvm_entry: vcpu 0
>   kvm_exit: reason EXTERNAL_INTERRUPT rip 0xd28f info 0 800000f6
>   kvm_entry: vcpu 0
>   kvm_exit: reason EPT_VIOLATION rip 0x8dd0 info 184 0
>   kvm_page_fault: address f8dd0 error_code 184
>   kvm_entry: vcpu 0
>   kvm_exit: reason EXTERNAL_INTERRUPT rip 0x8dd0 info 0 800000f6
>   kvm_entry: vcpu 0
>   kvm_exit: reason EPT_VIOLATION rip 0x76d6 info 184 0
>   kvm_page_fault: address f76d6 error_code 184
>   kvm_entry: vcpu 0
>   kvm_exit: reason EXTERNAL_INTERRUPT rip 0x76d6 info 0 800000f6
>   kvm_entry: vcpu 0
>   kvm_exit: reason PENDING_INTERRUPT rip 0xd331 info 0 0
>   kvm_inj_virq: irq 8
>   kvm_entry: vcpu 0
>   kvm_exit: reason EXTERNAL_INTERRUPT rip 0xfea5 info 0 800000f6
>   kvm_entry: vcpu 0
>   kvm_exit: reason EPT_VIOLATION rip 0xfea5 info 184 0
>   kvm_page_fault: address ffea5 error_code 184
>   kvm_entry: vcpu 0
>   kvm_exit: reason EXTERNAL_INTERRUPT rip 0xfea5 info 0 800000f6
>   kvm_entry: vcpu 0
>   kvm_exit: reason EPT_VIOLATION rip 0xe990 info 184 0
>   kvm_page_fault: address fe990 error_code 184
>   kvm_entry: vcpu 0
>   kvm_exit: reason EXTERNAL_INTERRUPT rip 0xe990 info 0 800000f6
>   kvm_entry: vcpu 0
>   kvm_exit: reason EXCEPTION_NMI rip 0xd334 info 0 80000b0d
>   kvm_userspace_exit: reason KVM_EXIT_INTERNAL_ERROR (17)
>
>> Trace without APICv (three reboots, just to make sure to hit the
>> problematic condition of supposed DF, as it still have not one hundred
>> percent reproducibility):
>> http://xdel.ru/downloads/kvm-e5v2-issue/apic-off.dat.gz
>
> The trace here contains a well matching excerpt, just instead of the
> EXCEPTION_NMI, it does
>
>   169.905098: kvm_exit: reason EPT_VIOLATION rip 0xd334 info 181 0
>   169.905102: kvm_page_fault: address feffd066 error_code 181
>
> and works. Page fault says we tried to read 0xfeffd066 -- probably IOPB
> of TSS. (I guess it is pre-fetch for following IO instruction.)
>
> Nothing strikes me when looking at it, but some APICv boots don't fail,
> so it would be interesting to compare them ... hosts's 0xf6 interrupt
> (IRQ_WORK_VECTOR) is a possible source of races. (We could look more
> closely. It is fired too often for my liking as well.)

Thanks Radim,

http://xdel.ru/downloads/kvm-e5v2-issue/no-fail-with-apicv.dat.gz
(missed right button in mailer previously)

The related bits look the same as with enable_apicv=0 for me.
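
For anyone comparing the three traces by hand, the hex "info" values quoted above can be decoded with a few lines of Python. This is only a rough sketch under two assumptions: that the two kvm_exit "info" fields are the VMX exit qualification and the VM-exit interruption information (my reading of the tracepoint), and that the bit layouts are the ones given in the Intel SDM. The function names are made up for illustration and are not part of KVM or trace-cmd.

```python
# Sketch: decode the hex "info" values seen in the kvm_exit trace lines.
# Bit layouts are taken from the Intel SDM; helper names are hypothetical.

INTR_TYPES = {
    0: "external interrupt",
    2: "NMI",
    3: "hardware exception",
    6: "software exception",
}


def decode_intr_info(value):
    """Decode a VM-exit interruption-information field (e.g. 0x800000f6)."""
    return {
        "valid": bool((value >> 31) & 1),             # bit 31: field is valid
        "vector": value & 0xFF,                       # bits 7:0: vector
        "type": INTR_TYPES.get((value >> 8) & 0x7, "other"),  # bits 10:8
        "error_code_valid": bool((value >> 11) & 1),  # bit 11
    }


def decode_ept_qualification(value):
    """Decode an EPT-violation exit qualification (e.g. 0x184 or 0x181)."""
    names = ("read", "write", "fetch")
    return {
        # bits 2:0: kind of access that caused the violation
        "access": [n for bit, n in enumerate(names) if (value >> bit) & 1],
        # bits 5:3: what the EPT entry permitted
        "allowed": [n for bit, n in enumerate(names) if (value >> (bit + 3)) & 1],
        "gla_valid": bool((value >> 7) & 1),          # bit 7
        "final_translation": bool((value >> 8) & 1),  # bit 8
    }


if __name__ == "__main__":
    for raw in (0x800000F6, 0x80000B0D):
        print(hex(raw), decode_intr_info(raw))
    for raw in (0x184, 0x181):
        print(hex(raw), decode_ept_qualification(raw))
```

Under these assumptions 800000f6 decodes as external interrupt vector 0xf6, 80000b0d as a hardware exception on vector 13 with an error code, 184 as an instruction fetch and 181 as a data read, which is consistent with the "tried to read 0xfeffd066" interpretation quoted above.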