On Tue, Jul 19, 2011 at 10:04 AM, Avi Kivity <avi@xxxxxxxxxx> wrote: > On 07/19/2011 12:51 AM, Stefan Hajnoczi wrote: >> >> On Mon, Jul 18, 2011 at 9:18 PM, Berg, Johannes<johannes.berg@xxxxxxxxx> >> wrote: >> >> Today I encountered a "BUG: scheduling while atomic" from kvm.ko when >> >> resuming the host from suspend-to-RAM. I captured as much of the oops >> >> as >> >> was displayed on screen: >> >> >> >> http://vmsplice.net/~stefan/panic1.jpg >> >> http://vmsplice.net/~stefan/panic2.jpg >> >> >> >> It looks like the iwlagn driver may have crashed in an interrupt >> >> handler and the >> >> kvm-related panic was triggered in the aftermath. Any ideas? >> > >> > This doesn't look like iwlagn is involved at all -- the fact that it >> > comes up in the backtrace seems to be an artifact of backtracing not being >> > perfect. The RIP points to kvm_arch_vcpu_ioctl_run+0x927 and there's no >> > reason to believe that iwlagn should crash kvm. >> >> RIP seems to be arch/x86/kvm/x86.c:vcpu_enter_guest(): >> >> preempt_disable(); >> >> kvm_x86_ops->prepare_guest_switch(vcpu); >> if (vcpu->fpu_active) >> kvm_load_guest_fpu(vcpu); >> kvm_load_guest_xcr0(vcpu); >> >> vcpu->mode = IN_GUEST_MODE; >> >> /* We should set ->mode before check ->requests, >> * see the comment in make_all_cpus_request. >> */ >> smp_mb(); >> >> local_irq_disable(); >> >> if (vcpu->mode == EXITING_GUEST_MODE || vcpu->requests >> || need_resched() || signal_pending(current)) { >> vcpu->mode = OUTSIDE_GUEST_MODE; >> smp_wmb(); >> local_irq_enable(); >> preempt_enable(); >> kvm_x86_ops->cancel_injection(vcpu); >> r = 1; >> goto out; >> } >> >> srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx); >> >> kvm_guest_enter(); >> >> if (unlikely(vcpu->arch.switch_db_regs)) { >> set_debugreg(0, 7); >> set_debugreg(vcpu->arch.eff_db[0], 0); >> set_debugreg(vcpu->arch.eff_db[1], 1); >> set_debugreg(vcpu->arch.eff_db[2], 2); >> set_debugreg(vcpu->arch.eff_db[3], 3); >> } >> >> trace_kvm_entry(vcpu->vcpu_id); >> kvm_x86_ops->run(vcpu); >> >> /* >> * If the guest has used debug registers, at least dr7 >> * will be disabled while returning to the host. >> * If we don't have active breakpoints in the host, we don't >> * care about the messed up debug address registers. But if >> * we have some of them active, restore the old state. >> */ >> if (hw_breakpoint_active()) >> hw_breakpoint_restore(); >> >> kvm_get_msr(vcpu, MSR_IA32_TSC,&vcpu->arch.last_guest_tsc); >> >> vcpu->mode = OUTSIDE_GUEST_MODE; >> smp_wmb(); >> local_irq_enable(); /*<--- boom! */ > > Preemption is still disabled at this point. Where does the "scheduling > while atomic" come from? Nothing in this area attempts to schedule. > > The preemption counter is 0x10000100, indicating zero preempt depth (wrong > for this point, should be 1), and 1 softirq depth (doesn't make much sense). > Looks very wrong, like the preempt mixup that occured on some archs that > are not x86_64 recently. > > Can you post some disassembly around %rip? Here is the kvm.ko disassembly for %rip = kvm_arch_vcpu_ioctl_run+0x927. I think at runtime the pvops bit should be patched and we should just have a sti instruction? I don't understand the backtrace. It prints out the interrupt handler stack frames but %rip appears to be down in kvm.ko. If there is an interrupt handler running shouldn't %rip be in there? Unfortunately the full oops scrolled off the screen so I only have panic1.jpg and panic2.jpg. We're missing earlier information. Is it possible that there was another panic first and that kvm.ko is just dying in the aftermath? 11436: 44 8b 6b 24 mov 0x24(%rbx),%r13d 1143a: 74 23 je 1145f <kvm_arch_vcpu_ioctl_run+0x8dd> 1143c: 4c 8b 25 00 00 00 00 mov 0x0(%rip),%r12 # 11443 <kvm_arch_vcpu_ioctl_run+0x8c1> 1143f: R_X86_64_PC32 __tracepoint_kvm_entry+0x1c 11443: 4d 85 e4 test %r12,%r12 11446: 74 17 je 1145f <kvm_arch_vcpu_ioctl_run+0x8dd> 11448: 49 8b 7c 24 08 mov 0x8(%r12),%rdi 1144d: 44 89 ee mov %r13d,%esi 11450: 41 ff 14 24 callq *(%r12) 11454: 49 83 c4 10 add $0x10,%r12 11458: 49 83 3c 24 00 cmpq $0x0,(%r12) 1145d: eb e7 jmp 11446 <kvm_arch_vcpu_ioctl_run+0x8c4> 1145f: 48 8b 05 00 00 00 00 mov 0x0(%rip),%rax # 11466 <kvm_arch_vcpu_ioctl_run+0x8e4> 11462: R_X86_64_PC32 kvm_x86_ops+0xfffffffffffffffc 11466: 48 89 df mov %rbx,%rdi 11469: ff 90 48 01 00 00 callq *0x148(%rax) 1146f: 65 48 8b 04 25 00 00 mov %gs:0x0,%rax 11476: 00 00 11474: R_X86_64_32S cpu_dr7 11478: a8 aa test $0xaa,%al 1147a: 74 05 je 11481 <kvm_arch_vcpu_ioctl_run+0x8ff> 1147c: e8 00 00 00 00 callq 11481 <kvm_arch_vcpu_ioctl_run+0x8ff> 1147d: R_X86_64_PC32 hw_breakpoint_restore+0xfffffffffffffffc 11481: 48 8b 05 00 00 00 00 mov 0x0(%rip),%rax # 11488 <kvm_arch_vcpu_ioctl_run+0x906> 11484: R_X86_64_PC32 kvm_x86_ops+0xfffffffffffffffc 11488: 48 8b 54 24 50 mov 0x50(%rsp),%rdx 1148d: be 10 00 00 00 mov $0x10,%esi 11492: 48 89 df mov %rbx,%rdi 11495: ff 90 80 00 00 00 callq *0x80(%rax) 1149b: c7 43 2c 00 00 00 00 movl $0x0,0x2c(%rbx) 114a2: ff 14 25 00 00 00 00 callq *0x0 114a5: R_X86_64_32S pv_irq_ops+0x18 114a9: ff 83 b8 00 00 00 incl 0xb8(%rbx) <--- boom! 114af: 48 8b 44 24 40 mov 0x40(%rsp),%rax 114b4: 83 60 14 ef andl $0xffffffef,0x14(%rax) 114b8: 48 8b 3b mov (%rbx),%rdi 114bb: 48 83 c7 38 add $0x38,%rdi 114bf: e8 00 00 00 00 callq 114c4 <kvm_arch_vcpu_ioctl_run+0x942> 114c0: R_X86_64_PC32 __srcu_read_lock+0xfffffffffffffffc 114c4: 89 43 28 mov %eax,0x28(%rbx) 114c7: 83 3d 00 00 00 00 04 cmpl $0x4,0x0(%rip) # 114ce <kvm_arch_vcpu_ioctl_run+0x94c> 114c9: R_X86_64_PC32 prof_on+0xfffffffffffffffb 114ce: 75 28 jne 114f8 <kvm_arch_vcpu_ioctl_run+0x976> 114d0: be 10 00 00 00 mov $0x10,%esi 114d5: 48 89 df mov %rbx,%rdi 114d8: e8 46 bb ff ff callq d023 <kvm_register_read> 114dd: 83 3d 00 00 00 00 04 cmpl $0x4,0x0(%rip) # 114e4 <kvm_arch_vcpu_ioctl_run+0x962> 114df: R_X86_64_PC32 prof_on+0xfffffffffffffffb 114e4: 75 12 jne 114f8 <kvm_arch_vcpu_ioctl_run+0x976> For comparison here are the disassembled "Code" bytes from the panic message. I think the photo truncates the full output so the last instruction is incomplete, but it matches up with 0x11488 above: 0: 99 cltd 1: b2 e0 mov $0xe0,%dl 3: 48 8b 05 f8 62 02 00 mov 0x262f8(%rip),%rax # 0x26302 a: 48 8b 54 24 50 mov 0x50(%rsp),%rdx f: be 10 00 00 00 mov $0x10,%esi 14: 48 89 df mov %rbx,%rdi 17: ff .byte 0xff 18: 90 nop 19: 80 .byte 0x80 Stefan -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html