On 04/11/2012 05:11 AM, Peijie Yu wrote:
> Hi, all
>   I have run into some problems while using KVM.
>   The test environment is:
>   Summary: Dell R610, 1 x Xeon E5645 2.40GHz, 47.1GB / 48GB 1333MHz DDR3
>   System: Dell PowerEdge R610 (Dell 08GXHX)
>   Processors: 1 (of 2) x Xeon E5645 2.40GHz 5860MHz FSB (HT enabled,
>     6 cores, 24 threads)
>   Memory: 47.1GB / 48GB 1333MHz DDR3 == 12 x 4GB
>   Disk: sda: 299GB (72%) JBOD
>   Disk: sdb (host9): 5.0TB JBOD == 1 x VIRTUAL-DISK
>   Disk: sdc (host11): 5.0TB JBOD == 1 x VIRTUAL-DISK
>   Disk: sdd (host12): 5.0TB JBOD == 1 x VIRTUAL-DISK
>   Disk: sde (host10): 5.0TB JBOD == 1 x VIRTUAL-DISK
>   Disk-Control: mpt2sas0: LSI Logic / Symbios Logic SAS2008
>     PCI-Express Fusion-MPT SAS-2 [Falcon]
>   Disk-Control: host9:
>   Disk-Control: host10:
>   Disk-Control: host11:
>   Disk-Control: host12:
>   Chipset: Intel 82801IB (ICH9)
>   Network: br1 (bridge): 14:fe:b5:dc:2c:6e
>   Network: em1 (bnx2): Broadcom NetXtreme II BCM5709 Gigabit,
>     14:fe:b5:dc:2c:6e, 1000Mb/s <full-duplex>
>   Network: em2 (bnx2): Broadcom NetXtreme II BCM5709 Gigabit,
>     14:fe:b5:dc:2c:70, 1000Mb/s <full-duplex>
>   Network: em3 (bnx2): Broadcom NetXtreme II BCM5709 Gigabit,
>     14:fe:b5:dc:2c:72, 1000Mb/s <full-duplex>
>   Network: em4 (bnx2): Broadcom NetXtreme II BCM5709 Gigabit,
>     14:fe:b5:dc:2c:74, 1000Mb/s <full-duplex>
>   Network: vnet0 (tun): fe:16:3e:49:fb:05, 10Mb/s <full-duplex>
>   Network: vnet1 (tun): fe:16:3e:cb:c0:d1, 10Mb/s <full-duplex>
>   Network: vnet2 (tun): fe:16:3e:1e:c1:c4, 10Mb/s <full-duplex>
>   Network: vnet3 (tun): fe:16:3e:d5:58:f4, 10Mb/s <full-duplex>
>   Network: vnet4 (tun): fe:16:3e:15:b4:16, 10Mb/s <full-duplex>
>   Network: vnet5 (tun): fe:16:3e:d2:07:47, 10Mb/s <full-duplex>
>   Network: vnet6 (tun): fe:16:3e:e1:2b:b9, 10Mb/s <full-duplex>
>   OS: RHEL Server 6.1 (Santiago), Linux
>     2.6.32-220.2.1.el6.x86_64 x86_64, 64-bit
>   BIOS: Dell 3.0.0 01/31/2011
>
> While using KVM, I have hit the following issues:
> 1. Host crash caused by
>    a. Kernel panic
>       KERNEL: /usr/lib/debug/lib/modules/2.6.32-131.12.1.el6.x86_64/vmlinux
>     DUMPFILE: ../vmcore_2012.13.46  [PARTIAL DUMP]
>         CPUS: 24
>         DATE: Wed Jan 11 13:34:13 2012
>       UPTIME: 25 days, 04:11:05
> LOAD AVERAGE: 223.16, 172.97, 158.23
>        TASKS: 1464
>     NODENAME: dell2.localdomain
>      RELEASE: 2.6.32-131.12.1.el6.x86_64
>      VERSION: #1 SMP Sun Jul 31 16:44:56 EDT 2011
>      MACHINE: x86_64  (2394 Mhz)
>       MEMORY: 48 GB
>        PANIC: "kernel BUG at arch/x86/kernel/traps.c:547!"
>          PID: 11851
>      COMMAND: "qemu-kvm"
>         TASK: ffff880c071c3500  [THREAD_INFO: ffff880c132d8000]
>          CPU: 1
>        STATE: TASK_RUNNING (PANIC)
>
> PID: 11851  TASK: ffff880c071c3500  CPU: 1  COMMAND: "qemu-kvm"
>  #0 [ffff880028207be0] machine_kexec at ffffffff810310cb
>  #1 [ffff880028207c40] crash_kexec at ffffffff810b6392
>  #2 [ffff880028207d10] oops_end at ffffffff814de670
>  #3 [ffff880028207d40] die at ffffffff8100f2eb
>  #4 [ffff880028207d70] do_trap at ffffffff814ddf64
>  #5 [ffff880028207dd0] do_invalid_op at ffffffff8100ceb5
>  #6 [ffff880028207e70] invalid_op at ffffffff8100bf5b
>     [exception RIP: do_nmi+554]
>     RIP: ffffffff814de43a  RSP: ffff880028207f28  RFLAGS: 00010002
>     RAX: ffff880c132d9fd8  RBX: ffff880028207f58  RCX: 00000000c0000101
>     RDX: 00000000ffff8800  RSI: ffffffffffffffff  RDI: ffff880028207f58
>     RBP: ffff880028207f48   R8: ffff88005ebf9800   R9: ffff880028203fc0
>     R10: 0000000000000034  R11: 00000000000003e8  R12: 000000000000cc20
>     R13: ffffffff816024a0  R14: ffff88005ebf9800  R15: 00007ffffffff000
>     ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
>  #7 [ffff880028207f50] nmi at ffffffff814ddc90
>     [exception RIP: bad_to_user+37]
>     RIP: ffffffff814e4e2b  RSP: ffff880028207bb0  RFLAGS: 00010046
>     RAX: ffff880c132d9fd8  RBX: ffff880c132d9c48  RCX: 0000000000000001
>     RDX: 0000000000000000  RSI: 000000010000000b  RDI: ffff880028207c08
>     RBP: ffff880028207c48   R8: ffff88005ebf9800   R9: ffff880028203fc0
>     R10: 0000000000000034  R11: 00000000000003e8  R12: 000000000000cc20
>     R13: ffffffff816024a0  R14: ffff88005ebf9800  R15: 00007ffffffff000
>     ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
> --- <NMI exception stack> ---
>
> For this problem, I found that the panic is caused by
> BUG_ON(in_nmi()), which means an NMI happened during another NMI context.
> But I checked the Intel technical manual and found: "While an NMI
> interrupt handler is executing, the processor disables additional
> calls to the NMI handler until the next IRET instruction is executed."
> So, how can this happen?
>

The NMI path for kvm is different: the processor exits from the guest
with NMIs blocked, then executes kvm code until it issues "int $2" in
vmx_complete_interrupts(). If an IRET is executed in this path, then
NMIs will be unblocked and nested NMIs may occur.

One way this can happen is if we access the vmap area and incur a fault
between the VMEXIT and invoking the NMI handler. Or perhaps the NMI
handler itself generates a fault. Or we have a debug exception in that
path.

Is this reproducible?

-- 
error compiling committee.c: too many arguments to function
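
The sequence described in the reply (VM exit with NMIs blocked, an IRET somewhere before the handler runs, then the software "int $2") can be illustrated with a toy state model. This is only a sketch, not kernel or KVM code: the names ToyCpu, NestedNmi and simulate are hypothetical, and the model assumes that the software-invoked handler does not itself set the hardware NMI blocking, so an earlier IRET leaves the CPU open to a second, real NMI.

```python
class NestedNmi(Exception):
    """Stands in for the kernel's BUG_ON(in_nmi()) firing (toy model only)."""


class ToyCpu:
    """Hypothetical model of one CPU's NMI state; not real KVM code."""

    def __init__(self):
        self.nmi_blocked = False   # hardware blocking, cleared by any IRET
        self.in_nmi = False        # software flag: NMI handler is running
        self.pending_nmi = False   # an NMI held back while blocked

    def vmexit_on_nmi(self):
        # The guest was interrupted by an NMI: the CPU returns to the host
        # with NMIs blocked, but no handler has run yet.
        self.nmi_blocked = True

    def iret(self):
        # Any IRET (for example, returning from a fault) unblocks NMIs.
        self.nmi_blocked = False

    def enter_nmi_handler(self):
        # Entering the handler while already inside it is the bug condition.
        if self.in_nmi:
            raise NestedNmi("NMI handler entered while already in NMI context")
        self.in_nmi = True

    def leave_nmi_handler(self):
        # The handler finishes and returns with IRET.
        self.in_nmi = False
        self.iret()

    def hardware_nmi(self):
        # A real NMI arrives: held pending if blocked, otherwise delivered.
        if self.nmi_blocked:
            self.pending_nmi = True
            return
        self.nmi_blocked = True
        self.enter_nmi_handler()


def simulate(fault_before_handler):
    """Return "ok" or "nested NMI" for one VM-exit-to-handler sequence."""
    cpu = ToyCpu()
    cpu.vmexit_on_nmi()
    if fault_before_handler:
        cpu.iret()              # fault taken and returned from before "int $2"
    cpu.enter_nmi_handler()     # host invokes the handler in software
    try:
        cpu.hardware_nmi()      # a second NMI arrives while the handler runs
    except NestedNmi:
        return "nested NMI"
    cpu.leave_nmi_handler()
    return "ok"


if __name__ == "__main__":
    print(simulate(fault_before_handler=False))
    print(simulate(fault_before_handler=True))
```

Without the intervening fault, the second NMI is held pending until the handler's own IRET; with it, the second NMI is delivered while the handler is still running, which is the condition the BUG_ON in the dump reports. Delivery of the held-back NMI after the handler returns is deliberately not modeled.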