On Thu, Feb 10, 2011 at 16:23, Ruben Kerkhof <ruben@xxxxxxxxxxxxxxxx> wrote:
> On Wed, Jan 26, 2011 at 16:00, Ruben Kerkhof <ruben@xxxxxxxxxxxxxxxx> wrote:
>> On Wed, Jan 26, 2011 at 10:52, Avi Kivity <avi@xxxxxxxxxx> wrote:
>>> On 01/25/2011 08:29 PM, Ruben Kerkhof wrote:
>>>>
>>>> > When you say "suddenly", this was with no changes to software and
>>>> > hardware?
>>>>
>>>> The host software and hardware hasn't changed in the two months since
>>>> the machine has been running. 2.6.34.7 kernel and qemu-kvm 0.13.
>>>>
>>>> We host customer vms on it though, so virtual machines come and go.
>>>> Various operating systems, a mixture of Linux, FreeBSD and Windows
>>>> 2008 R2. We have other machines with the same config without these
>>>> problems though.
>>>
>>> Are those other machines running a similar workload?
>>
>> Yes, similar, or they're more heavily loaded.
>>
>> On this machine, about half of the 48GB memory was used for virtual machines.
>>
>>> The traces look awfully like bad hardware, though that can also be explained
>>> by random memory corruption due to a bug.
>>
>> Yeah, that's what I'm expecting. We already replaced the memory, next
>> step is to move the disks over to another server to make sure it's not
>> the board or cpu's.
>>
>>>> This time I have a few different messages though:
>>>>
>>>> 2011-01-25T11:58:50.001208+01:00 phy005 kernel: general protection fault:
>>>> 0000 [#1] SMP
>>>>
>>>> RSI: 0000000000000000 RDI: 1603a07305001568
>>>>
>>>> 2011-01-25T11:58:50.001486+01:00 phy005 kernel: Code: ff ff 41 8b 46
>>>> 08 41 29 06 4c 89 e7 57 9d 0f 1f 44 00 00 48 83 c4 18 5b 41 5c 41 5d
>>>> 41 5e 41 5f c9 c3 55 48 89 e5 0f 1f 44 00 00<f0> ff 4f 08 0f 94 c0 84
>>>> c0 74 10 85 f6 75 07 e8 63 fe ff ff eb
>>>
>>> lock decl 0x8(%rdi)
>>>
>>> %rdi is completely crap, looks like corruption again. Strangely, it is
>>> similar to the bad spte from the previous trace: 0x1603a0730500d277. The
>>> upper 48 bits are identical, the lower 16 bits are different.
>>>>
>>>> 2011-01-25T12:06:32.673937+01:00 phy005 kernel: qemu-kvm: Corrupted
>>>> page table at address 7f37b37ff000
>>>> 2011-01-25T12:06:32.673959+01:00 phy005 kernel: PGD c201d1067 PUD
>>>> 94e538067 PMD 61e5bf067 PTE 1603a0730500e067
>>>
>>> Here are those magic 48 bits again, in the PTE entry.
>>>>
>>>> 2011-01-25T12:38:49.416943+01:00 phy005 kernel: EPT: Misconfiguration.
>>>> 2011-01-25T12:38:49.417518+01:00 phy005 kernel: EPT: GPA: 0x2abff038
>>>> 2011-01-25T12:38:49.417526+01:00 phy005 kernel:
>>>> ept_misconfig_inspect_spte: spte 0x5f49e9007 level 4
>>>> 2011-01-25T12:38:49.417532+01:00 phy005 kernel:
>>>> ept_misconfig_inspect_spte: spte 0x5db595007 level 3
>>>> 2011-01-25T12:38:49.417553+01:00 phy005 kernel:
>>>> ept_misconfig_inspect_spte: spte 0x5d5da7007 level 2
>>>> 2011-01-25T12:38:49.417558+01:00 phy005 kernel:
>>>> ept_misconfig_inspect_spte: spte 0x1603a07305006277 level 1
>>>
>>> Again.
>>>
>>>> 2011-01-25T13:16:58.192440+01:00 phy005 kernel: BUG: Bad page map in
>>>> process qemu-kvm pte:1603a0730500d067 pmd:61059f067
>>>
>>> Again.
>>>
>>> However, these all came from a single boot, yes?
>>
>> Correct.
>>
>>> If so they can be the same
>>> corruption. Please collect more traces, with reboots in between.
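The observation that the bad values share their upper 48 bits can be checked mechanically. This is a minimal sketch in Python that uses only values copied from the traces quoted above. The helper names are made up for illustration.

```python
# Values copied from the traces quoted above (spte, RDI and PTE fields).
SUSPECT_VALUES = [
    0x1603A0730500D277,  # bad spte from the previous trace
    0x1603A07305001568,  # RDI in the general protection fault
    0x1603A0730500E067,  # PTE in "Corrupted page table"
    0x1603A07305006277,  # level 1 spte in "EPT: Misconfiguration"
    0x1603A0730500D067,  # pte in "Bad page map"
]


def split_48_16(value):
    """Return (upper 48 bits, lower 16 bits) of a 64-bit value."""
    return value >> 16, value & 0xFFFF


def distinct_prefixes(values):
    """Return the set of distinct upper-48-bit prefixes among the values."""
    return {split_48_16(v)[0] for v in values}


if __name__ == "__main__":
    for v in SUSPECT_VALUES:
        upper, lower = split_48_16(v)
        print(f"{v:#018x}: upper48={upper:#014x} lower16={lower:#06x}")
    print("distinct prefixes:", sorted(hex(p) for p in distinct_prefixes(SUSPECT_VALUES)))
```

A single distinct prefix across unrelated structures (a register, a host PTE, an EPT entry) is what makes the pattern look like one recurring corrupt write rather than independent failures.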
> > This machine has been running for a week without problems, but then we > started to get the following oopses again: > > 2011-02-06T19:45:35.221555+01:00 phy005 kernel: BUG: unable to handle > kernel paging request at ffffea71929180e0 > 2011-02-06T19:45:35.222194+01:00 phy005 kernel: IP: > [<ffffffff81034880>] gup_pte_range+0x94/0xd3 > 2011-02-06T19:45:35.222199+01:00 phy005 kernel: PGD 118600067 PUD 0 > 2011-02-06T19:45:35.222203+01:00 phy005 kernel: Oops: 0000 [#1] SMP > 2011-02-06T19:45:35.222221+01:00 phy005 kernel: last sysfs file: > /sys/devices/system/cpu/cpu15/topology/thread_siblings > 2011-02-06T19:45:35.222224+01:00 phy005 kernel: CPU 4 > 2011-02-06T19:45:35.222229+01:00 phy005 kernel: Modules linked in: tun > ipmi_devintf ipmi_si ipmi_msghandler bridge 8021q garp stp llc bonding > xt_comment xt_recent ip6t_REJECT nf_conntrack_ipv6 ip6table_filter > ip6_tables ipv6 kvm_intel kvm i2c_i801 i2c_core iTCO_wdt serio_raw igb > iTCO_vendor_support joydev ioatdma dca 3w_9xxx [last unloaded: > scsi_wait_scan] > 2011-02-06T19:45:35.222231+01:00 phy005 kernel: > 2011-02-06T19:45:35.222233+01:00 phy005 kernel: Pid: 3650, comm: > qemu-kvm Not tainted 2.6.34.7-66.tilaa.fc13.x86_64 #1 X8DTU/X8DTU > 2011-02-06T19:45:35.222236+01:00 phy005 kernel: RIP: > 0010:[<ffffffff81034880>] Â[<ffffffff81034880>] > gup_pte_range+0x94/0xd3 > 2011-02-06T19:45:35.222239+01:00 phy005 kernel: RSP: > 0018:ffff88060b9bda78 ÂEFLAGS: 00010082 > 2011-02-06T19:45:35.222241+01:00 phy005 kernel: RAX: ffffea71929180e0 > RBX: 00003ffffffff000 RCX: 0000000000000005 > 2011-02-06T19:45:35.222243+01:00 phy005 kernel: RDX: 00007fe54e400000 > RSI: 00007fe54e3ff000 RDI: 1603a07305004067 > 2011-02-06T19:45:35.222245+01:00 phy005 kernel: RBP: ffff88060b9bda98 > R08: ffff880b94384560 R09: ffff88060b9bdb44 > 2011-02-06T19:45:35.222248+01:00 phy005 kernel: R10: ffff880606b2fff8 > R11: ffffea0000000000 R12: 0000000000000205 > 2011-02-06T19:45:35.222251+01:00 phy005 kernel: R13: ffffc00000000fff > R14: 
0000000000000005 R15: 0000000000000000 > 2011-02-06T19:45:35.222255+01:00 phy005 kernel: FS: > 00007fe64cb0e700(0000) GS:ffff880655400000(0000) > knlGS:0000000000000000 > 2011-02-06T19:45:35.222259+01:00 phy005 kernel: CS: Â0010 DS: 002b ES: > 002b CR0: 0000000080050033 > 2011-02-06T19:45:35.222263+01:00 phy005 kernel: CR2: ffffea71929180e0 > CR3: 0000000bff06d000 CR4: 00000000000026e0 > 2011-02-06T19:45:35.222267+01:00 phy005 kernel: DR0: 0000000000000000 > DR1: 0000000000000000 DR2: 0000000000000000 > 2011-02-06T19:45:35.222271+01:00 phy005 kernel: DR3: 0000000000000000 > DR6: 00000000ffff0ff0 DR7: 0000000000000400 > 2011-02-06T19:45:35.222274+01:00 phy005 kernel: Process qemu-kvm (pid: > 3650, threadinfo ffff88060b9bc000, task ffff880623ed2ee0) > 2011-02-06T19:45:35.222278+01:00 phy005 kernel: Stack: > 2011-02-06T19:45:35.222281+01:00 phy005 kernel: 00007fe54e400000 > 00007fe54e400000 00007fe54e400000 ffff88053a0d2388 > 2011-02-06T19:45:35.222285+01:00 phy005 kernel: <0> ffff88060b9bdaf8 > ffffffff81034a15 00007fe54e3fffff 00007fe54e3fffff > 2011-02-06T19:45:35.222289+01:00 phy005 kernel: <0> ffff88060b9bdb44 > ffff880b94384560 ffff880bff06eca8 ffff880bff06d7f8 > 2011-02-06T19:45:35.222292+01:00 phy005 kernel: Call Trace: > 2011-02-06T19:45:35.222296+01:00 phy005 kernel: [<ffffffff81034a15>] > gup_pud_range+0x156/0x192 > 2011-02-06T19:45:35.222300+01:00 phy005 kernel: [<ffffffff81034b15>] > get_user_pages_fast+0xc4/0x172 > 2011-02-06T19:45:35.222304+01:00 phy005 kernel: [<ffffffff81131fbc>] ? > bio_add_page+0x36/0x38 > 2011-02-06T19:45:35.222308+01:00 phy005 kernel: [<ffffffff81134730>] > dio_get_page+0x54/0x127 > 2011-02-06T19:45:35.222312+01:00 phy005 kernel: [<ffffffff81135317>] > __blockdev_direct_IO+0x41d/0xa36 > 2011-02-06T19:45:35.222316+01:00 phy005 kernel: [<ffffffffa0080f69>] ? 
> x86_emulate_insn+0x1ff8/0x2d61 [kvm] > 2011-02-06T19:45:35.222320+01:00 phy005 kernel: [<ffffffff8113379b>] > blkdev_direct_IO+0x4e/0x50 > 2011-02-06T19:45:35.222324+01:00 phy005 kernel: [<ffffffff81132c49>] ? > blkdev_get_blocks+0x0/0x8d > 2011-02-06T19:45:35.222328+01:00 phy005 kernel: [<ffffffff810cb516>] > generic_file_direct_write+0xed/0x16d > 2011-02-06T19:45:35.222331+01:00 phy005 kernel: [<ffffffff810cb72c>] > __generic_file_aio_write+0x196/0x281 > 2011-02-06T19:45:35.222335+01:00 phy005 kernel: [<ffffffff811d5352>] ? > file_has_perm+0xa4/0xc6 > 2011-02-06T19:45:35.222339+01:00 phy005 kernel: [<ffffffff81133043>] ? > blkdev_aio_write+0x0/0x69 > 2011-02-06T19:45:35.222343+01:00 phy005 kernel: [<ffffffff8113306d>] > blkdev_aio_write+0x2a/0x69 > 2011-02-06T19:45:35.222347+01:00 phy005 kernel: [<ffffffff81133043>] ? > blkdev_aio_write+0x0/0x69 > 2011-02-06T19:45:35.222351+01:00 phy005 kernel: [<ffffffff8113d4eb>] > aio_rw_vect_retry+0x85/0x18e > 2011-02-06T19:45:35.222355+01:00 phy005 kernel: [<ffffffff8113e9b3>] > aio_run_iocb+0x77/0x10f > 2011-02-06T19:45:35.222359+01:00 phy005 kernel: [<ffffffff8113f508>] > do_io_submit+0x558/0x7ce > 2011-02-06T19:45:35.222363+01:00 phy005 kernel: [<ffffffff8113f78e>] > sys_io_submit+0x10/0x12 > 2011-02-06T19:45:35.222366+01:00 phy005 kernel: [<ffffffff81009c72>] > system_call_fastpath+0x16/0x1b > 2011-02-06T19:45:35.222372+01:00 phy005 kernel: Code: 21 d8 49 01 c2 > 49 8b 3a 49 89 fe 4d 21 ee 4d 21 e6 49 39 ce 75 49 48 89 f8 0f 1f 40 > 00 48 21 d8 48 c1 e8 0c 48 6b c0 38 4c 01 d8 <66> 83 38 00 48 89 c7 79 > 04 48 8b 78 10 f0 ff 47 08 49 63 39 48 > 2011-02-06T19:45:35.222376+01:00 phy005 kernel: RIP > [<ffffffff81034880>] gup_pte_range+0x94/0xd3 > 2011-02-06T19:45:35.222379+01:00 phy005 kernel: RSP <ffff88060b9bda78> > 2011-02-06T19:45:35.222382+01:00 phy005 kernel: CR2: ffffea71929180e0 > 2011-02-06T19:45:35.222386+01:00 phy005 kernel: ---[ end trace > beed2b54d0bb8a00 ]--- > > and > > 2011-02-06T19:47:15.023129+01:00 
phy005 kernel: qemu-kvm: Corrupted > page table at address 7fbde15ff64c > 2011-02-06T19:47:15.023207+01:00 phy005 kernel: PGD 5ff58a067 PUD > 612668067 PMD 5937b7067 PTE 1603a07305008067 > 2011-02-06T19:47:15.023214+01:00 phy005 kernel: Bad pagetable: 000d [#2] SMP > 2011-02-06T19:47:15.023219+01:00 phy005 kernel: last sysfs file: > /sys/devices/pci0000:00/0000:00:09.0/0000:05:00.0/host0/scsi_host/host0/stats > 2011-02-06T19:47:15.023226+01:00 phy005 kernel: CPU 13 > 2011-02-06T19:47:15.023232+01:00 phy005 kernel: Modules linked in: tun > ipmi_devintf ipmi_si ipmi_msghandler bridge 8021q garp stp llc bonding > xt_comment xt_recent ip6t_REJECT nf_conntrack_ipv6 ip6table_filter > ip6_tables ipv6 kvm_intel kvm i2c_i801 i2c_core iTCO_wdt serio_raw igb > iTCO_vendor_support joydev ioatdma dca 3w_9xxx [last unloaded: > scsi_wait_scan] > 2011-02-06T19:47:15.023236+01:00 phy005 kernel: > 2011-02-06T19:47:15.023239+01:00 phy005 kernel: Pid: 3387, comm: > qemu-kvm Tainted: G Â Â ÂD Â Â2.6.34.7-66.tilaa.fc13.x86_64 #1 > X8DTU/X8DTU > 2011-02-06T19:47:15.023244+01:00 phy005 kernel: RIP: > 0033:[<00000000004abb73>] Â[<00000000004abb73>] 0x4abb73 > 2011-02-06T19:47:15.023247+01:00 phy005 kernel: RSP: > 002b:00007fbdf3c00680 ÂEFLAGS: 00010206 > 2011-02-06T19:47:15.023251+01:00 phy005 kernel: RAX: 00007fbde15ff000 > RBX: 000000000000064c RCX: 0000000001abe968 > 2011-02-06T19:47:15.023254+01:00 phy005 kernel: RDX: 0000000001abe850 > RSI: 0000000000000000 RDI: 000000003d600000 > 2011-02-06T19:47:15.023257+01:00 phy005 kernel: RBP: 0000000001f2ab00 > R08: 0000000000000003 R09: 0000000002000000 > 2011-02-06T19:47:15.023260+01:00 phy005 kernel: R10: 000000000000c050 > R11: 00007fbdec000818 R12: 0000000000000025 > 2011-02-06T19:47:15.023269+01:00 phy005 kernel: R13: 0000000000000003 > R14: 000000003d600640 R15: 0000000000000000 > 2011-02-06T19:47:15.023273+01:00 phy005 kernel: FS: > 00007fbdf3c01700(0000) GS:ffff8806554a0000(0000) > knlGS:0000000000000000 > 
2011-02-06T19:47:15.023276+01:00 phy005 kernel: CS: Â0010 DS: 002b ES: > 002b CR0: 0000000080050033 > 2011-02-06T19:47:15.023280+01:00 phy005 kernel: CR2: 00007fbde15ff64c > CR3: 0000000606858000 CR4: 00000000000026e0 > 2011-02-06T19:47:15.023283+01:00 phy005 kernel: DR0: 0000000000000000 > DR1: 0000000000000000 DR2: 0000000000000000 > 2011-02-06T19:47:15.023286+01:00 phy005 kernel: DR3: 0000000000000000 > DR6: 00000000ffff0ff0 DR7: 0000000000000400 > 2011-02-06T19:47:15.023290+01:00 phy005 kernel: Process qemu-kvm (pid: > 3387, threadinfo ffff88060689e000, task ffff8805ff5a9770) > 2011-02-06T19:47:15.023294+01:00 phy005 kernel: > 2011-02-06T19:47:15.023296+01:00 phy005 kernel: RIP > [<00000000004abb73>] 0x4abb73 > 2011-02-06T19:47:15.023298+01:00 phy005 kernel: RSP <00007fbdf3c00680> > 2011-02-06T19:47:15.023300+01:00 phy005 kernel: ---[ end trace > beed2b54d0bb8a01 ]--- > > followed by > > 2011-02-06T21:20:32.882972+01:00 phy005 kernel: BUG: unable to handle > kernel paging request at fffff6b192918010 > 2011-02-06T21:20:32.883252+01:00 phy005 kernel: IP: > [<ffffffffa0078826>] kvm_mmu_zap_page+0x28a/0x299 [kvm] > 2011-02-06T21:20:32.883259+01:00 phy005 kernel: PGD 0 > 2011-02-06T21:20:32.883263+01:00 phy005 kernel: Oops: 0000 [#5] SMP > 2011-02-06T21:20:32.883267+01:00 phy005 kernel: last sysfs file: > /sys/devices/system/cpu/cpu15/topology/thread_siblings > 2011-02-06T21:20:32.883271+01:00 phy005 kernel: CPU 8 > 2011-02-06T21:20:32.883278+01:00 phy005 kernel: Modules linked in: tun > ipmi_devintf ipmi_si ipmi_msghandler bridge 8021q > Âgarp stp llc bonding xt_comment xt_recent ip6t_REJECT > nf_conntrack_ipv6 ip6table_filter ip6_tables ipv6 kvm_intel kvm i > 2c_i801 i2c_core iTCO_wdt serio_raw igb iTCO_vendor_support joydev > ioatdma dca 3w_9xxx [last unloaded: scsi_wait_scan] > 2011-02-06T21:20:32.883286+01:00 phy005 kernel: > 2011-02-06T21:20:32.883290+01:00 phy005 kernel: Pid: 13247, comm: > qemu-kvm Tainted: G Â Â ÂD Â Â2.6.34.7-66.tilaa.fc13.x > 86_64 #1 
X8DTU/X8DTU > 2011-02-06T21:20:32.883295+01:00 phy005 kernel: RIP: > 0010:[<ffffffffa0078826>] Â[<ffffffffa0078826>] > kvm_mmu_zap_page+0x28a/0x299 [kvm] > 2011-02-06T21:20:32.883300+01:00 phy005 kernel: RSP: > 0018:ffff880312bdfb58 ÂEFLAGS: 00010206 > 2011-02-06T21:20:32.883303+01:00 phy005 kernel: RAX: 00000cb192918000 > RBX: ffff8802d16ae210 RCX: 0000000000000000 > 2011-02-06T21:20:32.883307+01:00 phy005 kernel: RDX: ffffea0000000000 > RSI: ffff88060bb07ff8 RDI: 0000000000000200 > 2011-02-06T21:20:32.883311+01:00 phy005 kernel: RBP: ffff880312bdfb88 > R08: dead000000100100 R09: 0000000000000004 > 2011-02-06T21:20:32.883315+01:00 phy005 kernel: R10: 0000000000000000 > R11: 0000000000000010 R12: ffff880853ae0000 > 2011-02-06T21:20:32.883319+01:00 phy005 kernel: R13: ffff88060bb07ff8 > R14: 00000000000001ff R15: 0000000000000000 > 2011-02-06T21:20:32.883323+01:00 phy005 kernel: FS: > 0000000000000000(0000) GS:ffff880002080000(0000) > knlGS:0000000000000000 > 2011-02-06T21:20:32.883327+01:00 phy005 kernel: CS: Â0010 DS: 002b ES: > 002b CR0: 000000008005003b > 2011-02-06T21:20:32.883331+01:00 phy005 kernel: CR2: fffff6b192918010 > CR3: 0000000001a42000 CR4: 00000000000026e0 > 2011-02-06T21:20:32.883335+01:00 phy005 kernel: DR0: 0000000000000000 > DR1: 0000000000000000 DR2: 0000000000000000 > 2011-02-06T21:20:32.883338+01:00 phy005 kernel: DR3: 0000000000000000 > DR6: 00000000ffff0ff0 DR7: 0000000000000400 > 2011-02-06T21:20:32.883343+01:00 phy005 kernel: Process qemu-kvm (pid: > 13247, threadinfo ffff880312bde000, task ffff880268ad8000) > 2011-02-06T21:20:32.883347+01:00 phy005 kernel: Stack: > 2011-02-06T21:20:32.883351+01:00 phy005 kernel: 0000000000000002 > ffff880853ae0000 ffff8802d16ae160 ffff880853ae2328 > 2011-02-06T21:20:32.883355+01:00 phy005 kernel: <0> ffff880c22d426e8 > ffff880268ad8000 ffff880312bdfbb8 ffffffffa0078a42 > 2011-02-06T21:20:32.883358+01:00 phy005 kernel: <0> ffffea00134a16c8 > ffff880853ae0000 ffff880853ae0000 0000000000000001 > 
2011-02-06T21:20:32.883362+01:00 phy005 kernel: Call Trace: > 2011-02-06T21:20:32.883366+01:00 phy005 kernel: [<ffffffffa0078a42>] > kvm_mmu_zap_all+0x35/0x60 [kvm] > 2011-02-06T21:20:32.883371+01:00 phy005 kernel: [<ffffffffa006dcde>] > kvm_arch_flush_shadow+0x16/0x22 [kvm] > 2011-02-06T21:20:32.883375+01:00 phy005 kernel: [<ffffffffa0063b0a>] > kvm_mmu_notifier_release+0x31/0x44 [kvm] > 2011-02-06T21:20:32.883379+01:00 phy005 kernel: [<ffffffff810fac37>] > __mmu_notifier_release+0x4f/0x7b > 2011-02-06T21:20:32.883383+01:00 phy005 kernel: [<ffffffff810e735d>] > exit_mmap+0x2c/0x132 > 2011-02-06T21:20:32.883386+01:00 phy005 kernel: [<ffffffff8104ad7a>] > mmput+0x5e/0xca > 2011-02-06T21:20:32.883390+01:00 phy005 kernel: [<ffffffff8104f0d5>] > exit_mm+0x114/0x121 > 2011-02-06T21:20:32.883394+01:00 phy005 kernel: [<ffffffff81050bf5>] > do_exit+0x254/0x752 > 2011-02-06T21:20:32.883398+01:00 phy005 kernel: [<ffffffff81051174>] > do_group_exit+0x81/0xab > 2011-02-06T21:20:32.883403+01:00 phy005 kernel: [<ffffffff8105e5cd>] > get_signal_to_deliver+0x3a6/0x3c8 > 2011-02-06T21:20:32.883406+01:00 phy005 kernel: [<ffffffff81009038>] > do_signal+0x72/0x6b8 > 2011-02-06T21:20:32.883410+01:00 phy005 kernel: [<ffffffff8111aa2f>] ? > vfs_ioctl+0x32/0xa6 > 2011-02-06T21:20:32.883413+01:00 phy005 kernel: [<ffffffff8111afa2>] ? 
> do_vfs_ioctl+0x483/0x4c9 > 2011-02-06T21:20:32.883416+01:00 phy005 kernel: [<ffffffff810096a6>] > do_notify_resume+0x28/0x86 > 2011-02-06T21:20:32.883420+01:00 phy005 kernel: [<ffffffff81009f3e>] > int_signal+0x12/0x17 > 2011-02-06T21:20:32.883426+01:00 phy005 kernel: Code: 41 5e 44 89 f8 > 41 5f c9 c3 48 ba 00 f0 ff ff ff ff 0f 00 4c 89 ee 48 21 d0 48 ba 00 > 00 00 00 00 ea ff ff 48 c1 e8 0c 48 6b c0 38 <48> 8b 7c 10 10 e8 a3 f3 > ff ff e9 06 fe ff ff 55 48 89 e5 41 57 > 2011-02-06T21:20:32.883431+01:00 phy005 kernel: RIP > [<ffffffffa0078826>] kvm_mmu_zap_page+0x28a/0x299 [kvm] > 2011-02-06T21:20:32.883434+01:00 phy005 kernel: RSP <ffff880312bdfb58> > 2011-02-06T21:20:32.883437+01:00 phy005 kernel: CR2: fffff6b192918010 > 2011-02-06T21:20:32.883441+01:00 phy005 kernel: ---[ end trace > beed2b54d0bb8a04 ]--- > 2011-02-06T21:20:32.883444+01:00 phy005 kernel: Fixing recursive fault > but reboot is needed! > > after which we rebooted the machine and replaced the motherboard and > cpus (we already replaced the memory before). 
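The faulting addresses in the gup_pte_range and kvm_mmu_zap_page oopses above appear to follow directly from the same bit pattern. The sketch below is a hand reading of the "Code:" bytes and register dumps, not something taken from the kernel source. It assumes the instructions before the fault mask the entry, shift it right by 12, multiply by 0x38 and add the base held in R11 (gup_pte_range) or RDX (kvm_mmu_zap_page). The function name is hypothetical.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF
BASE = 0xFFFFEA0000000000  # R11 in gup_pte_range, RDX in kvm_mmu_zap_page
STRIDE = 0x38              # multiplier read from the Code: bytes (assumed imul $0x38)


def derived_address(entry, mask, offset=0):
    """Mask the entry, drop the low 12 bits, scale by STRIDE and add BASE."""
    index = (entry & mask) >> 12
    return (BASE + index * STRIDE + offset) & MASK64


# gup_pte_range, 2011-02-06T19:45:35: RDI holds the entry, RBX holds the mask.
gup_cr2 = derived_address(0x1603A07305004067, 0x00003FFFFFFFF000)

# kvm_mmu_zap_page, 2011-02-06T21:20:32: the entry itself is not in the dump,
# so this uses the shared upper 48 bits with the low 16 bits zeroed
# (an assumption), the mask loaded in the Code: bytes, and the 0x10
# displacement of the faulting load.
zap_cr2 = derived_address(0x1603A07305000000, 0x000FFFFFFFFFF000, offset=0x10)

if __name__ == "__main__":
    print(hex(gup_cr2))  # CR2 logged in the oops: ffffea71929180e0
    print(hex(zap_cr2))  # CR2 logged in the oops: fffff6b192918010
```

Both results match the CR2 values logged in the respective oopses, which suggests the kernel was following an entry that carried the corrupt pattern rather than computing a bad address on its own.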
> > But 2 days ago we got this oops: > > 2011-02-08T15:56:19.902104+01:00 phy005 kernel: BUG: unable to handle > kernel paging request at ffffea71929181c0 > 2011-02-08T15:56:19.902686+01:00 phy005 kernel: IP: > [<ffffffff81034880>] gup_pte_range+0x94/0xd3 > 2011-02-08T15:56:19.902693+01:00 phy005 kernel: PGD 118600067 PUD 0 > 2011-02-08T15:56:19.902699+01:00 phy005 kernel: Oops: 0000 [#1] SMP > 2011-02-08T15:56:19.902703+01:00 phy005 kernel: last sysfs file: > /sys/devices/system/cpu/cpu15/cache/index2/shared_cpu_m > ap > 2011-02-08T15:56:19.902708+01:00 phy005 kernel: CPU 8 > 2011-02-08T15:56:19.902715+01:00 phy005 kernel: Modules linked in: tun > ipmi_devintf ipmi_si ipmi_msghandler bridge 8021q > Âgarp stp llc bonding xt_comment xt_recent ip6t_REJECT > nf_conntrack_ipv6 ip6table_filter ip6_tables ipv6 kvm_intel kvm i > gb i2c_i801 iTCO_wdt ioatdma i2c_core iTCO_vendor_support dca > serio_raw joydev 3w_9xxx [last unloaded: scsi_wait_scan] > 2011-02-08T15:56:19.902770+01:00 phy005 kernel: > 2011-02-08T15:56:19.902775+01:00 phy005 kernel: Pid: 3346, comm: > qemu-kvm Not tainted 2.6.34.7-66.tilaa.fc13.x86_64 #1 X > 8DTU/X8DTU > 2011-02-08T15:56:19.902781+01:00 phy005 kernel: RIP: > 0010:[<ffffffff81034880>] Â[<ffffffff81034880>] gup_pte_range+0x94/ > 0xd3 > 2011-02-08T15:56:19.902785+01:00 phy005 kernel: RSP: > 0018:ffff880c21bc1a78 ÂEFLAGS: 00010086 > 2011-02-08T15:56:19.902789+01:00 phy005 kernel: RAX: ffffea71929181c0 > RBX: 00003ffffffff000 RCX: 0000000000000005 > 2011-02-08T15:56:19.902793+01:00 phy005 kernel: RDX: 00007fa2ca200000 > RSI: 00007fa2ca1ff000 RDI: 1603a07305008067 > 2011-02-08T15:56:19.902797+01:00 phy005 kernel: RBP: ffff880c21bc1a98 > R08: ffff88060fdfad60 R09: ffff880c21bc1b44 > 2011-02-08T15:56:19.902801+01:00 phy005 kernel: R10: ffff88061493fff8 > R11: ffffea0000000000 R12: 0000000000000205 > 2011-02-08T15:56:19.902805+01:00 phy005 kernel: R13: ffffc00000000fff > R14: 0000000000000005 R15: 0000000000000000 > 2011-02-08T15:56:19.902810+01:00 
phy005 kernel: FS: > 00007fa2d8724700(0000) GS:ffff880002080000(0000) knlGS:000000000000 > 0000 > 2011-02-08T15:56:19.902820+01:00 phy005 kernel: CS: Â0010 DS: 002b ES: > 002b CR0: 0000000080050033 > 2011-02-08T15:56:19.902825+01:00 phy005 kernel: CR2: ffffea71929181c0 > CR3: 0000000c231f9000 CR4: 00000000000026e0 > 2011-02-08T15:56:19.902829+01:00 phy005 kernel: DR0: 0000000000000000 > DR1: 0000000000000000 DR2: 0000000000000000 > 2011-02-08T15:56:19.902833+01:00 phy005 kernel: DR3: 0000000000000000 > DR6: 00000000ffff0ff0 DR7: 0000000000000400 > 2011-02-08T15:56:19.902837+01:00 phy005 kernel: Process qemu-kvm (pid: > 3346, threadinfo ffff880c21bc0000, task ffff880c2 > 264ddc0) > 2011-02-08T15:56:19.902841+01:00 phy005 kernel: Stack: > 2011-02-08T15:56:19.902844+01:00 phy005 kernel: 00007fa2ca200000 > 00007fa2ca201000 00007fa2ca201000 ffff880c22c3d280 > 2011-02-08T15:56:19.902848+01:00 phy005 kernel: <0> ffff880c21bc1af8 > ffffffff81034a15 00007fa2ca200fff 00007fa2ca200fff > 2011-02-08T15:56:19.902852+01:00 phy005 kernel: <0> ffff880c21bc1b44 > ffff88060fdfad60 ffff880c2231a458 ffff880c231f97f8 > 2011-02-08T15:56:19.902855+01:00 phy005 kernel: Call Trace: > 2011-02-08T15:56:19.902859+01:00 phy005 kernel: [<ffffffff81034a15>] > gup_pud_range+0x156/0x192 > 2011-02-08T15:56:19.902863+01:00 phy005 kernel: [<ffffffff81034b15>] > get_user_pages_fast+0xc4/0x172 > 2011-02-08T15:56:19.902867+01:00 phy005 kernel: [<ffffffff81131fbc>] ? > bio_add_page+0x36/0x38 > 2011-02-08T15:56:19.902871+01:00 phy005 kernel: [<ffffffff81134730>] > dio_get_page+0x54/0x127 > 2011-02-08T15:56:19.902875+01:00 phy005 kernel: [<ffffffff81135317>] > __blockdev_direct_IO+0x41d/0xa36 > 2011-02-08T15:56:19.902880+01:00 phy005 kernel: [<ffffffffa008bf69>] ? > x86_emulate_insn+0x1ff8/0x2d61 [kvm] > 2011-02-08T15:56:19.902884+01:00 phy005 kernel: [<ffffffff8113379b>] > blkdev_direct_IO+0x4e/0x50 > 2011-02-08T15:56:19.902888+01:00 phy005 kernel: [<ffffffff81132c49>] ? 
> blkdev_get_blocks+0x0/0x8d > 2011-02-08T15:56:19.902892+01:00 phy005 kernel: [<ffffffff810cb516>] > generic_file_direct_write+0xed/0x16d > 2011-02-08T15:56:19.902896+01:00 phy005 kernel: [<ffffffff810cb72c>] > __generic_file_aio_write+0x196/0x281 > 2011-02-08T15:56:19.902899+01:00 phy005 kernel: [<ffffffff81133043>] ? > blkdev_aio_write+0x0/0x69 > 2011-02-08T15:56:19.902909+01:00 phy005 kernel: [<ffffffff81133043>] ? > blkdev_aio_write+0x0/0x69 > 2011-02-08T15:56:19.902914+01:00 phy005 kernel: [<ffffffff8113d4eb>] > aio_rw_vect_retry+0x85/0x18e > 2011-02-08T15:56:19.902919+01:00 phy005 kernel: [<ffffffff8113e9b3>] > aio_run_iocb+0x77/0x10f > 2011-02-08T15:56:19.902923+01:00 phy005 kernel: [<ffffffff8113f508>] > do_io_submit+0x558/0x7ce > 2011-02-08T15:56:19.902927+01:00 phy005 kernel: [<ffffffff8113f78e>] > sys_io_submit+0x10/0x12 > 2011-02-08T15:56:19.902932+01:00 phy005 kernel: [<ffffffff81009c72>] > system_call_fastpath+0x16/0x1b > 2011-02-08T15:56:19.902938+01:00 phy005 kernel: Code: 21 d8 49 01 c2 > 49 8b 3a 49 89 fe 4d 21 ee 4d 21 e6 49 39 ce 75 49 48 89 f8 0f 1f 40 > 00 48 21 d8 48 c1 e8 0c 48 6b c0 38 4c 01 d8 <66> 83 38 00 48 89 c7 79 > 04 48 8b 78 10 f0 ff 47 08 49 63 39 48 > 2011-02-08T15:56:19.903077+01:00 phy005 kernel: RIP > [<ffffffff81034880>] gup_pte_range+0x94/0xd3 > 2011-02-08T15:56:19.903081+01:00 phy005 kernel: RSP <ffff880c21bc1a78> > 2011-02-08T15:56:19.903084+01:00 phy005 kernel: CR2: ffffea71929181c0 > 2011-02-08T15:56:19.903088+01:00 phy005 kernel: ---[ end trace > 174c28940e9fd0a7 ]--- > > and yesterday this one: > > 2011-02-09T07:40:15.636528+01:00 phy005 kernel: BUG: unable to handle > kernel NULL pointer dereference at (null) > 2011-02-09T07:40:15.636635+01:00 phy005 kernel: IP: > [<ffffffffa0082db8>] gfn_to_rmap+0x20/0x6e [kvm] > 2011-02-09T07:40:15.636639+01:00 phy005 kernel: PGD 0 > 2011-02-09T07:40:15.636643+01:00 phy005 kernel: Oops: 0000 [#3] SMP > 2011-02-09T07:40:15.636647+01:00 phy005 kernel: last sysfs file: > 
/sys/devices/system/cpu/cpu15/topology/thread_siblings > 2011-02-09T07:40:15.636650+01:00 phy005 kernel: CPU 2 > 2011-02-09T07:40:15.636656+01:00 phy005 kernel: Modules linked in: tun > ipmi_devintf ipmi_si ipmi_msghandler bridge 8021q garp stp llc bonding > xt_comment xt_recent ip6t_REJECT nf_conntrack_ipv6 ip6table_filter > ip6_tables ipv6 kvm_intel kvm igb i2c_i801 iTCO_wdt ioatdma i2c_core > iTCO_vendor_support dca serio_raw joydev 3w_9xxx [last unloaded: > scsi_wait_scan] > 2011-02-09T07:40:15.636663+01:00 phy005 kernel: > 2011-02-09T07:40:15.636666+01:00 phy005 kernel: Pid: 2572, comm: > qemu-kvm Tainted: G Â Â ÂD Â Â2.6.34.7-66.tilaa.fc13.x86_64 #1 > X8DTU/X8DTU > 2011-02-09T07:40:15.636670+01:00 phy005 kernel: RIP: > 0010:[<ffffffffa0082db8>] Â[<ffffffffa0082db8>] gfn_to_rmap+0x20/0x6e > [kvm] > 2011-02-09T07:40:15.636673+01:00 phy005 kernel: RSP: > 0018:ffff88061cbcbcd8 ÂEFLAGS: 00010246 > 2011-02-09T07:40:15.636677+01:00 phy005 kernel: RAX: 0000000000000000 > RBX: 1603a07305004fff RCX: ffff88061cbcbd08 > 2011-02-09T07:40:15.636680+01:00 phy005 kernel: RDX: 0000000000000023 > RSI: 1603a07305004fff RDI: 0000000000000000 > 2011-02-09T07:40:15.636683+01:00 phy005 kernel: RBP: ffff88061cbcbce8 > R08: 0000000000000023 R09: 0000000000000000 > 2011-02-09T07:40:15.636686+01:00 phy005 kernel: R10: 0000000000000000 > R11: ffffffffa0082c7f R12: 0000000000000001 > 2011-02-09T07:40:15.636689+01:00 phy005 kernel: R13: 0000000000311763 > R14: ffff8809b8b01ce0 R15: 0000000000000000 > 2011-02-09T07:40:15.636692+01:00 phy005 kernel: FS: > 0000000000000000(0000) GS:ffff880002040000(0000) > knlGS:0000000000000000 > 2011-02-09T07:40:15.636695+01:00 phy005 kernel: CS: Â0010 DS: 0000 ES: > 0000 CR0: 000000008005003b > 2011-02-09T07:40:15.636699+01:00 phy005 kernel: CR2: 0000000000000000 > CR3: 0000000001a42000 CR4: 00000000000026e0 > 2011-02-09T07:40:15.636702+01:00 phy005 kernel: DR0: 0000000000000000 > DR1: 0000000000000000 DR2: 0000000000000000 > 
2011-02-09T07:40:15.636705+01:00 phy005 kernel: DR3: 0000000000000000 > DR6: 00000000ffff0ff0 DR7: 0000000000000400 > 2011-02-09T07:40:15.636709+01:00 phy005 kernel: Process qemu-kvm (pid: > 2572, threadinfo ffff88061cbca000, task ffff88061cf04650) > 2011-02-09T07:40:15.636711+01:00 phy005 kernel: Stack: > 2011-02-09T07:40:15.636715+01:00 phy005 kernel: ffff88036c471ff8 > ffff880c23984000 ffff88061cbcbd18 ffffffffa0082ea9 > 2011-02-09T07:40:15.636718+01:00 phy005 kernel: <0> ffff8809b8b01ce0 > ffff880c23984000 ffff88036c471ff8 00000000000001ff > 2011-02-09T07:40:15.636721+01:00 phy005 kernel: <0> ffff88061cbcbd58 > ffffffffa008363b 0000000000000200 ffff880c23984000 > 2011-02-09T07:40:15.636724+01:00 phy005 kernel: Call Trace: > 2011-02-09T07:40:15.636728+01:00 phy005 kernel: [<ffffffffa0082ea9>] > rmap_remove+0xa3/0x1a0 [kvm] > 2011-02-09T07:40:15.636731+01:00 phy005 kernel: [<ffffffffa008363b>] > kvm_mmu_zap_page+0x9f/0x299 [kvm] > 2011-02-09T07:40:15.636734+01:00 phy005 kernel: [<ffffffffa0083a42>] > kvm_mmu_zap_all+0x35/0x60 [kvm] > 2011-02-09T07:40:15.636738+01:00 phy005 kernel: [<ffffffffa0078cde>] > kvm_arch_flush_shadow+0x16/0x22 [kvm] > 2011-02-09T07:40:15.636741+01:00 phy005 kernel: [<ffffffffa006eb0a>] > kvm_mmu_notifier_release+0x31/0x44 [kvm] > 2011-02-09T07:40:15.636744+01:00 phy005 kernel: [<ffffffff810fac37>] > __mmu_notifier_release+0x4f/0x7b > 2011-02-09T07:40:15.636748+01:00 phy005 kernel: [<ffffffff810e735d>] > exit_mmap+0x2c/0x132 > 2011-02-09T07:40:15.636751+01:00 phy005 kernel: [<ffffffff8104ad7a>] > mmput+0x5e/0xca > 2011-02-09T07:40:15.636754+01:00 phy005 kernel: [<ffffffff8104f0d5>] > exit_mm+0x114/0x121 > 2011-02-09T07:40:15.636757+01:00 phy005 kernel: [<ffffffff81050bf5>] > do_exit+0x254/0x752 > 2011-02-09T07:40:15.636760+01:00 phy005 kernel: [<ffffffff8100a60e>] ? 
> apic_timer_interrupt+0xe/0x20 > 2011-02-09T07:40:15.636764+01:00 phy005 kernel: [<ffffffff81051174>] > do_group_exit+0x81/0xab > 2011-02-09T07:40:15.636767+01:00 phy005 kernel: [<ffffffff810511b5>] > sys_exit_group+0x17/0x1b > 2011-02-09T07:40:15.636771+01:00 phy005 kernel: [<ffffffff81009c72>] > system_call_fastpath+0x16/0x1b > 2011-02-09T07:40:15.636777+01:00 phy005 kernel: Code: 88 ff ff ff b8 > 01 00 00 00 c9 c3 55 48 89 e5 41 54 53 0f 1f 44 00 00 41 89 d4 48 89 > f3 e8 7b c7 fe ff 41 83 fc 01 48 89 c7 75 0d <48> 2b 18 48 c1 e3 03 48 > 03 58 18 eb 39 41 8d 4c 24 ff be 01 00 > 2011-02-09T07:40:15.636785+01:00 phy005 kernel: RIP > [<ffffffffa0082db8>] gfn_to_rmap+0x20/0x6e [kvm] > 2011-02-09T07:40:15.636788+01:00 phy005 kernel: RSP <ffff88061cbcbcd8> > 2011-02-09T07:40:15.636791+01:00 phy005 kernel: CR2: 0000000000000000 > 2011-02-09T07:40:15.637743+01:00 phy005 kernel: ---[ end trace > 174c28940e9fd0a9 ]--- > 2011-02-09T07:40:15.637751+01:00 phy005 kernel: Fixing recursive fault > but reboot is needed! > > So it doesn't seem to be a hardware problem since we replaced all that. > > Kind regards, > > Ruben And tonight we had another one of those errors we had a few weeks ago: 2011-02-13T02:56:28.694496+01:00 phy005 kernel: EPT: Misconfiguration. 
2011-02-13T02:56:28.694908+01:00 phy005 kernel: EPT: GPA: 0x2edff000 2011-02-13T02:56:28.694914+01:00 phy005 kernel: ept_misconfig_inspect_spte: spte 0x25602d007 level 4 2011-02-13T02:56:28.694916+01:00 phy005 kernel: ept_misconfig_inspect_spte: spte 0x3df3e2007 level 3 2011-02-13T02:56:28.694919+01:00 phy005 kernel: ept_misconfig_inspect_spte: spte 0x5e90c7007 level 2 2011-02-13T02:56:28.694925+01:00 phy005 kernel: ept_misconfig_inspect_spte: spte 0x1603a0730500d277 level 1 2011-02-13T02:56:28.694928+01:00 phy005 kernel: ept_misconfig_inspect_spte: rsvd_bits = 0x3a00000000000 2011-02-13T02:56:28.694930+01:00 phy005 kernel: ------------[ cut here ]------------ 2011-02-13T02:56:28.694933+01:00 phy005 kernel: WARNING: at arch/x86/kvm/vmx.c:3425 handle_ept_misconfig+0x152/0x1d8 [kvm_intel]() 2011-02-13T02:56:28.694936+01:00 phy005 kernel: Hardware name: X8DTU 2011-02-13T02:56:28.694941+01:00 phy005 kernel: Modules linked in: tun ipmi_devintf ipmi_si ipmi_msghandler bridge 8021q garp stp llc bonding xt_comment xt_recent ip6t_REJECT nf_conntrack_ipv6 ip6table_filter ip6_tables ipv6 kvm_intel kvm i2c_i801 i2c_core iTCO_wdt igb ioatdma dca iTCO_vendor_support joydev serio_raw microcode 3w_9xxx [last unloaded: scsi_wait_scan] 2011-02-13T02:56:28.695004+01:00 phy005 kernel: Pid: 4756, comm: qemu-kvm Not tainted 2.6.34.7-66.tilaa.fc13.x86_64 #1 2011-02-13T02:56:28.695008+01:00 phy005 kernel: Call Trace: 2011-02-13T02:56:28.695013+01:00 phy005 kernel: [<ffffffff8104d11f>] warn_slowpath_common+0x7c/0x94 2011-02-13T02:56:28.695020+01:00 phy005 kernel: [<ffffffff8104d14b>] warn_slowpath_null+0x14/0x16 2011-02-13T02:56:28.695024+01:00 phy005 kernel: [<ffffffffa00c97fb>] handle_ept_misconfig+0x152/0x1d8 [kvm_intel] 2011-02-13T02:56:28.695028+01:00 phy005 kernel: [<ffffffffa00ca401>] vmx_handle_exit+0x204/0x23a [kvm_intel] 2011-02-13T02:56:28.695033+01:00 phy005 kernel: [<ffffffffa0084998>] kvm_arch_vcpu_ioctl_run+0x7cd/0xa74 [kvm] 2011-02-13T02:56:28.695037+01:00 phy005 kernel: 
[<ffffffffa00735ba>] kvm_vcpu_ioctl+0xfd/0x56e [kvm]
2011-02-13T02:56:28.695042+01:00 phy005 kernel: [<ffffffff810feaab>] ? virt_to_head_page+0xe/0x2f
2011-02-13T02:56:28.695046+01:00 phy005 kernel: [<ffffffff810cc6ca>] ? mempool_kfree+0xe/0x10
2011-02-13T02:56:28.695051+01:00 phy005 kernel: [<ffffffff810cc857>] ? mempool_free+0x76/0x7b
2011-02-13T02:56:28.695055+01:00 phy005 kernel: [<ffffffff8111aa2f>] vfs_ioctl+0x32/0xa6
2011-02-13T02:56:28.695060+01:00 phy005 kernel: [<ffffffff8111afa2>] do_vfs_ioctl+0x483/0x4c9
2011-02-13T02:56:28.695065+01:00 phy005 kernel: [<ffffffff8111b03e>] sys_ioctl+0x56/0x79
2011-02-13T02:56:28.695070+01:00 phy005 kernel: [<ffffffff81009c72>] system_call_fastpath+0x16/0x1b
2011-02-13T02:56:28.695074+01:00 phy005 kernel: ---[ end trace d95032626ea304ca ]---

Any help would be much appreciated. It seems very strange that I'm the
first one to run into this.

I've found two bug reports that describe the same problem. The first is
https://partner-bugzilla.redhat.com/show_bug.cgi?format=multiple&id=613691,
but it is a duplicate of
https://partner-bugzilla.redhat.com/show_bug.cgi?id=606131
which I'm not authorized to see...

Kind regards,

Ruben
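The "rsvd_bits = 0x3a00000000000" line in the last trace can be reproduced from the level 1 spte, if one assumes that the logged value is the spte ANDed with a mask covering the bits from the host's physical address width up to bit 51. The width of 40 used below is an assumption, not something stated in the log, and the helper names are illustrative only.

```python
def rsvd_bits(start, end):
    """Return a mask with bits start..end (inclusive) set."""
    return ((1 << (end - start + 1)) - 1) << start


ASSUMED_PHYS_BITS = 40            # assumption: host physical address width
LOGGED_RSVD = 0x3A00000000000     # value printed in the 2011-02-13 trace


def reserved_set(spte, phys_bits=ASSUMED_PHYS_BITS):
    """Return the bits of spte that fall in the assumed reserved range."""
    return spte & rsvd_bits(phys_bits, 51)


bad = reserved_set(0x1603A0730500D277)   # level 1 spte from the trace
good = reserved_set(0x5E90C7007)         # level 2 spte from the same trace

if __name__ == "__main__":
    print(hex(bad), hex(good))
```

For this particular spte any assumed width from 39 to 45 gives the same masked value, so the check does not pin down the real address width. It only shows that the logged reserved bits are consistent with the corrupt pattern, while the level 2 entry from the same walk has none of those bits set.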