On Wed, Jan 26, 2011 at 16:00, Ruben Kerkhof <ruben@xxxxxxxxxxxxxxxx> wrote:
> On Wed, Jan 26, 2011 at 10:52, Avi Kivity <avi@xxxxxxxxxx> wrote:
>> On 01/25/2011 08:29 PM, Ruben Kerkhof wrote:
>>>
>>> > When you say "suddenly", this was with no changes to software and
>>> > hardware?
>>>
>>> The host software and hardware hasn't changed in the two months since
>>> the machine has been running. 2.6.34.7 kernel and qemu-kvm 0.13.
>>>
>>> We host customer vms on it though, so virtual machines come and go.
>>> Various operating systems, a mixture of Linux, FreeBSD and Windows
>>> 2008 R2. We have other machines with the same config without these
>>> problems though.
>>
>> Are those other machines running a similar workload?
>
> Yes, similar, or they're more heavily loaded.
>
> On this machine, about half of the 48GB memory was used for virtual machines.
>
>> The traces look awfully like bad hardware, though that can also be explained
>> by random memory corruption due to a bug.
>
> Yeah, that's what I'm expecting. We already replaced the memory, next
> step is to move the disks over to another server to make sure it's not
> the board or cpu's.
>
>>> This time I have a few different messages though:
>>>
>>> 2011-01-25T11:58:50.001208+01:00 phy005 kernel: general protection fault:
>>> 0000 [#1] SMP
>>>
>>> RSI: 0000000000000000 RDI: 1603a07305001568
>>>
>>> 2011-01-25T11:58:50.001486+01:00 phy005 kernel: Code: ff ff 41 8b 46
>>> 08 41 29 06 4c 89 e7 57 9d 0f 1f 44 00 00 48 83 c4 18 5b 41 5c 41 5d
>>> 41 5e 41 5f c9 c3 55 48 89 e5 0f 1f 44 00 00 <f0> ff 4f 08 0f 94 c0 84
>>> c0 74 10 85 f6 75 07 e8 63 fe ff ff eb
>>
>> lock decl 0x8(%rdi)
>>
>> %rdi is completely crap, looks like corruption again.  Strangely, it is
>> similar to the bad spte from the previous trace: 0x1603a0730500d277.  The
>> upper 48 bits are identical, the lower 16 bits are different.:
>>>
>>> 2011-01-25T12:06:32.673937+01:00 phy005 kernel: qemu-kvm: Corrupted
>>> page table at address 7f37b37ff000
>>> 2011-01-25T12:06:32.673959+01:00 phy005 kernel: PGD c201d1067 PUD
>>> 94e538067 PMD 61e5bf067 PTE 1603a0730500e067
>>
>> Here are those magic 48 bits again, in the PTE entry.
>>>
>>> 2011-01-25T12:38:49.416943+01:00 phy005 kernel: EPT: Misconfiguration.
>>> 2011-01-25T12:38:49.417518+01:00 phy005 kernel: EPT: GPA: 0x2abff038
>>> 2011-01-25T12:38:49.417526+01:00 phy005 kernel:
>>> ept_misconfig_inspect_spte: spte 0x5f49e9007 level 4
>>> 2011-01-25T12:38:49.417532+01:00 phy005 kernel:
>>> ept_misconfig_inspect_spte: spte 0x5db595007 level 3
>>> 2011-01-25T12:38:49.417553+01:00 phy005 kernel:
>>> ept_misconfig_inspect_spte: spte 0x5d5da7007 level 2
>>> 2011-01-25T12:38:49.417558+01:00 phy005 kernel:
>>> ept_misconfig_inspect_spte: spte 0x1603a07305006277 level 1
>>
>> Again.
>>
>>> 2011-01-25T13:16:58.192440+01:00 phy005 kernel: BUG: Bad page map in
>>> process qemu-kvm  pte:1603a0730500d067 pmd:61059f067
>>
>> Again.
>>
>> However, these all came from a single boot, yes?
>
> Correct.
>
>> If so they can be the same
>> corruption.  Please collect more traces, with reboots in between.
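The "upper 48 bits are identical" observation in the quoted text can be checked mechanically. The following is only a minimal sketch: the five values are copied from the 2011-01-25 traces quoted above, while the helper name `split_48_16` and the labels are made up for illustration.

```python
# Values copied from the traces quoted above (all from one boot, 2011-01-25).
# The dictionary keys are informal labels, not kernel terminology.
SAMPLES = {
    "bad spte (previous trace)": 0x1603A0730500D277,
    "RDI in the GPF": 0x1603A07305001568,
    "PTE in 'Corrupted page table'": 0x1603A0730500E067,
    "level 1 spte in EPT misconfig": 0x1603A07305006277,
    "pte in 'Bad page map'": 0x1603A0730500D067,
}


def split_48_16(value):
    """Return (upper 48 bits, lower 16 bits) of a 64-bit value."""
    return value >> 16, value & 0xFFFF


# Collect the distinct upper-48-bit patterns across all samples.
uppers = {split_48_16(v)[0] for v in SAMPLES.values()}

for name, value in SAMPLES.items():
    upper, lower = split_48_16(value)
    print(f"{name:32s} upper48={upper:012x} lower16={lower:04x}")

print("distinct upper-48 patterns:", len(uppers))
```

If the set `uppers` holds a single element, every corrupted value shares the same 48-bit prefix and only the low 16 bits vary, which is the pattern pointed out in the reply.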
This machine has been running for a week without problems, but then we
started to get the following oopses again:

2011-02-06T19:45:35.221555+01:00 phy005 kernel: BUG: unable to handle kernel paging request at ffffea71929180e0
2011-02-06T19:45:35.222194+01:00 phy005 kernel: IP: [<ffffffff81034880>] gup_pte_range+0x94/0xd3
2011-02-06T19:45:35.222199+01:00 phy005 kernel: PGD 118600067 PUD 0
2011-02-06T19:45:35.222203+01:00 phy005 kernel: Oops: 0000 [#1] SMP
2011-02-06T19:45:35.222221+01:00 phy005 kernel: last sysfs file: /sys/devices/system/cpu/cpu15/topology/thread_siblings
2011-02-06T19:45:35.222224+01:00 phy005 kernel: CPU 4
2011-02-06T19:45:35.222229+01:00 phy005 kernel: Modules linked in: tun ipmi_devintf ipmi_si ipmi_msghandler bridge 8021q garp stp llc bonding xt_comment xt_recent ip6t_REJECT nf_conntrack_ipv6 ip6table_filter ip6_tables ipv6 kvm_intel kvm i2c_i801 i2c_core iTCO_wdt serio_raw igb iTCO_vendor_support joydev ioatdma dca 3w_9xxx [last unloaded: scsi_wait_scan]
2011-02-06T19:45:35.222231+01:00 phy005 kernel:
2011-02-06T19:45:35.222233+01:00 phy005 kernel: Pid: 3650, comm: qemu-kvm Not tainted 2.6.34.7-66.tilaa.fc13.x86_64 #1 X8DTU/X8DTU
2011-02-06T19:45:35.222236+01:00 phy005 kernel: RIP: 0010:[<ffffffff81034880>] [<ffffffff81034880>] gup_pte_range+0x94/0xd3
2011-02-06T19:45:35.222239+01:00 phy005 kernel: RSP: 0018:ffff88060b9bda78 EFLAGS: 00010082
2011-02-06T19:45:35.222241+01:00 phy005 kernel: RAX: ffffea71929180e0 RBX: 00003ffffffff000 RCX: 0000000000000005
2011-02-06T19:45:35.222243+01:00 phy005 kernel: RDX: 00007fe54e400000 RSI: 00007fe54e3ff000 RDI: 1603a07305004067
2011-02-06T19:45:35.222245+01:00 phy005 kernel: RBP: ffff88060b9bda98 R08: ffff880b94384560 R09: ffff88060b9bdb44
2011-02-06T19:45:35.222248+01:00 phy005 kernel: R10: ffff880606b2fff8 R11: ffffea0000000000 R12: 0000000000000205
2011-02-06T19:45:35.222251+01:00 phy005 kernel: R13: ffffc00000000fff R14: 0000000000000005 R15: 0000000000000000
2011-02-06T19:45:35.222255+01:00 phy005 kernel: FS: 00007fe64cb0e700(0000) GS:ffff880655400000(0000) knlGS:0000000000000000
2011-02-06T19:45:35.222259+01:00 phy005 kernel: CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033
2011-02-06T19:45:35.222263+01:00 phy005 kernel: CR2: ffffea71929180e0 CR3: 0000000bff06d000 CR4: 00000000000026e0
2011-02-06T19:45:35.222267+01:00 phy005 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
2011-02-06T19:45:35.222271+01:00 phy005 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
2011-02-06T19:45:35.222274+01:00 phy005 kernel: Process qemu-kvm (pid: 3650, threadinfo ffff88060b9bc000, task ffff880623ed2ee0)
2011-02-06T19:45:35.222278+01:00 phy005 kernel: Stack:
2011-02-06T19:45:35.222281+01:00 phy005 kernel: 00007fe54e400000 00007fe54e400000 00007fe54e400000 ffff88053a0d2388
2011-02-06T19:45:35.222285+01:00 phy005 kernel: <0> ffff88060b9bdaf8 ffffffff81034a15 00007fe54e3fffff 00007fe54e3fffff
2011-02-06T19:45:35.222289+01:00 phy005 kernel: <0> ffff88060b9bdb44 ffff880b94384560 ffff880bff06eca8 ffff880bff06d7f8
2011-02-06T19:45:35.222292+01:00 phy005 kernel: Call Trace:
2011-02-06T19:45:35.222296+01:00 phy005 kernel: [<ffffffff81034a15>] gup_pud_range+0x156/0x192
2011-02-06T19:45:35.222300+01:00 phy005 kernel: [<ffffffff81034b15>] get_user_pages_fast+0xc4/0x172
2011-02-06T19:45:35.222304+01:00 phy005 kernel: [<ffffffff81131fbc>] ? bio_add_page+0x36/0x38
2011-02-06T19:45:35.222308+01:00 phy005 kernel: [<ffffffff81134730>] dio_get_page+0x54/0x127
2011-02-06T19:45:35.222312+01:00 phy005 kernel: [<ffffffff81135317>] __blockdev_direct_IO+0x41d/0xa36
2011-02-06T19:45:35.222316+01:00 phy005 kernel: [<ffffffffa0080f69>] ? x86_emulate_insn+0x1ff8/0x2d61 [kvm]
2011-02-06T19:45:35.222320+01:00 phy005 kernel: [<ffffffff8113379b>] blkdev_direct_IO+0x4e/0x50
2011-02-06T19:45:35.222324+01:00 phy005 kernel: [<ffffffff81132c49>] ? blkdev_get_blocks+0x0/0x8d
2011-02-06T19:45:35.222328+01:00 phy005 kernel: [<ffffffff810cb516>] generic_file_direct_write+0xed/0x16d
2011-02-06T19:45:35.222331+01:00 phy005 kernel: [<ffffffff810cb72c>] __generic_file_aio_write+0x196/0x281
2011-02-06T19:45:35.222335+01:00 phy005 kernel: [<ffffffff811d5352>] ? file_has_perm+0xa4/0xc6
2011-02-06T19:45:35.222339+01:00 phy005 kernel: [<ffffffff81133043>] ? blkdev_aio_write+0x0/0x69
2011-02-06T19:45:35.222343+01:00 phy005 kernel: [<ffffffff8113306d>] blkdev_aio_write+0x2a/0x69
2011-02-06T19:45:35.222347+01:00 phy005 kernel: [<ffffffff81133043>] ? blkdev_aio_write+0x0/0x69
2011-02-06T19:45:35.222351+01:00 phy005 kernel: [<ffffffff8113d4eb>] aio_rw_vect_retry+0x85/0x18e
2011-02-06T19:45:35.222355+01:00 phy005 kernel: [<ffffffff8113e9b3>] aio_run_iocb+0x77/0x10f
2011-02-06T19:45:35.222359+01:00 phy005 kernel: [<ffffffff8113f508>] do_io_submit+0x558/0x7ce
2011-02-06T19:45:35.222363+01:00 phy005 kernel: [<ffffffff8113f78e>] sys_io_submit+0x10/0x12
2011-02-06T19:45:35.222366+01:00 phy005 kernel: [<ffffffff81009c72>] system_call_fastpath+0x16/0x1b
2011-02-06T19:45:35.222372+01:00 phy005 kernel: Code: 21 d8 49 01 c2 49 8b 3a 49 89 fe 4d 21 ee 4d 21 e6 49 39 ce 75 49 48 89 f8 0f 1f 40 00 48 21 d8 48 c1 e8 0c 48 6b c0 38 4c 01 d8 <66> 83 38 00 48 89 c7 79 04 48 8b 78 10 f0 ff 47 08 49 63 39 48
2011-02-06T19:45:35.222376+01:00 phy005 kernel: RIP [<ffffffff81034880>] gup_pte_range+0x94/0xd3
2011-02-06T19:45:35.222379+01:00 phy005 kernel: RSP <ffff88060b9bda78>
2011-02-06T19:45:35.222382+01:00 phy005 kernel: CR2: ffffea71929180e0
2011-02-06T19:45:35.222386+01:00 phy005 kernel: ---[ end trace beed2b54d0bb8a00 ]---

and

2011-02-06T19:47:15.023129+01:00 phy005 kernel: qemu-kvm: Corrupted page table at address 7fbde15ff64c
2011-02-06T19:47:15.023207+01:00 phy005 kernel: PGD 5ff58a067 PUD 612668067 PMD 5937b7067 PTE 1603a07305008067
2011-02-06T19:47:15.023214+01:00 phy005 kernel: Bad pagetable: 000d [#2] SMP
2011-02-06T19:47:15.023219+01:00 phy005 kernel: last sysfs file: /sys/devices/pci0000:00/0000:00:09.0/0000:05:00.0/host0/scsi_host/host0/stats
2011-02-06T19:47:15.023226+01:00 phy005 kernel: CPU 13
2011-02-06T19:47:15.023232+01:00 phy005 kernel: Modules linked in: tun ipmi_devintf ipmi_si ipmi_msghandler bridge 8021q garp stp llc bonding xt_comment xt_recent ip6t_REJECT nf_conntrack_ipv6 ip6table_filter ip6_tables ipv6 kvm_intel kvm i2c_i801 i2c_core iTCO_wdt serio_raw igb iTCO_vendor_support joydev ioatdma dca 3w_9xxx [last unloaded: scsi_wait_scan]
2011-02-06T19:47:15.023236+01:00 phy005 kernel:
2011-02-06T19:47:15.023239+01:00 phy005 kernel: Pid: 3387, comm: qemu-kvm Tainted: G D 2.6.34.7-66.tilaa.fc13.x86_64 #1 X8DTU/X8DTU
2011-02-06T19:47:15.023244+01:00 phy005 kernel: RIP: 0033:[<00000000004abb73>] [<00000000004abb73>] 0x4abb73
2011-02-06T19:47:15.023247+01:00 phy005 kernel: RSP: 002b:00007fbdf3c00680 EFLAGS: 00010206
2011-02-06T19:47:15.023251+01:00 phy005 kernel: RAX: 00007fbde15ff000 RBX: 000000000000064c RCX: 0000000001abe968
2011-02-06T19:47:15.023254+01:00 phy005 kernel: RDX: 0000000001abe850 RSI: 0000000000000000 RDI: 000000003d600000
2011-02-06T19:47:15.023257+01:00 phy005 kernel: RBP: 0000000001f2ab00 R08: 0000000000000003 R09: 0000000002000000
2011-02-06T19:47:15.023260+01:00 phy005 kernel: R10: 000000000000c050 R11: 00007fbdec000818 R12: 0000000000000025
2011-02-06T19:47:15.023269+01:00 phy005 kernel: R13: 0000000000000003 R14: 000000003d600640 R15: 0000000000000000
2011-02-06T19:47:15.023273+01:00 phy005 kernel: FS: 00007fbdf3c01700(0000) GS:ffff8806554a0000(0000) knlGS:0000000000000000
2011-02-06T19:47:15.023276+01:00 phy005 kernel: CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033
2011-02-06T19:47:15.023280+01:00 phy005 kernel: CR2: 00007fbde15ff64c CR3: 0000000606858000 CR4: 00000000000026e0
2011-02-06T19:47:15.023283+01:00 phy005 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
2011-02-06T19:47:15.023286+01:00 phy005 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
2011-02-06T19:47:15.023290+01:00 phy005 kernel: Process qemu-kvm (pid: 3387, threadinfo ffff88060689e000, task ffff8805ff5a9770)
2011-02-06T19:47:15.023294+01:00 phy005 kernel:
2011-02-06T19:47:15.023296+01:00 phy005 kernel: RIP [<00000000004abb73>] 0x4abb73
2011-02-06T19:47:15.023298+01:00 phy005 kernel: RSP <00007fbdf3c00680>
2011-02-06T19:47:15.023300+01:00 phy005 kernel: ---[ end trace beed2b54d0bb8a01 ]---

followed by

2011-02-06T21:20:32.882972+01:00 phy005 kernel: BUG: unable to handle kernel paging request at fffff6b192918010
2011-02-06T21:20:32.883252+01:00 phy005 kernel: IP: [<ffffffffa0078826>] kvm_mmu_zap_page+0x28a/0x299 [kvm]
2011-02-06T21:20:32.883259+01:00 phy005 kernel: PGD 0
2011-02-06T21:20:32.883263+01:00 phy005 kernel: Oops: 0000 [#5] SMP
2011-02-06T21:20:32.883267+01:00 phy005 kernel: last sysfs file: /sys/devices/system/cpu/cpu15/topology/thread_siblings
2011-02-06T21:20:32.883271+01:00 phy005 kernel: CPU 8
2011-02-06T21:20:32.883278+01:00 phy005 kernel: Modules linked in: tun ipmi_devintf ipmi_si ipmi_msghandler bridge 8021q garp stp llc bonding xt_comment xt_recent ip6t_REJECT nf_conntrack_ipv6 ip6table_filter ip6_tables ipv6 kvm_intel kvm i2c_i801 i2c_core iTCO_wdt serio_raw igb iTCO_vendor_support joydev ioatdma dca 3w_9xxx [last unloaded: scsi_wait_scan]
2011-02-06T21:20:32.883286+01:00 phy005 kernel:
2011-02-06T21:20:32.883290+01:00 phy005 kernel: Pid: 13247, comm: qemu-kvm Tainted: G D 2.6.34.7-66.tilaa.fc13.x86_64 #1 X8DTU/X8DTU
2011-02-06T21:20:32.883295+01:00 phy005 kernel: RIP: 0010:[<ffffffffa0078826>] [<ffffffffa0078826>] kvm_mmu_zap_page+0x28a/0x299 [kvm]
2011-02-06T21:20:32.883300+01:00 phy005 kernel: RSP: 0018:ffff880312bdfb58 EFLAGS: 00010206
2011-02-06T21:20:32.883303+01:00 phy005 kernel: RAX: 00000cb192918000 RBX: ffff8802d16ae210 RCX: 0000000000000000
2011-02-06T21:20:32.883307+01:00 phy005 kernel: RDX: ffffea0000000000 RSI: ffff88060bb07ff8 RDI: 0000000000000200
2011-02-06T21:20:32.883311+01:00 phy005 kernel: RBP: ffff880312bdfb88 R08: dead000000100100 R09: 0000000000000004
2011-02-06T21:20:32.883315+01:00 phy005 kernel: R10: 0000000000000000 R11: 0000000000000010 R12: ffff880853ae0000
2011-02-06T21:20:32.883319+01:00 phy005 kernel: R13: ffff88060bb07ff8 R14: 00000000000001ff R15: 0000000000000000
2011-02-06T21:20:32.883323+01:00 phy005 kernel: FS: 0000000000000000(0000) GS:ffff880002080000(0000) knlGS:0000000000000000
2011-02-06T21:20:32.883327+01:00 phy005 kernel: CS: 0010 DS: 002b ES: 002b CR0: 000000008005003b
2011-02-06T21:20:32.883331+01:00 phy005 kernel: CR2: fffff6b192918010 CR3: 0000000001a42000 CR4: 00000000000026e0
2011-02-06T21:20:32.883335+01:00 phy005 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
2011-02-06T21:20:32.883338+01:00 phy005 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
2011-02-06T21:20:32.883343+01:00 phy005 kernel: Process qemu-kvm (pid: 13247, threadinfo ffff880312bde000, task ffff880268ad8000)
2011-02-06T21:20:32.883347+01:00 phy005 kernel: Stack:
2011-02-06T21:20:32.883351+01:00 phy005 kernel: 0000000000000002 ffff880853ae0000 ffff8802d16ae160 ffff880853ae2328
2011-02-06T21:20:32.883355+01:00 phy005 kernel: <0> ffff880c22d426e8 ffff880268ad8000 ffff880312bdfbb8 ffffffffa0078a42
2011-02-06T21:20:32.883358+01:00 phy005 kernel: <0> ffffea00134a16c8 ffff880853ae0000 ffff880853ae0000 0000000000000001
2011-02-06T21:20:32.883362+01:00 phy005 kernel: Call Trace:
2011-02-06T21:20:32.883366+01:00 phy005 kernel: [<ffffffffa0078a42>] kvm_mmu_zap_all+0x35/0x60 [kvm]
2011-02-06T21:20:32.883371+01:00 phy005 kernel: [<ffffffffa006dcde>] kvm_arch_flush_shadow+0x16/0x22 [kvm]
2011-02-06T21:20:32.883375+01:00 phy005 kernel: [<ffffffffa0063b0a>] kvm_mmu_notifier_release+0x31/0x44 [kvm]
2011-02-06T21:20:32.883379+01:00 phy005 kernel: [<ffffffff810fac37>] __mmu_notifier_release+0x4f/0x7b
2011-02-06T21:20:32.883383+01:00 phy005 kernel: [<ffffffff810e735d>] exit_mmap+0x2c/0x132
2011-02-06T21:20:32.883386+01:00 phy005 kernel: [<ffffffff8104ad7a>] mmput+0x5e/0xca
2011-02-06T21:20:32.883390+01:00 phy005 kernel: [<ffffffff8104f0d5>] exit_mm+0x114/0x121
2011-02-06T21:20:32.883394+01:00 phy005 kernel: [<ffffffff81050bf5>] do_exit+0x254/0x752
2011-02-06T21:20:32.883398+01:00 phy005 kernel: [<ffffffff81051174>] do_group_exit+0x81/0xab
2011-02-06T21:20:32.883403+01:00 phy005 kernel: [<ffffffff8105e5cd>] get_signal_to_deliver+0x3a6/0x3c8
2011-02-06T21:20:32.883406+01:00 phy005 kernel: [<ffffffff81009038>] do_signal+0x72/0x6b8
2011-02-06T21:20:32.883410+01:00 phy005 kernel: [<ffffffff8111aa2f>] ? vfs_ioctl+0x32/0xa6
2011-02-06T21:20:32.883413+01:00 phy005 kernel: [<ffffffff8111afa2>] ? do_vfs_ioctl+0x483/0x4c9
2011-02-06T21:20:32.883416+01:00 phy005 kernel: [<ffffffff810096a6>] do_notify_resume+0x28/0x86
2011-02-06T21:20:32.883420+01:00 phy005 kernel: [<ffffffff81009f3e>] int_signal+0x12/0x17
2011-02-06T21:20:32.883426+01:00 phy005 kernel: Code: 41 5e 44 89 f8 41 5f c9 c3 48 ba 00 f0 ff ff ff ff 0f 00 4c 89 ee 48 21 d0 48 ba 00 00 00 00 00 ea ff ff 48 c1 e8 0c 48 6b c0 38 <48> 8b 7c 10 10 e8 a3 f3 ff ff e9 06 fe ff ff 55 48 89 e5 41 57
2011-02-06T21:20:32.883431+01:00 phy005 kernel: RIP [<ffffffffa0078826>] kvm_mmu_zap_page+0x28a/0x299 [kvm]
2011-02-06T21:20:32.883434+01:00 phy005 kernel: RSP <ffff880312bdfb58>
2011-02-06T21:20:32.883437+01:00 phy005 kernel: CR2: fffff6b192918010
2011-02-06T21:20:32.883441+01:00 phy005 kernel: ---[ end trace beed2b54d0bb8a04 ]---
2011-02-06T21:20:32.883444+01:00 phy005 kernel: Fixing recursive fault but reboot is needed!

after which we rebooted the machine and replaced the motherboard and
cpus (we already replaced the memory before).
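As an aside, the faulting address in the 19:45:35 gup_pte_range oops can apparently be reproduced from the corrupted PTE value in RDI. This is only a sketch based on my reading of that dump: the constants are inferred from the register values and the Code: bytes (`48 21 d8` and `%rbx,%rax`, `48 c1 e8 0c` shr by 12, `48 6b c0 38` multiply by 0x38, `4c 01 d8` add `%r11`), and the names `VMEMMAP_BASE`, `PFN_MASK` and `struct_page_address` are made up for illustration.

```python
# Assumptions read off the 2011-02-06T19:45:35 register dump and Code: bytes.
VMEMMAP_BASE = 0xFFFFEA0000000000  # R11; assumed base of the struct page array
PFN_MASK = 0x00003FFFFFFFF000      # RBX; mask applied to the PTE value
STRUCT_PAGE_SIZE = 0x38            # immediate in "48 6b c0 38" (imul)
PAGE_SHIFT = 12                    # immediate in "48 c1 e8 0c" (shr)


def struct_page_address(pte):
    """Mimic the and/shr/imul/add sequence before the faulting instruction."""
    pfn = (pte & PFN_MASK) >> PAGE_SHIFT
    # Truncate to 64 bits, as the CPU register would.
    return (VMEMMAP_BASE + pfn * STRUCT_PAGE_SIZE) & 0xFFFFFFFFFFFFFFFF


corrupted_pte = 0x1603A07305004067  # RDI in the oops
print(hex(struct_page_address(corrupted_pte)))
```

Under these assumptions the result matches RAX and CR2 in that oops (ffffea71929180e0), which would mean the paging-request fault is a direct consequence of the bogus PTE contents rather than a separate problem.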
But two days ago we got this oops:

2011-02-08T15:56:19.902104+01:00 phy005 kernel: BUG: unable to handle kernel paging request at ffffea71929181c0
2011-02-08T15:56:19.902686+01:00 phy005 kernel: IP: [<ffffffff81034880>] gup_pte_range+0x94/0xd3
2011-02-08T15:56:19.902693+01:00 phy005 kernel: PGD 118600067 PUD 0
2011-02-08T15:56:19.902699+01:00 phy005 kernel: Oops: 0000 [#1] SMP
2011-02-08T15:56:19.902703+01:00 phy005 kernel: last sysfs file: /sys/devices/system/cpu/cpu15/cache/index2/shared_cpu_map
2011-02-08T15:56:19.902708+01:00 phy005 kernel: CPU 8
2011-02-08T15:56:19.902715+01:00 phy005 kernel: Modules linked in: tun ipmi_devintf ipmi_si ipmi_msghandler bridge 8021q garp stp llc bonding xt_comment xt_recent ip6t_REJECT nf_conntrack_ipv6 ip6table_filter ip6_tables ipv6 kvm_intel kvm igb i2c_i801 iTCO_wdt ioatdma i2c_core iTCO_vendor_support dca serio_raw joydev 3w_9xxx [last unloaded: scsi_wait_scan]
2011-02-08T15:56:19.902770+01:00 phy005 kernel:
2011-02-08T15:56:19.902775+01:00 phy005 kernel: Pid: 3346, comm: qemu-kvm Not tainted 2.6.34.7-66.tilaa.fc13.x86_64 #1 X8DTU/X8DTU
2011-02-08T15:56:19.902781+01:00 phy005 kernel: RIP: 0010:[<ffffffff81034880>] [<ffffffff81034880>] gup_pte_range+0x94/0xd3
2011-02-08T15:56:19.902785+01:00 phy005 kernel: RSP: 0018:ffff880c21bc1a78 EFLAGS: 00010086
2011-02-08T15:56:19.902789+01:00 phy005 kernel: RAX: ffffea71929181c0 RBX: 00003ffffffff000 RCX: 0000000000000005
2011-02-08T15:56:19.902793+01:00 phy005 kernel: RDX: 00007fa2ca200000 RSI: 00007fa2ca1ff000 RDI: 1603a07305008067
2011-02-08T15:56:19.902797+01:00 phy005 kernel: RBP: ffff880c21bc1a98 R08: ffff88060fdfad60 R09: ffff880c21bc1b44
2011-02-08T15:56:19.902801+01:00 phy005 kernel: R10: ffff88061493fff8 R11: ffffea0000000000 R12: 0000000000000205
2011-02-08T15:56:19.902805+01:00 phy005 kernel: R13: ffffc00000000fff R14: 0000000000000005 R15: 0000000000000000
2011-02-08T15:56:19.902810+01:00 phy005 kernel: FS: 00007fa2d8724700(0000) GS:ffff880002080000(0000) knlGS:0000000000000000
2011-02-08T15:56:19.902820+01:00 phy005 kernel: CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033
2011-02-08T15:56:19.902825+01:00 phy005 kernel: CR2: ffffea71929181c0 CR3: 0000000c231f9000 CR4: 00000000000026e0
2011-02-08T15:56:19.902829+01:00 phy005 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
2011-02-08T15:56:19.902833+01:00 phy005 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
2011-02-08T15:56:19.902837+01:00 phy005 kernel: Process qemu-kvm (pid: 3346, threadinfo ffff880c21bc0000, task ffff880c2264ddc0)
2011-02-08T15:56:19.902841+01:00 phy005 kernel: Stack:
2011-02-08T15:56:19.902844+01:00 phy005 kernel: 00007fa2ca200000 00007fa2ca201000 00007fa2ca201000 ffff880c22c3d280
2011-02-08T15:56:19.902848+01:00 phy005 kernel: <0> ffff880c21bc1af8 ffffffff81034a15 00007fa2ca200fff 00007fa2ca200fff
2011-02-08T15:56:19.902852+01:00 phy005 kernel: <0> ffff880c21bc1b44 ffff88060fdfad60 ffff880c2231a458 ffff880c231f97f8
2011-02-08T15:56:19.902855+01:00 phy005 kernel: Call Trace:
2011-02-08T15:56:19.902859+01:00 phy005 kernel: [<ffffffff81034a15>] gup_pud_range+0x156/0x192
2011-02-08T15:56:19.902863+01:00 phy005 kernel: [<ffffffff81034b15>] get_user_pages_fast+0xc4/0x172
2011-02-08T15:56:19.902867+01:00 phy005 kernel: [<ffffffff81131fbc>] ? bio_add_page+0x36/0x38
2011-02-08T15:56:19.902871+01:00 phy005 kernel: [<ffffffff81134730>] dio_get_page+0x54/0x127
2011-02-08T15:56:19.902875+01:00 phy005 kernel: [<ffffffff81135317>] __blockdev_direct_IO+0x41d/0xa36
2011-02-08T15:56:19.902880+01:00 phy005 kernel: [<ffffffffa008bf69>] ? x86_emulate_insn+0x1ff8/0x2d61 [kvm]
2011-02-08T15:56:19.902884+01:00 phy005 kernel: [<ffffffff8113379b>] blkdev_direct_IO+0x4e/0x50
2011-02-08T15:56:19.902888+01:00 phy005 kernel: [<ffffffff81132c49>] ? blkdev_get_blocks+0x0/0x8d
2011-02-08T15:56:19.902892+01:00 phy005 kernel: [<ffffffff810cb516>] generic_file_direct_write+0xed/0x16d
2011-02-08T15:56:19.902896+01:00 phy005 kernel: [<ffffffff810cb72c>] __generic_file_aio_write+0x196/0x281
2011-02-08T15:56:19.902899+01:00 phy005 kernel: [<ffffffff81133043>] ? blkdev_aio_write+0x0/0x69
2011-02-08T15:56:19.902909+01:00 phy005 kernel: [<ffffffff81133043>] ? blkdev_aio_write+0x0/0x69
2011-02-08T15:56:19.902914+01:00 phy005 kernel: [<ffffffff8113d4eb>] aio_rw_vect_retry+0x85/0x18e
2011-02-08T15:56:19.902919+01:00 phy005 kernel: [<ffffffff8113e9b3>] aio_run_iocb+0x77/0x10f
2011-02-08T15:56:19.902923+01:00 phy005 kernel: [<ffffffff8113f508>] do_io_submit+0x558/0x7ce
2011-02-08T15:56:19.902927+01:00 phy005 kernel: [<ffffffff8113f78e>] sys_io_submit+0x10/0x12
2011-02-08T15:56:19.902932+01:00 phy005 kernel: [<ffffffff81009c72>] system_call_fastpath+0x16/0x1b
2011-02-08T15:56:19.902938+01:00 phy005 kernel: Code: 21 d8 49 01 c2 49 8b 3a 49 89 fe 4d 21 ee 4d 21 e6 49 39 ce 75 49 48 89 f8 0f 1f 40 00 48 21 d8 48 c1 e8 0c 48 6b c0 38 4c 01 d8 <66> 83 38 00 48 89 c7 79 04 48 8b 78 10 f0 ff 47 08 49 63 39 48
2011-02-08T15:56:19.903077+01:00 phy005 kernel: RIP [<ffffffff81034880>] gup_pte_range+0x94/0xd3
2011-02-08T15:56:19.903081+01:00 phy005 kernel: RSP <ffff880c21bc1a78>
2011-02-08T15:56:19.903084+01:00 phy005 kernel: CR2: ffffea71929181c0
2011-02-08T15:56:19.903088+01:00 phy005 kernel: ---[ end trace 174c28940e9fd0a7 ]---

and yesterday this one:

2011-02-09T07:40:15.636528+01:00 phy005 kernel: BUG: unable to handle kernel NULL pointer dereference at (null)
2011-02-09T07:40:15.636635+01:00 phy005 kernel: IP: [<ffffffffa0082db8>] gfn_to_rmap+0x20/0x6e [kvm]
2011-02-09T07:40:15.636639+01:00 phy005 kernel: PGD 0
2011-02-09T07:40:15.636643+01:00 phy005 kernel: Oops: 0000 [#3] SMP
2011-02-09T07:40:15.636647+01:00 phy005 kernel: last sysfs file: /sys/devices/system/cpu/cpu15/topology/thread_siblings
2011-02-09T07:40:15.636650+01:00 phy005 kernel: CPU 2
2011-02-09T07:40:15.636656+01:00 phy005 kernel: Modules linked in: tun ipmi_devintf ipmi_si ipmi_msghandler bridge 8021q garp stp llc bonding xt_comment xt_recent ip6t_REJECT nf_conntrack_ipv6 ip6table_filter ip6_tables ipv6 kvm_intel kvm igb i2c_i801 iTCO_wdt ioatdma i2c_core iTCO_vendor_support dca serio_raw joydev 3w_9xxx [last unloaded: scsi_wait_scan]
2011-02-09T07:40:15.636663+01:00 phy005 kernel:
2011-02-09T07:40:15.636666+01:00 phy005 kernel: Pid: 2572, comm: qemu-kvm Tainted: G D 2.6.34.7-66.tilaa.fc13.x86_64 #1 X8DTU/X8DTU
2011-02-09T07:40:15.636670+01:00 phy005 kernel: RIP: 0010:[<ffffffffa0082db8>] [<ffffffffa0082db8>] gfn_to_rmap+0x20/0x6e [kvm]
2011-02-09T07:40:15.636673+01:00 phy005 kernel: RSP: 0018:ffff88061cbcbcd8 EFLAGS: 00010246
2011-02-09T07:40:15.636677+01:00 phy005 kernel: RAX: 0000000000000000 RBX: 1603a07305004fff RCX: ffff88061cbcbd08
2011-02-09T07:40:15.636680+01:00 phy005 kernel: RDX: 0000000000000023 RSI: 1603a07305004fff RDI: 0000000000000000
2011-02-09T07:40:15.636683+01:00 phy005 kernel: RBP: ffff88061cbcbce8 R08: 0000000000000023 R09: 0000000000000000
2011-02-09T07:40:15.636686+01:00 phy005 kernel: R10: 0000000000000000 R11: ffffffffa0082c7f R12: 0000000000000001
2011-02-09T07:40:15.636689+01:00 phy005 kernel: R13: 0000000000311763 R14: ffff8809b8b01ce0 R15: 0000000000000000
2011-02-09T07:40:15.636692+01:00 phy005 kernel: FS: 0000000000000000(0000) GS:ffff880002040000(0000) knlGS:0000000000000000
2011-02-09T07:40:15.636695+01:00 phy005 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
2011-02-09T07:40:15.636699+01:00 phy005 kernel: CR2: 0000000000000000 CR3: 0000000001a42000 CR4: 00000000000026e0
2011-02-09T07:40:15.636702+01:00 phy005 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
2011-02-09T07:40:15.636705+01:00 phy005 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
2011-02-09T07:40:15.636709+01:00 phy005 kernel: Process qemu-kvm (pid: 2572, threadinfo ffff88061cbca000, task ffff88061cf04650)
2011-02-09T07:40:15.636711+01:00 phy005 kernel: Stack:
2011-02-09T07:40:15.636715+01:00 phy005 kernel: ffff88036c471ff8 ffff880c23984000 ffff88061cbcbd18 ffffffffa0082ea9
2011-02-09T07:40:15.636718+01:00 phy005 kernel: <0> ffff8809b8b01ce0 ffff880c23984000 ffff88036c471ff8 00000000000001ff
2011-02-09T07:40:15.636721+01:00 phy005 kernel: <0> ffff88061cbcbd58 ffffffffa008363b 0000000000000200 ffff880c23984000
2011-02-09T07:40:15.636724+01:00 phy005 kernel: Call Trace:
2011-02-09T07:40:15.636728+01:00 phy005 kernel: [<ffffffffa0082ea9>] rmap_remove+0xa3/0x1a0 [kvm]
2011-02-09T07:40:15.636731+01:00 phy005 kernel: [<ffffffffa008363b>] kvm_mmu_zap_page+0x9f/0x299 [kvm]
2011-02-09T07:40:15.636734+01:00 phy005 kernel: [<ffffffffa0083a42>] kvm_mmu_zap_all+0x35/0x60 [kvm]
2011-02-09T07:40:15.636738+01:00 phy005 kernel: [<ffffffffa0078cde>] kvm_arch_flush_shadow+0x16/0x22 [kvm]
2011-02-09T07:40:15.636741+01:00 phy005 kernel: [<ffffffffa006eb0a>] kvm_mmu_notifier_release+0x31/0x44 [kvm]
2011-02-09T07:40:15.636744+01:00 phy005 kernel: [<ffffffff810fac37>] __mmu_notifier_release+0x4f/0x7b
2011-02-09T07:40:15.636748+01:00 phy005 kernel: [<ffffffff810e735d>] exit_mmap+0x2c/0x132
2011-02-09T07:40:15.636751+01:00 phy005 kernel: [<ffffffff8104ad7a>] mmput+0x5e/0xca
2011-02-09T07:40:15.636754+01:00 phy005 kernel: [<ffffffff8104f0d5>] exit_mm+0x114/0x121
2011-02-09T07:40:15.636757+01:00 phy005 kernel: [<ffffffff81050bf5>] do_exit+0x254/0x752
2011-02-09T07:40:15.636760+01:00 phy005 kernel: [<ffffffff8100a60e>] ? apic_timer_interrupt+0xe/0x20
2011-02-09T07:40:15.636764+01:00 phy005 kernel: [<ffffffff81051174>] do_group_exit+0x81/0xab
2011-02-09T07:40:15.636767+01:00 phy005 kernel: [<ffffffff810511b5>] sys_exit_group+0x17/0x1b
2011-02-09T07:40:15.636771+01:00 phy005 kernel: [<ffffffff81009c72>] system_call_fastpath+0x16/0x1b
2011-02-09T07:40:15.636777+01:00 phy005 kernel: Code: 88 ff ff ff b8 01 00 00 00 c9 c3 55 48 89 e5 41 54 53 0f 1f 44 00 00 41 89 d4 48 89 f3 e8 7b c7 fe ff 41 83 fc 01 48 89 c7 75 0d <48> 2b 18 48 c1 e3 03 48 03 58 18 eb 39 41 8d 4c 24 ff be 01 00
2011-02-09T07:40:15.636785+01:00 phy005 kernel: RIP [<ffffffffa0082db8>] gfn_to_rmap+0x20/0x6e [kvm]
2011-02-09T07:40:15.636788+01:00 phy005 kernel: RSP <ffff88061cbcbcd8>
2011-02-09T07:40:15.636791+01:00 phy005 kernel: CR2: 0000000000000000
2011-02-09T07:40:15.637743+01:00 phy005 kernel: ---[ end trace 174c28940e9fd0a9 ]---
2011-02-09T07:40:15.637751+01:00 phy005 kernel: Fixing recursive fault but reboot is needed!

So it doesn't seem to be a hardware problem, since we have now replaced
the memory, the motherboard and the CPUs.

Kind regards,

Ruben