Re: EPT: Misconfiguration

On Wed, Jan 26, 2011 at 16:00, Ruben Kerkhof <ruben@xxxxxxxxxxxxxxxx> wrote:
> On Wed, Jan 26, 2011 at 10:52, Avi Kivity <avi@xxxxxxxxxx> wrote:
>> On 01/25/2011 08:29 PM, Ruben Kerkhof wrote:
>>>
>>> > When you say "suddenly", this was with no changes to software and
>>> > hardware?
>>>
>>> The host software and hardware haven't changed in the two months since
>>> the machine has been running. 2.6.34.7 kernel and qemu-kvm 0.13.
>>>
>>> We host customer vms on it though, so virtual machines come and go.
>>> Various operating systems, a mixture of Linux, FreeBSD and Windows
>>> 2008 R2. We have other machines with the same config without these
>>> problems though.
>>
>> Are those other machines running a similar workload?
>
> Yes, similar, or they're more heavily loaded.
>
> On this machine, about half of the 48GB memory was used for virtual machines.
>
>> The traces look awfully like bad hardware, though that can also be explained
>> by random memory corruption due to a bug.
>
> Yeah, that's what I'm expecting. We already replaced the memory; the next
> step is to move the disks over to another server to make sure it's not
> the board or CPUs.
>
>>> This time I have a few different messages though:
>>>
>>> 2011-01-25T11:58:50.001208+01:00 phy005 kernel: general protection fault:
>>> 0000 [#1] SMP
>>>
>>> RSI: 0000000000000000 RDI: 1603a07305001568
>>>
>>> 2011-01-25T11:58:50.001486+01:00 phy005 kernel: Code: ff ff 41 8b 46
>>> 08 41 29 06 4c 89 e7 57 9d 0f 1f 44 00 00 48 83 c4 18 5b 41 5c 41 5d
>>> 41 5e 41 5f c9 c3 55 48 89 e5 0f 1f 44 00 00 <f0> ff 4f 08 0f 94 c0 84
>>> c0 74 10 85 f6 75 07 e8 63 fe ff ff eb
>>
>> lock decl 0x8(%rdi)
>>
>> %rdi is completely crap, looks like corruption again. Strangely, it is
>> similar to the bad spte from the previous trace: 0x1603a0730500d277. The
>> upper 48 bits are identical, the lower 16 bits are different.
>>>
>>> 2011-01-25T12:06:32.673937+01:00 phy005 kernel: qemu-kvm: Corrupted
>>> page table at address 7f37b37ff000
>>> 2011-01-25T12:06:32.673959+01:00 phy005 kernel: PGD c201d1067 PUD
>>> 94e538067 PMD 61e5bf067 PTE 1603a0730500e067
>>
>> Here are those magic 48 bits again, in the PTE entry.
>>>
>>> 2011-01-25T12:38:49.416943+01:00 phy005 kernel: EPT: Misconfiguration.
>>> 2011-01-25T12:38:49.417518+01:00 phy005 kernel: EPT: GPA: 0x2abff038
>>> 2011-01-25T12:38:49.417526+01:00 phy005 kernel:
>>> ept_misconfig_inspect_spte: spte 0x5f49e9007 level 4
>>> 2011-01-25T12:38:49.417532+01:00 phy005 kernel:
>>> ept_misconfig_inspect_spte: spte 0x5db595007 level 3
>>> 2011-01-25T12:38:49.417553+01:00 phy005 kernel:
>>> ept_misconfig_inspect_spte: spte 0x5d5da7007 level 2
>>> 2011-01-25T12:38:49.417558+01:00 phy005 kernel:
>>> ept_misconfig_inspect_spte: spte 0x1603a07305006277 level 1
>>
>> Again.
>>
>>> 2011-01-25T13:16:58.192440+01:00 phy005 kernel: BUG: Bad page map in
>>> process qemu-kvm  pte:1603a0730500d067 pmd:61059f067
>>
>> Again.
>>
>> However, these all came from a single boot, yes?
>
> Correct.
>
>> If so they can be the same
>> corruption. Please collect more traces, with reboots in between.
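
As a quick check of the observation quoted above that the bad values
share their upper 48 bits, the values from the 2011-01-25 traces can be
compared mechanically. This is only a sketch: the hex values are copied
from the quoted traces, while the dictionary labels and the helper name
split_48_16 are made up here for illustration.

```python
# Hex values copied from the 2011-01-25 traces quoted above.
# The labels are informal descriptions, not kernel terminology.
bad_values = {
    "GPF, %rdi": 0x1603a07305001568,
    "bad spte (previous trace)": 0x1603a0730500d277,
    "corrupted page table, PTE": 0x1603a0730500e067,
    "EPT misconfig, level 1 spte": 0x1603a07305006277,
    "bad page map, pte": 0x1603a0730500d067,
}


def split_48_16(value: int) -> tuple[int, int]:
    """Split a 64-bit value into its upper 48 bits and lower 16 bits."""
    return value >> 16, value & 0xFFFF


# Collect the distinct upper-48-bit prefixes across all bad values.
prefixes = {split_48_16(v)[0] for v in bad_values.values()}

for label, value in bad_values.items():
    upper, lower = split_48_16(value)
    print(f"{label:30s} upper48={upper:#014x} lower16={lower:#06x}")

print("distinct upper-48 prefixes:", [hex(p) for p in prefixes])
```

All five values reduce to the single prefix 0x1603a0730500; only the low
16 bits vary.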

This machine ran for a week without problems, but then we started
getting oopses again:

2011-02-06T19:45:35.221555+01:00 phy005 kernel: BUG: unable to handle
kernel paging request at ffffea71929180e0
2011-02-06T19:45:35.222194+01:00 phy005 kernel: IP:
[<ffffffff81034880>] gup_pte_range+0x94/0xd3
2011-02-06T19:45:35.222199+01:00 phy005 kernel: PGD 118600067 PUD 0
2011-02-06T19:45:35.222203+01:00 phy005 kernel: Oops: 0000 [#1] SMP
2011-02-06T19:45:35.222221+01:00 phy005 kernel: last sysfs file:
/sys/devices/system/cpu/cpu15/topology/thread_siblings
2011-02-06T19:45:35.222224+01:00 phy005 kernel: CPU 4
2011-02-06T19:45:35.222229+01:00 phy005 kernel: Modules linked in: tun
ipmi_devintf ipmi_si ipmi_msghandler bridge 8021q garp stp llc bonding
xt_comment xt_recent ip6t_REJECT nf_conntrack_ipv6 ip6table_filter
ip6_tables ipv6 kvm_intel kvm i2c_i801 i2c_core iTCO_wdt serio_raw igb
iTCO_vendor_support joydev ioatdma dca 3w_9xxx [last unloaded:
scsi_wait_scan]
2011-02-06T19:45:35.222231+01:00 phy005 kernel:
2011-02-06T19:45:35.222233+01:00 phy005 kernel: Pid: 3650, comm:
qemu-kvm Not tainted 2.6.34.7-66.tilaa.fc13.x86_64 #1 X8DTU/X8DTU
2011-02-06T19:45:35.222236+01:00 phy005 kernel: RIP:
0010:[<ffffffff81034880>]  [<ffffffff81034880>]
gup_pte_range+0x94/0xd3
2011-02-06T19:45:35.222239+01:00 phy005 kernel: RSP:
0018:ffff88060b9bda78  EFLAGS: 00010082
2011-02-06T19:45:35.222241+01:00 phy005 kernel: RAX: ffffea71929180e0
RBX: 00003ffffffff000 RCX: 0000000000000005
2011-02-06T19:45:35.222243+01:00 phy005 kernel: RDX: 00007fe54e400000
RSI: 00007fe54e3ff000 RDI: 1603a07305004067
2011-02-06T19:45:35.222245+01:00 phy005 kernel: RBP: ffff88060b9bda98
R08: ffff880b94384560 R09: ffff88060b9bdb44
2011-02-06T19:45:35.222248+01:00 phy005 kernel: R10: ffff880606b2fff8
R11: ffffea0000000000 R12: 0000000000000205
2011-02-06T19:45:35.222251+01:00 phy005 kernel: R13: ffffc00000000fff
R14: 0000000000000005 R15: 0000000000000000
2011-02-06T19:45:35.222255+01:00 phy005 kernel: FS:
00007fe64cb0e700(0000) GS:ffff880655400000(0000)
knlGS:0000000000000000
2011-02-06T19:45:35.222259+01:00 phy005 kernel: CS:  0010 DS: 002b ES:
002b CR0: 0000000080050033
2011-02-06T19:45:35.222263+01:00 phy005 kernel: CR2: ffffea71929180e0
CR3: 0000000bff06d000 CR4: 00000000000026e0
2011-02-06T19:45:35.222267+01:00 phy005 kernel: DR0: 0000000000000000
DR1: 0000000000000000 DR2: 0000000000000000
2011-02-06T19:45:35.222271+01:00 phy005 kernel: DR3: 0000000000000000
DR6: 00000000ffff0ff0 DR7: 0000000000000400
2011-02-06T19:45:35.222274+01:00 phy005 kernel: Process qemu-kvm (pid:
3650, threadinfo ffff88060b9bc000, task ffff880623ed2ee0)
2011-02-06T19:45:35.222278+01:00 phy005 kernel: Stack:
2011-02-06T19:45:35.222281+01:00 phy005 kernel: 00007fe54e400000
00007fe54e400000 00007fe54e400000 ffff88053a0d2388
2011-02-06T19:45:35.222285+01:00 phy005 kernel: <0> ffff88060b9bdaf8
ffffffff81034a15 00007fe54e3fffff 00007fe54e3fffff
2011-02-06T19:45:35.222289+01:00 phy005 kernel: <0> ffff88060b9bdb44
ffff880b94384560 ffff880bff06eca8 ffff880bff06d7f8
2011-02-06T19:45:35.222292+01:00 phy005 kernel: Call Trace:
2011-02-06T19:45:35.222296+01:00 phy005 kernel: [<ffffffff81034a15>]
gup_pud_range+0x156/0x192
2011-02-06T19:45:35.222300+01:00 phy005 kernel: [<ffffffff81034b15>]
get_user_pages_fast+0xc4/0x172
2011-02-06T19:45:35.222304+01:00 phy005 kernel: [<ffffffff81131fbc>] ?
bio_add_page+0x36/0x38
2011-02-06T19:45:35.222308+01:00 phy005 kernel: [<ffffffff81134730>]
dio_get_page+0x54/0x127
2011-02-06T19:45:35.222312+01:00 phy005 kernel: [<ffffffff81135317>]
__blockdev_direct_IO+0x41d/0xa36
2011-02-06T19:45:35.222316+01:00 phy005 kernel: [<ffffffffa0080f69>] ?
x86_emulate_insn+0x1ff8/0x2d61 [kvm]
2011-02-06T19:45:35.222320+01:00 phy005 kernel: [<ffffffff8113379b>]
blkdev_direct_IO+0x4e/0x50
2011-02-06T19:45:35.222324+01:00 phy005 kernel: [<ffffffff81132c49>] ?
blkdev_get_blocks+0x0/0x8d
2011-02-06T19:45:35.222328+01:00 phy005 kernel: [<ffffffff810cb516>]
generic_file_direct_write+0xed/0x16d
2011-02-06T19:45:35.222331+01:00 phy005 kernel: [<ffffffff810cb72c>]
__generic_file_aio_write+0x196/0x281
2011-02-06T19:45:35.222335+01:00 phy005 kernel: [<ffffffff811d5352>] ?
file_has_perm+0xa4/0xc6
2011-02-06T19:45:35.222339+01:00 phy005 kernel: [<ffffffff81133043>] ?
blkdev_aio_write+0x0/0x69
2011-02-06T19:45:35.222343+01:00 phy005 kernel: [<ffffffff8113306d>]
blkdev_aio_write+0x2a/0x69
2011-02-06T19:45:35.222347+01:00 phy005 kernel: [<ffffffff81133043>] ?
blkdev_aio_write+0x0/0x69
2011-02-06T19:45:35.222351+01:00 phy005 kernel: [<ffffffff8113d4eb>]
aio_rw_vect_retry+0x85/0x18e
2011-02-06T19:45:35.222355+01:00 phy005 kernel: [<ffffffff8113e9b3>]
aio_run_iocb+0x77/0x10f
2011-02-06T19:45:35.222359+01:00 phy005 kernel: [<ffffffff8113f508>]
do_io_submit+0x558/0x7ce
2011-02-06T19:45:35.222363+01:00 phy005 kernel: [<ffffffff8113f78e>]
sys_io_submit+0x10/0x12
2011-02-06T19:45:35.222366+01:00 phy005 kernel: [<ffffffff81009c72>]
system_call_fastpath+0x16/0x1b
2011-02-06T19:45:35.222372+01:00 phy005 kernel: Code: 21 d8 49 01 c2
49 8b 3a 49 89 fe 4d 21 ee 4d 21 e6 49 39 ce 75 49 48 89 f8 0f 1f 40
00 48 21 d8 48 c1 e8 0c 48 6b c0 38 4c 01 d8 <66> 83 38 00 48 89 c7 79
04 48 8b 78 10 f0 ff 47 08 49 63 39 48
2011-02-06T19:45:35.222376+01:00 phy005 kernel: RIP
[<ffffffff81034880>] gup_pte_range+0x94/0xd3
2011-02-06T19:45:35.222379+01:00 phy005 kernel: RSP <ffff88060b9bda78>
2011-02-06T19:45:35.222382+01:00 phy005 kernel: CR2: ffffea71929180e0
2011-02-06T19:45:35.222386+01:00 phy005 kernel: ---[ end trace
beed2b54d0bb8a00 ]---

and

2011-02-06T19:47:15.023129+01:00 phy005 kernel: qemu-kvm: Corrupted
page table at address 7fbde15ff64c
2011-02-06T19:47:15.023207+01:00 phy005 kernel: PGD 5ff58a067 PUD
612668067 PMD 5937b7067 PTE 1603a07305008067
2011-02-06T19:47:15.023214+01:00 phy005 kernel: Bad pagetable: 000d [#2] SMP
2011-02-06T19:47:15.023219+01:00 phy005 kernel: last sysfs file:
/sys/devices/pci0000:00/0000:00:09.0/0000:05:00.0/host0/scsi_host/host0/stats
2011-02-06T19:47:15.023226+01:00 phy005 kernel: CPU 13
2011-02-06T19:47:15.023232+01:00 phy005 kernel: Modules linked in: tun
ipmi_devintf ipmi_si ipmi_msghandler bridge 8021q garp stp llc bonding
xt_comment xt_recent ip6t_REJECT nf_conntrack_ipv6 ip6table_filter
ip6_tables ipv6 kvm_intel kvm i2c_i801 i2c_core iTCO_wdt serio_raw igb
iTCO_vendor_support joydev ioatdma dca 3w_9xxx [last unloaded:
scsi_wait_scan]
2011-02-06T19:47:15.023236+01:00 phy005 kernel:
2011-02-06T19:47:15.023239+01:00 phy005 kernel: Pid: 3387, comm:
qemu-kvm Tainted: G      D    2.6.34.7-66.tilaa.fc13.x86_64 #1
X8DTU/X8DTU
2011-02-06T19:47:15.023244+01:00 phy005 kernel: RIP:
0033:[<00000000004abb73>]  [<00000000004abb73>] 0x4abb73
2011-02-06T19:47:15.023247+01:00 phy005 kernel: RSP:
002b:00007fbdf3c00680  EFLAGS: 00010206
2011-02-06T19:47:15.023251+01:00 phy005 kernel: RAX: 00007fbde15ff000
RBX: 000000000000064c RCX: 0000000001abe968
2011-02-06T19:47:15.023254+01:00 phy005 kernel: RDX: 0000000001abe850
RSI: 0000000000000000 RDI: 000000003d600000
2011-02-06T19:47:15.023257+01:00 phy005 kernel: RBP: 0000000001f2ab00
R08: 0000000000000003 R09: 0000000002000000
2011-02-06T19:47:15.023260+01:00 phy005 kernel: R10: 000000000000c050
R11: 00007fbdec000818 R12: 0000000000000025
2011-02-06T19:47:15.023269+01:00 phy005 kernel: R13: 0000000000000003
R14: 000000003d600640 R15: 0000000000000000
2011-02-06T19:47:15.023273+01:00 phy005 kernel: FS:
00007fbdf3c01700(0000) GS:ffff8806554a0000(0000)
knlGS:0000000000000000
2011-02-06T19:47:15.023276+01:00 phy005 kernel: CS:  0010 DS: 002b ES:
002b CR0: 0000000080050033
2011-02-06T19:47:15.023280+01:00 phy005 kernel: CR2: 00007fbde15ff64c
CR3: 0000000606858000 CR4: 00000000000026e0
2011-02-06T19:47:15.023283+01:00 phy005 kernel: DR0: 0000000000000000
DR1: 0000000000000000 DR2: 0000000000000000
2011-02-06T19:47:15.023286+01:00 phy005 kernel: DR3: 0000000000000000
DR6: 00000000ffff0ff0 DR7: 0000000000000400
2011-02-06T19:47:15.023290+01:00 phy005 kernel: Process qemu-kvm (pid:
3387, threadinfo ffff88060689e000, task ffff8805ff5a9770)
2011-02-06T19:47:15.023294+01:00 phy005 kernel:
2011-02-06T19:47:15.023296+01:00 phy005 kernel: RIP
[<00000000004abb73>] 0x4abb73
2011-02-06T19:47:15.023298+01:00 phy005 kernel: RSP <00007fbdf3c00680>
2011-02-06T19:47:15.023300+01:00 phy005 kernel: ---[ end trace
beed2b54d0bb8a01 ]---

followed by

2011-02-06T21:20:32.882972+01:00 phy005 kernel: BUG: unable to handle
kernel paging request at fffff6b192918010
2011-02-06T21:20:32.883252+01:00 phy005 kernel: IP:
[<ffffffffa0078826>] kvm_mmu_zap_page+0x28a/0x299 [kvm]
2011-02-06T21:20:32.883259+01:00 phy005 kernel: PGD 0
2011-02-06T21:20:32.883263+01:00 phy005 kernel: Oops: 0000 [#5] SMP
2011-02-06T21:20:32.883267+01:00 phy005 kernel: last sysfs file:
/sys/devices/system/cpu/cpu15/topology/thread_siblings
2011-02-06T21:20:32.883271+01:00 phy005 kernel: CPU 8
2011-02-06T21:20:32.883278+01:00 phy005 kernel: Modules linked in: tun
ipmi_devintf ipmi_si ipmi_msghandler bridge 8021q garp stp llc bonding
xt_comment xt_recent ip6t_REJECT nf_conntrack_ipv6 ip6table_filter
ip6_tables ipv6 kvm_intel kvm i2c_i801 i2c_core iTCO_wdt serio_raw igb
iTCO_vendor_support joydev ioatdma dca 3w_9xxx [last unloaded:
scsi_wait_scan]
2011-02-06T21:20:32.883286+01:00 phy005 kernel:
2011-02-06T21:20:32.883290+01:00 phy005 kernel: Pid: 13247, comm:
qemu-kvm Tainted: G      D    2.6.34.7-66.tilaa.fc13.x86_64 #1
X8DTU/X8DTU
2011-02-06T21:20:32.883295+01:00 phy005 kernel: RIP:
0010:[<ffffffffa0078826>]  [<ffffffffa0078826>]
kvm_mmu_zap_page+0x28a/0x299 [kvm]
2011-02-06T21:20:32.883300+01:00 phy005 kernel: RSP:
0018:ffff880312bdfb58  EFLAGS: 00010206
2011-02-06T21:20:32.883303+01:00 phy005 kernel: RAX: 00000cb192918000
RBX: ffff8802d16ae210 RCX: 0000000000000000
2011-02-06T21:20:32.883307+01:00 phy005 kernel: RDX: ffffea0000000000
RSI: ffff88060bb07ff8 RDI: 0000000000000200
2011-02-06T21:20:32.883311+01:00 phy005 kernel: RBP: ffff880312bdfb88
R08: dead000000100100 R09: 0000000000000004
2011-02-06T21:20:32.883315+01:00 phy005 kernel: R10: 0000000000000000
R11: 0000000000000010 R12: ffff880853ae0000
2011-02-06T21:20:32.883319+01:00 phy005 kernel: R13: ffff88060bb07ff8
R14: 00000000000001ff R15: 0000000000000000
2011-02-06T21:20:32.883323+01:00 phy005 kernel: FS:
0000000000000000(0000) GS:ffff880002080000(0000)
knlGS:0000000000000000
2011-02-06T21:20:32.883327+01:00 phy005 kernel: CS:  0010 DS: 002b ES:
002b CR0: 000000008005003b
2011-02-06T21:20:32.883331+01:00 phy005 kernel: CR2: fffff6b192918010
CR3: 0000000001a42000 CR4: 00000000000026e0
2011-02-06T21:20:32.883335+01:00 phy005 kernel: DR0: 0000000000000000
DR1: 0000000000000000 DR2: 0000000000000000
2011-02-06T21:20:32.883338+01:00 phy005 kernel: DR3: 0000000000000000
DR6: 00000000ffff0ff0 DR7: 0000000000000400
2011-02-06T21:20:32.883343+01:00 phy005 kernel: Process qemu-kvm (pid:
13247, threadinfo ffff880312bde000, task ffff880268ad8000)
2011-02-06T21:20:32.883347+01:00 phy005 kernel: Stack:
2011-02-06T21:20:32.883351+01:00 phy005 kernel: 0000000000000002
ffff880853ae0000 ffff8802d16ae160 ffff880853ae2328
2011-02-06T21:20:32.883355+01:00 phy005 kernel: <0> ffff880c22d426e8
ffff880268ad8000 ffff880312bdfbb8 ffffffffa0078a42
2011-02-06T21:20:32.883358+01:00 phy005 kernel: <0> ffffea00134a16c8
ffff880853ae0000 ffff880853ae0000 0000000000000001
2011-02-06T21:20:32.883362+01:00 phy005 kernel: Call Trace:
2011-02-06T21:20:32.883366+01:00 phy005 kernel: [<ffffffffa0078a42>]
kvm_mmu_zap_all+0x35/0x60 [kvm]
2011-02-06T21:20:32.883371+01:00 phy005 kernel: [<ffffffffa006dcde>]
kvm_arch_flush_shadow+0x16/0x22 [kvm]
2011-02-06T21:20:32.883375+01:00 phy005 kernel: [<ffffffffa0063b0a>]
kvm_mmu_notifier_release+0x31/0x44 [kvm]
2011-02-06T21:20:32.883379+01:00 phy005 kernel: [<ffffffff810fac37>]
__mmu_notifier_release+0x4f/0x7b
2011-02-06T21:20:32.883383+01:00 phy005 kernel: [<ffffffff810e735d>]
exit_mmap+0x2c/0x132
2011-02-06T21:20:32.883386+01:00 phy005 kernel: [<ffffffff8104ad7a>]
mmput+0x5e/0xca
2011-02-06T21:20:32.883390+01:00 phy005 kernel: [<ffffffff8104f0d5>]
exit_mm+0x114/0x121
2011-02-06T21:20:32.883394+01:00 phy005 kernel: [<ffffffff81050bf5>]
do_exit+0x254/0x752
2011-02-06T21:20:32.883398+01:00 phy005 kernel: [<ffffffff81051174>]
do_group_exit+0x81/0xab
2011-02-06T21:20:32.883403+01:00 phy005 kernel: [<ffffffff8105e5cd>]
get_signal_to_deliver+0x3a6/0x3c8
2011-02-06T21:20:32.883406+01:00 phy005 kernel: [<ffffffff81009038>]
do_signal+0x72/0x6b8
2011-02-06T21:20:32.883410+01:00 phy005 kernel: [<ffffffff8111aa2f>] ?
vfs_ioctl+0x32/0xa6
2011-02-06T21:20:32.883413+01:00 phy005 kernel: [<ffffffff8111afa2>] ?
do_vfs_ioctl+0x483/0x4c9
2011-02-06T21:20:32.883416+01:00 phy005 kernel: [<ffffffff810096a6>]
do_notify_resume+0x28/0x86
2011-02-06T21:20:32.883420+01:00 phy005 kernel: [<ffffffff81009f3e>]
int_signal+0x12/0x17
2011-02-06T21:20:32.883426+01:00 phy005 kernel: Code: 41 5e 44 89 f8
41 5f c9 c3 48 ba 00 f0 ff ff ff ff 0f 00 4c 89 ee 48 21 d0 48 ba 00
00 00 00 00 ea ff ff 48 c1 e8 0c 48 6b c0 38 <48> 8b 7c 10 10 e8 a3 f3
ff ff e9 06 fe ff ff 55 48 89 e5 41 57
2011-02-06T21:20:32.883431+01:00 phy005 kernel: RIP
[<ffffffffa0078826>] kvm_mmu_zap_page+0x28a/0x299 [kvm]
2011-02-06T21:20:32.883434+01:00 phy005 kernel: RSP <ffff880312bdfb58>
2011-02-06T21:20:32.883437+01:00 phy005 kernel: CR2: fffff6b192918010
2011-02-06T21:20:32.883441+01:00 phy005 kernel: ---[ end trace
beed2b54d0bb8a04 ]---
2011-02-06T21:20:32.883444+01:00 phy005 kernel: Fixing recursive fault
but reboot is needed!

after which we rebooted the machine and replaced the motherboard and
CPUs (the memory had already been replaced earlier).

But 2 days ago we got this oops:

2011-02-08T15:56:19.902104+01:00 phy005 kernel: BUG: unable to handle
kernel paging request at ffffea71929181c0
2011-02-08T15:56:19.902686+01:00 phy005 kernel: IP:
[<ffffffff81034880>] gup_pte_range+0x94/0xd3
2011-02-08T15:56:19.902693+01:00 phy005 kernel: PGD 118600067 PUD 0
2011-02-08T15:56:19.902699+01:00 phy005 kernel: Oops: 0000 [#1] SMP
2011-02-08T15:56:19.902703+01:00 phy005 kernel: last sysfs file:
/sys/devices/system/cpu/cpu15/cache/index2/shared_cpu_map
2011-02-08T15:56:19.902708+01:00 phy005 kernel: CPU 8
2011-02-08T15:56:19.902715+01:00 phy005 kernel: Modules linked in: tun
ipmi_devintf ipmi_si ipmi_msghandler bridge 8021q garp stp llc bonding
xt_comment xt_recent ip6t_REJECT nf_conntrack_ipv6 ip6table_filter
ip6_tables ipv6 kvm_intel kvm igb i2c_i801 iTCO_wdt ioatdma i2c_core
iTCO_vendor_support dca serio_raw joydev 3w_9xxx [last unloaded:
scsi_wait_scan]
2011-02-08T15:56:19.902770+01:00 phy005 kernel:
2011-02-08T15:56:19.902775+01:00 phy005 kernel: Pid: 3346, comm:
qemu-kvm Not tainted 2.6.34.7-66.tilaa.fc13.x86_64 #1 X8DTU/X8DTU
2011-02-08T15:56:19.902781+01:00 phy005 kernel: RIP:
0010:[<ffffffff81034880>]  [<ffffffff81034880>]
gup_pte_range+0x94/0xd3
2011-02-08T15:56:19.902785+01:00 phy005 kernel: RSP:
0018:ffff880c21bc1a78  EFLAGS: 00010086
2011-02-08T15:56:19.902789+01:00 phy005 kernel: RAX: ffffea71929181c0
RBX: 00003ffffffff000 RCX: 0000000000000005
2011-02-08T15:56:19.902793+01:00 phy005 kernel: RDX: 00007fa2ca200000
RSI: 00007fa2ca1ff000 RDI: 1603a07305008067
2011-02-08T15:56:19.902797+01:00 phy005 kernel: RBP: ffff880c21bc1a98
R08: ffff88060fdfad60 R09: ffff880c21bc1b44
2011-02-08T15:56:19.902801+01:00 phy005 kernel: R10: ffff88061493fff8
R11: ffffea0000000000 R12: 0000000000000205
2011-02-08T15:56:19.902805+01:00 phy005 kernel: R13: ffffc00000000fff
R14: 0000000000000005 R15: 0000000000000000
2011-02-08T15:56:19.902810+01:00 phy005 kernel: FS:
00007fa2d8724700(0000) GS:ffff880002080000(0000)
knlGS:0000000000000000
2011-02-08T15:56:19.902820+01:00 phy005 kernel: CS:  0010 DS: 002b ES:
002b CR0: 0000000080050033
2011-02-08T15:56:19.902825+01:00 phy005 kernel: CR2: ffffea71929181c0
CR3: 0000000c231f9000 CR4: 00000000000026e0
2011-02-08T15:56:19.902829+01:00 phy005 kernel: DR0: 0000000000000000
DR1: 0000000000000000 DR2: 0000000000000000
2011-02-08T15:56:19.902833+01:00 phy005 kernel: DR3: 0000000000000000
DR6: 00000000ffff0ff0 DR7: 0000000000000400
2011-02-08T15:56:19.902837+01:00 phy005 kernel: Process qemu-kvm (pid:
3346, threadinfo ffff880c21bc0000, task ffff880c2264ddc0)
2011-02-08T15:56:19.902841+01:00 phy005 kernel: Stack:
2011-02-08T15:56:19.902844+01:00 phy005 kernel: 00007fa2ca200000
00007fa2ca201000 00007fa2ca201000 ffff880c22c3d280
2011-02-08T15:56:19.902848+01:00 phy005 kernel: <0> ffff880c21bc1af8
ffffffff81034a15 00007fa2ca200fff 00007fa2ca200fff
2011-02-08T15:56:19.902852+01:00 phy005 kernel: <0> ffff880c21bc1b44
ffff88060fdfad60 ffff880c2231a458 ffff880c231f97f8
2011-02-08T15:56:19.902855+01:00 phy005 kernel: Call Trace:
2011-02-08T15:56:19.902859+01:00 phy005 kernel: [<ffffffff81034a15>]
gup_pud_range+0x156/0x192
2011-02-08T15:56:19.902863+01:00 phy005 kernel: [<ffffffff81034b15>]
get_user_pages_fast+0xc4/0x172
2011-02-08T15:56:19.902867+01:00 phy005 kernel: [<ffffffff81131fbc>] ?
bio_add_page+0x36/0x38
2011-02-08T15:56:19.902871+01:00 phy005 kernel: [<ffffffff81134730>]
dio_get_page+0x54/0x127
2011-02-08T15:56:19.902875+01:00 phy005 kernel: [<ffffffff81135317>]
__blockdev_direct_IO+0x41d/0xa36
2011-02-08T15:56:19.902880+01:00 phy005 kernel: [<ffffffffa008bf69>] ?
x86_emulate_insn+0x1ff8/0x2d61 [kvm]
2011-02-08T15:56:19.902884+01:00 phy005 kernel: [<ffffffff8113379b>]
blkdev_direct_IO+0x4e/0x50
2011-02-08T15:56:19.902888+01:00 phy005 kernel: [<ffffffff81132c49>] ?
blkdev_get_blocks+0x0/0x8d
2011-02-08T15:56:19.902892+01:00 phy005 kernel: [<ffffffff810cb516>]
generic_file_direct_write+0xed/0x16d
2011-02-08T15:56:19.902896+01:00 phy005 kernel: [<ffffffff810cb72c>]
__generic_file_aio_write+0x196/0x281
2011-02-08T15:56:19.902899+01:00 phy005 kernel: [<ffffffff81133043>] ?
blkdev_aio_write+0x0/0x69
2011-02-08T15:56:19.902909+01:00 phy005 kernel: [<ffffffff81133043>] ?
blkdev_aio_write+0x0/0x69
2011-02-08T15:56:19.902914+01:00 phy005 kernel: [<ffffffff8113d4eb>]
aio_rw_vect_retry+0x85/0x18e
2011-02-08T15:56:19.902919+01:00 phy005 kernel: [<ffffffff8113e9b3>]
aio_run_iocb+0x77/0x10f
2011-02-08T15:56:19.902923+01:00 phy005 kernel: [<ffffffff8113f508>]
do_io_submit+0x558/0x7ce
2011-02-08T15:56:19.902927+01:00 phy005 kernel: [<ffffffff8113f78e>]
sys_io_submit+0x10/0x12
2011-02-08T15:56:19.902932+01:00 phy005 kernel: [<ffffffff81009c72>]
system_call_fastpath+0x16/0x1b
2011-02-08T15:56:19.902938+01:00 phy005 kernel: Code: 21 d8 49 01 c2
49 8b 3a 49 89 fe 4d 21 ee 4d 21 e6 49 39 ce 75 49 48 89 f8 0f 1f 40
00 48 21 d8 48 c1 e8 0c 48 6b c0 38 4c 01 d8 <66> 83 38 00 48 89 c7 79
04 48 8b 78 10 f0 ff 47 08 49 63 39 48
2011-02-08T15:56:19.903077+01:00 phy005 kernel: RIP
[<ffffffff81034880>] gup_pte_range+0x94/0xd3
2011-02-08T15:56:19.903081+01:00 phy005 kernel: RSP <ffff880c21bc1a78>
2011-02-08T15:56:19.903084+01:00 phy005 kernel: CR2: ffffea71929181c0
2011-02-08T15:56:19.903088+01:00 phy005 kernel: ---[ end trace
174c28940e9fd0a7 ]---

and yesterday this one:

2011-02-09T07:40:15.636528+01:00 phy005 kernel: BUG: unable to handle
kernel NULL pointer dereference at (null)
2011-02-09T07:40:15.636635+01:00 phy005 kernel: IP:
[<ffffffffa0082db8>] gfn_to_rmap+0x20/0x6e [kvm]
2011-02-09T07:40:15.636639+01:00 phy005 kernel: PGD 0
2011-02-09T07:40:15.636643+01:00 phy005 kernel: Oops: 0000 [#3] SMP
2011-02-09T07:40:15.636647+01:00 phy005 kernel: last sysfs file:
/sys/devices/system/cpu/cpu15/topology/thread_siblings
2011-02-09T07:40:15.636650+01:00 phy005 kernel: CPU 2
2011-02-09T07:40:15.636656+01:00 phy005 kernel: Modules linked in: tun
ipmi_devintf ipmi_si ipmi_msghandler bridge 8021q garp stp llc bonding
xt_comment xt_recent ip6t_REJECT nf_conntrack_ipv6 ip6table_filter
ip6_tables ipv6 kvm_intel kvm igb i2c_i801 iTCO_wdt ioatdma i2c_core
iTCO_vendor_support dca serio_raw joydev 3w_9xxx [last unloaded:
scsi_wait_scan]
2011-02-09T07:40:15.636663+01:00 phy005 kernel:
2011-02-09T07:40:15.636666+01:00 phy005 kernel: Pid: 2572, comm:
qemu-kvm Tainted: G      D    2.6.34.7-66.tilaa.fc13.x86_64 #1
X8DTU/X8DTU
2011-02-09T07:40:15.636670+01:00 phy005 kernel: RIP:
0010:[<ffffffffa0082db8>]  [<ffffffffa0082db8>] gfn_to_rmap+0x20/0x6e
[kvm]
2011-02-09T07:40:15.636673+01:00 phy005 kernel: RSP:
0018:ffff88061cbcbcd8  EFLAGS: 00010246
2011-02-09T07:40:15.636677+01:00 phy005 kernel: RAX: 0000000000000000
RBX: 1603a07305004fff RCX: ffff88061cbcbd08
2011-02-09T07:40:15.636680+01:00 phy005 kernel: RDX: 0000000000000023
RSI: 1603a07305004fff RDI: 0000000000000000
2011-02-09T07:40:15.636683+01:00 phy005 kernel: RBP: ffff88061cbcbce8
R08: 0000000000000023 R09: 0000000000000000
2011-02-09T07:40:15.636686+01:00 phy005 kernel: R10: 0000000000000000
R11: ffffffffa0082c7f R12: 0000000000000001
2011-02-09T07:40:15.636689+01:00 phy005 kernel: R13: 0000000000311763
R14: ffff8809b8b01ce0 R15: 0000000000000000
2011-02-09T07:40:15.636692+01:00 phy005 kernel: FS:
0000000000000000(0000) GS:ffff880002040000(0000)
knlGS:0000000000000000
2011-02-09T07:40:15.636695+01:00 phy005 kernel: CS:  0010 DS: 0000 ES:
0000 CR0: 000000008005003b
2011-02-09T07:40:15.636699+01:00 phy005 kernel: CR2: 0000000000000000
CR3: 0000000001a42000 CR4: 00000000000026e0
2011-02-09T07:40:15.636702+01:00 phy005 kernel: DR0: 0000000000000000
DR1: 0000000000000000 DR2: 0000000000000000
2011-02-09T07:40:15.636705+01:00 phy005 kernel: DR3: 0000000000000000
DR6: 00000000ffff0ff0 DR7: 0000000000000400
2011-02-09T07:40:15.636709+01:00 phy005 kernel: Process qemu-kvm (pid:
2572, threadinfo ffff88061cbca000, task ffff88061cf04650)
2011-02-09T07:40:15.636711+01:00 phy005 kernel: Stack:
2011-02-09T07:40:15.636715+01:00 phy005 kernel: ffff88036c471ff8
ffff880c23984000 ffff88061cbcbd18 ffffffffa0082ea9
2011-02-09T07:40:15.636718+01:00 phy005 kernel: <0> ffff8809b8b01ce0
ffff880c23984000 ffff88036c471ff8 00000000000001ff
2011-02-09T07:40:15.636721+01:00 phy005 kernel: <0> ffff88061cbcbd58
ffffffffa008363b 0000000000000200 ffff880c23984000
2011-02-09T07:40:15.636724+01:00 phy005 kernel: Call Trace:
2011-02-09T07:40:15.636728+01:00 phy005 kernel: [<ffffffffa0082ea9>]
rmap_remove+0xa3/0x1a0 [kvm]
2011-02-09T07:40:15.636731+01:00 phy005 kernel: [<ffffffffa008363b>]
kvm_mmu_zap_page+0x9f/0x299 [kvm]
2011-02-09T07:40:15.636734+01:00 phy005 kernel: [<ffffffffa0083a42>]
kvm_mmu_zap_all+0x35/0x60 [kvm]
2011-02-09T07:40:15.636738+01:00 phy005 kernel: [<ffffffffa0078cde>]
kvm_arch_flush_shadow+0x16/0x22 [kvm]
2011-02-09T07:40:15.636741+01:00 phy005 kernel: [<ffffffffa006eb0a>]
kvm_mmu_notifier_release+0x31/0x44 [kvm]
2011-02-09T07:40:15.636744+01:00 phy005 kernel: [<ffffffff810fac37>]
__mmu_notifier_release+0x4f/0x7b
2011-02-09T07:40:15.636748+01:00 phy005 kernel: [<ffffffff810e735d>]
exit_mmap+0x2c/0x132
2011-02-09T07:40:15.636751+01:00 phy005 kernel: [<ffffffff8104ad7a>]
mmput+0x5e/0xca
2011-02-09T07:40:15.636754+01:00 phy005 kernel: [<ffffffff8104f0d5>]
exit_mm+0x114/0x121
2011-02-09T07:40:15.636757+01:00 phy005 kernel: [<ffffffff81050bf5>]
do_exit+0x254/0x752
2011-02-09T07:40:15.636760+01:00 phy005 kernel: [<ffffffff8100a60e>] ?
apic_timer_interrupt+0xe/0x20
2011-02-09T07:40:15.636764+01:00 phy005 kernel: [<ffffffff81051174>]
do_group_exit+0x81/0xab
2011-02-09T07:40:15.636767+01:00 phy005 kernel: [<ffffffff810511b5>]
sys_exit_group+0x17/0x1b
2011-02-09T07:40:15.636771+01:00 phy005 kernel: [<ffffffff81009c72>]
system_call_fastpath+0x16/0x1b
2011-02-09T07:40:15.636777+01:00 phy005 kernel: Code: 88 ff ff ff b8
01 00 00 00 c9 c3 55 48 89 e5 41 54 53 0f 1f 44 00 00 41 89 d4 48 89
f3 e8 7b c7 fe ff 41 83 fc 01 48 89 c7 75 0d <48> 2b 18 48 c1 e3 03 48
03 58 18 eb 39 41 8d 4c 24 ff be 01 00
2011-02-09T07:40:15.636785+01:00 phy005 kernel: RIP
[<ffffffffa0082db8>] gfn_to_rmap+0x20/0x6e [kvm]
2011-02-09T07:40:15.636788+01:00 phy005 kernel: RSP <ffff88061cbcbcd8>
2011-02-09T07:40:15.636791+01:00 phy005 kernel: CR2: 0000000000000000
2011-02-09T07:40:15.637743+01:00 phy005 kernel: ---[ end trace
174c28940e9fd0a9 ]---
2011-02-09T07:40:15.637751+01:00 phy005 kernel: Fixing recursive fault
but reboot is needed!

So it doesn't seem to be a hardware problem, since we have replaced the
memory, the motherboard and the CPUs.
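
For what it's worth, the faulting addresses in the two gup_pte_range
oopses appear to follow directly from the bad PTE value held in RDI.
The sketch below assumes a particular reading of the "Code:" bytes
(mask RDI with RBX, shift right by 12, multiply by 0x38, add R11); that
decode has not been confirmed by anyone on the list. The constant and
function names are labels chosen for illustration, and the numbers are
copied from the register dumps above.

```python
# Values copied from the register dumps of the two gup_pte_range oopses.
VMEMMAP_BASE = 0xffffea0000000000   # R11 (assumed: base of the struct page array)
PFN_MASK = 0x00003ffffffff000       # RBX
PAGE_SHIFT = 12                     # assumed from "48 c1 e8 0c" (shr by 12)
STRUCT_PAGE_SIZE = 0x38             # assumed from "48 6b c0 38" (imul by 0x38)


def struct_page_address(pte: int) -> int:
    """Address dereferenced for this PTE, under the decode assumed above."""
    pfn = (pte & PFN_MASK) >> PAGE_SHIFT
    # Keep the result within 64 bits, as the CPU would.
    return (VMEMMAP_BASE + pfn * STRUCT_PAGE_SIZE) & 0xFFFFFFFFFFFFFFFF


oopses = [
    # (date, RDI = bad PTE, CR2 = reported faulting address)
    ("2011-02-06", 0x1603a07305004067, 0xffffea71929180e0),
    ("2011-02-08", 0x1603a07305008067, 0xffffea71929181c0),
]

for date, pte, cr2 in oopses:
    addr = struct_page_address(pte)
    print(f"{date}: pte={pte:#018x} -> {addr:#018x} "
          f"(CR2 {cr2:#018x}, match={addr == cr2})")
```

Both computed addresses equal the reported CR2 values, so the paging
faults look like a consequence of the bad PTE rather than a separate
problem. The bad values seen after the hardware swap (ending in 8067
and 4fff) also still start with 0x1603a0730500, the same upper 48 bits
as in the 2011-01-25 traces.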

Kind regards,

Ruben