Re: VM started to hang after a system update

"Michael S. Tsirkin" <mst@xxxxxxxxxx> · Mon, 29 Jul 2013 13:58:54 +0300



Could this be the same issue as https://bugzilla.redhat.com/show_bug.cgi?id=975065 ?

On Mon, Jul 29, 2013 at 11:02:01AM +0200, Artur Samborski wrote:
> Hello,
> 
> we have another problem with KVM on our production machines.
> 
> After updating the OS (Fedora 18), our KVM virtual machines
> started to crash. Tests have shown that these crashes are associated
> with heavy network traffic.
> 
> When the virtual machine hangs, this message appears in the KVM-host
> kernel (3.9.9-201.fc18.x86_64) log:
> 
> 
>  BUG: unable to handle kernel NULL pointer dereference at           (null)
>  IP: [<ffffffff81141af1>] put_page+0x11/0x60
>  PGD 0
>  Oops: 0000 [#1] SMP
>  Modules linked in: binfmt_misc ip6table_filter ip6_tables
> ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat
> nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack
> xt_CHECKSUM iptable_mangle bridge stp llc be2iscsi iscsi_boot_sysfs
> bnx2i cnic uio cxgb4i cxgb4 cxgb3i cxgb3 mdio libcxgbi ib_iser
> rdma_cm ib_addr iw_cm ib_cm ib_sa ib_mad ib_core iscsi_tcp
> libiscsi_tcp libiscsi scsi_transport_iscsi e1000e iTCO_wdt
> iTCO_vendor_support ptp pps_core vhost_net ses ioatdma dcdbas mperf
> shpchp i7core_edac lpc_ich edac_core dca mfd_core tun macvtap
> macvlan enclosure bnx2 coretemp crc32c_intel serio_raw microcode
> kvm_intel acpi_power_meter kvm ipmi_devintf ipmi_si ipmi_msghandler
> mgag200 i2c_algo_bit drm_kms_helper ttm drm i2c_core megaraid_sas
> wmi
>  CPU 2
>  Pid: 7524, comm: vhost-7521 Tainted: G        W I
> 3.9.9-201.fc18.x86_64 #1 Dell Inc. PowerEdge R610/0K399H
>  RIP: 0010:[<ffffffff81141af1>]  [<ffffffff81141af1>] put_page+0x11/0x60
>  RSP: 0018:ffff880427a31c28  EFLAGS: 00010296
>  RAX: ffff88065d8e16c0 RBX: 0000000000000000 RCX: 0000000000000006
>  RDX: 0000000000000150 RSI: 0000000000000000 RDI: 0000000000000000
>  RBP: ffff880427a31c38 R08: 000000000000000a R09: 00000000000006f7
>  R10: 0000000000000000 R11: 00000000000006f6 R12: ffff8808273c9d00
>  R13: ffffffffa0180237 R14: ffff88067c8d43d8 R15: ffff8808273c9d00
>  FS:  0000000000000000(0000) GS:ffff88083fc20000(0000)
> knlGS:0000000000000000
>  CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
>  CR2: 0000000000000000 CR3: 000000082864a000 CR4: 00000000000027e0
>  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>  DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
>  Process vhost-7521 (pid: 7524, threadinfo ffff880427a30000, task
> ffff880427f94650)
>  Stack:
>   ffffea001c159340 0000000000000013 ffff880427a31c58 ffffffff8154676f
>   ffff8808273c9d00 ffff8808273c9d00 ffff880427a31c78 ffffffff8154680e
>   ffffea001c15b080 ffff880828f45800 ffff880427a31ca8 ffffffff815468c6
>  Call Trace:
>   [<ffffffff8154676f>] skb_release_data+0x8f/0x110
>   [<ffffffff8154680e>] __kfree_skb+0x1e/0xa0
>   [<ffffffff815468c6>] kfree_skb+0x36/0xa0
>   [<ffffffffa0180237>] macvtap_get_user+0x317/0x510 [macvtap]
>   [<ffffffffa018045b>] macvtap_sendmsg+0x2b/0x30 [macvtap]
>   [<ffffffffa0258db7>] handle_tx+0x287/0x680 [vhost_net]
>   [<ffffffffa02591e5>] handle_tx_kick+0x15/0x20 [vhost_net]
>   [<ffffffffa025595d>] vhost_worker+0xed/0x190 [vhost_net]
>   [<ffffffffa0255870>] ? vhost_work_flush+0x110/0x110 [vhost_net]
>   [<ffffffff81082ba0>] kthread+0xc0/0xd0
>   [<ffffffff81010000>] ? ftrace_define_fields_xen_mc_flush+0x20/0xb0
>   [<ffffffff81082ae0>] ? kthread_create_on_node+0x120/0x120
>   [<ffffffff8166af2c>] ret_from_fork+0x7c/0xb0
>   [<ffffffff81082ae0>] ? kthread_create_on_node+0x120/0x120
>  Code: 45 fc 65 48 01 04 25 70 02 01 00 c9 c3 66 66 66 66 2e 0f 1f
> 84 00 00 00 00 00 66 66 66 66 90 55 48 89 e5 53 48 89 fb 48 83 ec 08
> <48> f7 07 00 c0 00 00 75 34 8b 47 1c 85 c0 74 1a f0 ff 4b 1c 0f
>  RIP  [<ffffffff81141af1>] put_page+0x11/0x60
>   RSP <ffff880427a31c28>
>  CR2: 0000000000000000
>  ---[ end trace cb305c3097c1de97 ]---
> 
> 
> After returning to the previously working kernel (3.7.0, manually
> compiled from kvm git sources), the problem still persists:
> 
> 
>  BUG: unable to handle kernel paging request at 0000040200000401
>  IP: [<ffffffff8113e445>] put_page+0x5/0x50
>  PGD 0
>  Oops: 0000 [#1] SMP
>  Modules linked in: binfmt_misc ip6table_filter ip6_tables
> ebtable_nat ebtables nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack
> nf_conntrack xt_CHECKSUM iptable_mangle be2iscsi iscsi_boot_sysfs
> bnx2i cnic uio cxgb4i cxgb4 cxgb3i cxgb3 mdio libcxgbi ib_iser
> rdma_cm ib_addr iw_cm ib_cm ib_sa bridge stp llc ib_mad ib_core
> iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi vhost_net
> coretemp e1000e ioatdma tun macvtap macvlan bnx2 iTCO_wdt shpchp
> crc32c_intel microcode dca ses iTCO_vendor_support lpc_ich wmi
> dcdbas kvm_intel i7core_edac edac_core enclosure joydev
> acpi_power_meter serio_raw pcspkr mfd_core kvm ipmi_devintf ipmi_si
> ipmi_msghandler megaraid_sas
>  CPU 0
>  Pid: 1505, comm: vhost-1502 Tainted: G        W    3.7.0HYDRA_02+
> #1 Dell Inc. PowerEdge R610/0K399H
>  RIP: 0010:[<ffffffff8113e445>]  [<ffffffff8113e445>] put_page+0x5/0x50
>  RSP: 0018:ffff880823e6bc50  EFLAGS: 00010202
>  RAX: ffff88066d34bec0 RBX: 0000000000000012 RCX: ffffea0019cf001c
>  RDX: 0000000000000140 RSI: 0000000000000246 RDI: 0000040200000401
>  RBP: ffff880823e6bc68 R08: ffff880823e444f8 R09: 0000000000000010
>  R10: 0000000000000000 R11: 00003ffffffff000 R12: ffff880827d34700
>  R13: ffffffffa01371a8 R14: ffff880823e443d8 R15: ffff880827d34700
>  FS:  0000000000000000(0000) GS:ffff88083fc00000(0000)
> knlGS:0000000000000000
>  CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
>  CR2: 0000040200000401 CR3: 0000000825272000 CR4: 00000000000027e0
>  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>  DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
>  Process vhost-1502 (pid: 1505, threadinfo ffff880823e6a000, task
> ffff880825a1c5c0)
>  Stack:
>   ffffffff81520c1f ffff880827d34700 ffff880827d34700 ffff880823e6bc88
>   ffffffff81520cbe ffffea0019cf69c0 ffff88042a1df400 ffff880823e6bcb8
>   ffffffff81520d76 000000000000000c ffff88042a1df400 000000000000000c
>  Call Trace:
>   [<ffffffff81520c1f>] ? skb_release_data+0x8f/0x110
>   [<ffffffff81520cbe>] __kfree_skb+0x1e/0xa0
>   [<ffffffff81520d76>] kfree_skb+0x36/0xa0
>   [<ffffffffa01371a8>] macvtap_get_user+0x248/0x490 [macvtap]
>   [<ffffffffa013741b>] macvtap_sendmsg+0x2b/0x30 [macvtap]
>   [<ffffffffa0165d2a>] handle_tx+0x28a/0x680 [vhost_net]
>   [<ffffffffa0166155>] handle_tx_kick+0x15/0x20 [vhost_net]
>   [<ffffffffa016295d>] vhost_worker+0xed/0x190 [vhost_net]
>   [<ffffffffa0162870>] ? vhost_work_flush+0x110/0x110 [vhost_net]
>   [<ffffffff81081750>] kthread+0xc0/0xd0
>   [<ffffffff81010000>] ? ftrace_define_fields_xen_mc_entry+0x50/0xf0
>   [<ffffffff81081690>] ? kthread_create_on_node+0x120/0x120
>   [<ffffffff8163fdac>] ret_from_fork+0x7c/0xb0
>   [<ffffffff81081690>] ? kthread_create_on_node+0x120/0x120
>  Code: fc 00 00 00 00 e8 ac fe ff ff 48 63 45 fc 65 48 01 04 25 b8
> 06 01 00 c9 c3 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90
> <48> f7 07 00 c0 00 00 55 48 89 e5 75 2a 8b 47 1c 85 c0 74 1e f0
>  RIP  [<ffffffff8113e445>] put_page+0x5/0x50
>   RSP <ffff880823e6bc50>
>  CR2: 0000040200000401
> 
> 
> Only after a complete rollback to the previous state of the system
> does everything work properly again (the problem disappears).
> We therefore suspect that it may be related to some of the
> userspace tools?
> 
> I will be grateful for any hints.
> 
> Regards,
> Artur Samborski
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html