Re: Shutting down a VM with Kernel 4.14 will sometime hang and a reboot is the only way to recover.


 



Have you tried adding this:


cat <<EOF > /etc/modprobe.d/vhost-net.conf
options vhost_net experimental_zcopytx=0
EOF

reboot
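
After the reboot, something like the following could confirm that the option actually took effect. This is only a sketch: it assumes the vhost_net parameter is exposed at the usual sysfs location for module parameters, and the helper name check_zcopytx is made up for illustration.

```shell
# Hypothetical helper: report whether vhost_net zero-copy TX is disabled.
# Assumption: the parameter is readable at the usual sysfs path below.
# An alternative file can be passed as the first argument.
check_zcopytx() {
    param_file="${1:-/sys/module/vhost_net/parameters/experimental_zcopytx}"

    # The file is absent when the vhost_net module is not loaded.
    if [ ! -r "$param_file" ]; then
        echo "vhost_net parameter not readable (module not loaded?)"
        return 2
    fi

    value=$(cat "$param_file")
    if [ "$value" = "0" ]; then
        echo "zero-copy TX disabled"
    else
        echo "zero-copy TX still enabled (value: $value)"
        return 1
    fi
}
```

Calling check_zcopytx with no arguments reads the default path. If it reports that the parameter is not readable, the module is probably not loaded yet (for example, no VM using vhost-net has been started).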


Other than this, you can try bisecting, but in my case the system won't boot once the bisection reaches a certain commit.



On 2017-12-02 11:37 AM, Harald Moeller wrote:
Hello, my name is Harry and this is my first post here. I hope I'm doing this the right way; sorry if not ...

I'm not a subscriber to the full list yet, so I understand I should ask to be CCed personally.

I am following this thread because I am experiencing the same (or a very similar) issue with 4.14.2.

My setup is simpler: just an oVirt host shutting down some VMs. It doesn't happen every time, but I'd say around 3 out of 10 shutdowns.

This is what I see (slightly different from David):

Dec 01 16:11:53 oVirtHost01.xyz.net kernel: INFO: task qemu-kvm:1173 blocked for more than 120 seconds.
Dec 01 16:11:53 oVirtHost01.xyz.net kernel:       Tainted: G          I     4.14.2-1.el7.hakimo.x86_64 #4
Dec 01 16:11:53 oVirtHost01.xyz.net kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Dec 01 16:11:53 oVirtHost01.xyz.net kernel: qemu-kvm        D 0 1173      1 0x00000084
Dec 01 16:11:53 oVirtHost01.xyz.net kernel: Call Trace:
Dec 01 16:11:53 oVirtHost01.xyz.net kernel: __schedule+0x28d/0x880
Dec 01 16:11:53 oVirtHost01.xyz.net kernel:  schedule+0x36/0x80
Dec 01 16:11:53 oVirtHost01.xyz.net kernel: vhost_net_ubuf_put_and_wait+0x61/0x90 [vhost_net]
Dec 01 16:11:53 oVirtHost01.xyz.net kernel:  ? remove_wait_queue+0x60/0x60
Dec 01 16:11:53 oVirtHost01.xyz.net kernel: vhost_net_ioctl+0x317/0x8e0 [vhost_net]
Dec 01 16:11:53 oVirtHost01.xyz.net kernel: do_vfs_ioctl+0xa7/0x5f0
Dec 01 16:11:53 oVirtHost01.xyz.net kernel:  SyS_ioctl+0x79/0x90
Dec 01 16:11:53 oVirtHost01.xyz.net kernel: do_syscall_64+0x67/0x1b0
Dec 01 16:11:53 oVirtHost01.xyz.net kernel: entry_SYSCALL64_slow_path+0x25/0x25
Dec 01 16:11:53 oVirtHost01.xyz.net kernel: RIP: 0033:0x7fb8862d1107
Dec 01 16:11:53 oVirtHost01.xyz.net kernel: RSP: 002b:00007fff4acd7e58 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
Dec 01 16:11:53 oVirtHost01.xyz.net kernel: RAX: ffffffffffffffda RBX: 000055abaa2d29c0 RCX: 00007fb8862d1107
Dec 01 16:11:53 oVirtHost01.xyz.net kernel: RDX: 00007fff4acd7e60 RSI: 000000004008af30 RDI: 0000000000000028
Dec 01 16:11:53 oVirtHost01.xyz.net kernel: RBP: 00007fff4acd7e60 R08: 000055aba805e10f R09: 00000000ffffffff
Dec 01 16:11:53 oVirtHost01.xyz.net kernel: R10: 0000000000000004 R11: 0000000000000246 R12: 000055ababf32510
Dec 01 16:11:53 oVirtHost01.xyz.net kernel: R13: 0000000000000001 R14: 000055ababf32498 R15: 000055abaa2a0b40

This is still happening after reverting the three suggested commits

1f8b977ab32dc5d148f103326e80d9097f1cefb5 ("sock: enable MSG_ZEROCOPY")

c1d1b437816f0afa99202be3cb650c9d174667bc ("net: convert (struct ubuf_info)->refcnt to refcount_t")

581fe0ea61584d88072527ae9fb9dcb9d1f2783e ("net: orphan frags on stand-alone ptype in dev_queue_xmit_nit")

Anything I could be helpful with trying to solve this? Any more info I could provide?

Harry




