Re: Shutting down a VM with Kernel 4.14 will sometime hang and a reboot is the only way to recover.

On 2017-11-30 03:59 PM, David Hill wrote:


On 2017-11-30 03:52 PM, David Hill wrote:


On 2017-11-29 09:42 PM, Jason Wang wrote:


On 2017-11-30 03:13, David Hill wrote:


On 2017-11-29 12:15 AM, Jason Wang wrote:


On 2017-11-29 10:52, Dave Hill wrote:


Thanks. Zerocopy is disabled by default in several distributions. Upstream, the only reason to leave it on is the hope that more developers will help find and fix the issues.


So I never hit this issue with previous kernels; it started happening with the v4.14-rc series.


Right, whether it was introduced recently still needs to be investigated.

Looking at the git history, the only suspect commit for 4.14 is

commit 1e6f74536de08b5e50cf0e37e735911c2cef7c62
Author: Willem de Bruijn <willemb@xxxxxxxxxx>
Date:   Fri Oct 6 13:22:31 2017 -0400

    vhost_net: do not stall on zerocopy depletion

Maybe you can try to revert it and see.

If it does not solve your issue, I suspect there's a bug elsewhere that causes a packet to be held for a very long time.

I'm using Rawhide, so perhaps this is why it isn't disabled by default, but I should mention that this system was upgraded from FC25 up to FC28 and it never got disabled. Perhaps it should be disabled in Fedora too if that's not already the case... I'm not sure this is the place to discuss this... is it?

Probably not, but I guess Fedora tries to use new technology aggressively.

Thanks

I can revert that commit in 4.15-rc1 but I can't find it in 4.14.2... Is there another commit that could affect this?

My bad, the suspects are then:

1f8b977ab32dc5d148f103326e80d9097f1cefb5 ("sock: enable MSG_ZEROCOPY")
c1d1b437816f0afa99202be3cb650c9d174667bc ("net: convert (struct ubuf_info)->refcnt to refcount_t")

Thanks


Reverting those two commits breaks kernel compilation:

net/core/dev.c: In function ‘dev_queue_xmit_nit’:
net/core/dev.c:1952:8: error: implicit declaration of function ‘skb_orphan_frags_rx’; did you mean ‘skb_orphan_frags’? [-Werror=implicit-function-declaration]
   if (!skb_orphan_frags_rx(skb2, GFP_ATOMIC))
        ^~~~~~~~~~~~~~~~~~~
        skb_orphan_frags


I changed skb_orphan_frags_rx to skb_orphan_frags and it compiled, but will everything blow up?

Thanks,
Dave

In the end, I reverted 581fe0ea61584d88072527ae9fb9dcb9d1f2783e too. It's compiling now and I'll keep you posted.

So I'm still able to reproduce this issue even after reverting these 3 commits. Do you have any other suspect commits?
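
Since the thread turns on whether vhost_net zerocopy is enabled on the affected host, a quick way to check may be useful. The following is only a minimal sketch: it assumes the setting is exposed as the vhost_net module parameter experimental_zcopytx under /sys/module, and the helper name zerocopy_tx_enabled is made up for illustration. It returns None when the parameter cannot be read (for example, when the module is not loaded).

```python
from pathlib import Path

# Assumption: sysfs location of the vhost_net zerocopy TX module parameter.
ZCOPY_PARAM = Path("/sys/module/vhost_net/parameters/experimental_zcopytx")


def zerocopy_tx_enabled(param_path=ZCOPY_PARAM):
    """Hypothetical helper: True/False for the parameter, None if unreadable."""
    try:
        raw = Path(param_path).read_text().strip()
    except OSError:
        # Module not loaded, or the parameter is not exposed on this kernel.
        return None
    try:
        # The parameter is an integer; any nonzero value is treated as enabled.
        return int(raw) != 0
    except ValueError:
        return None


if __name__ == "__main__":
    state = zerocopy_tx_enabled()
    if state is None:
        print("vhost_net zerocopy TX: unknown (parameter not readable)")
    else:
        print("vhost_net zerocopy TX:", "enabled" if state else "disabled")
```

If the parameter reads as enabled, reloading vhost_net with experimental_zcopytx=0 while no guests are running should be one way to test whether the hang depends on zerocopy at all, before reverting further commits.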



