Re: PROBLEM: virtio_net LRO kernel panics

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




在 2021/8/11 下午3:39, Michael S. Tsirkin 写道:
On Wed, Aug 11, 2021 at 11:38:59AM +0800, Jason Wang wrote:
On Tue, Aug 10, 2021 at 11:31 PM Michael S. Tsirkin <mst@xxxxxxxxxx> wrote:
On Mon, Aug 02, 2021 at 04:23:12PM -0500, Ivan wrote:
On Mon, Aug 2, 2021 at 2:52 PM Michael S. Tsirkin <mst@xxxxxxxxxx> wrote:
On Mon, Aug 02, 2021 at 01:32:05PM -0500, Ivan wrote:
On Tue, Jul 27, 2021 at 4:11 AM Michael S. Tsirkin <mst@xxxxxxxxxx> wrote:
On Mon, Jul 26, 2021 at 07:44:43PM -0500, Ivan wrote:
On Sat, Jul 24, 2021 at 11:18 PM Ivan <ivan@xxxxxxxxxxxxxxxxxxxxxxxxxx> wrote:
On Sat, Jul 24, 2021 at 7:17 PM Ivan <ivan@xxxxxxxxxxxxxxxxxxxxxxxxxx> wrote:
On Fri, Jul 23, 2021 at 7:33 AM Ivan <ivan@xxxxxxxxxxxxxxxxxxxxxxxxxx> wrote:
On Fri, Jul 23, 2021 at 7:10 AM Michael S. Tsirkin <mst@xxxxxxxxxx> wrote:
On Fri, Jul 23, 2021 at 03:06:04AM -0500, Ivan wrote:
On Fri, Jul 23, 2021 at 2:59 AM Michael S. Tsirkin <mst@xxxxxxxxxx> wrote:
On Thu, Jul 22, 2021 at 11:50:11PM -0500, Ivan wrote:
On Thu, Jul 22, 2021 at 11:25 PM Jason Wang <jasowang@xxxxxxxxxx> wrote:
在 2021/7/23 上午10:54, Ivan 写道:
On Thu, Jul 22, 2021 at 9:37 PM Jason Wang <jasowang@xxxxxxxxxx> wrote:
Does it work if you turn off lro before enabling the forwarding?
0 root@NuRaid:~# ethtool -K eth0 lro off
Actual changes:
rx-lro: on [requested off]
Could not change any device features
Ok, it looks like the device misses the VIRTIO_NET_F_CTRL_GUEST_OFFLOADS
which makes it impossible to change the LRO setting.

Did you use qemu? If yes, what's the qemu version you've used?
These are VirtualBox machines, which I've been using for years with
longterm kernels 4.19, and I never had such a problem.  But now that I
tried upgrading to kernels 5.10 or 5.13 -- the panics started.  These
are just generic kernel builds, and a minimalistic userspace.
I would be useful to see the features your virtualbox instance provides

cat /sys/class/net/eth0/device/features
# cat /sys/class/net/eth0/device/features
1100010110111011111100000000000000000000000000000000000000000000
I was able to reproduce the warning but not the panic.
OTOH if LRO stays on when enabling forwarding that
is already a problem. Any chance you can bisect to
find out which change introduced the panic?

Any kernels up to 4.19.198 don't panic.
Any kernels 5.10+ panic immediately upon starting forwarding.
I have not tested any kernels between 4.19 and 5.10.
I guess I can build a few kernels inbetween, and try pinpoint where it starts.
That may take a day or so.  I'll get on with it now, and report my findings.
So, I narrowed  it down: the panics start with kernel 5.0-rc.
More narowly, the problem seems be coming from commit
a02e8964eaf9271a8a5fcc0c55bd13f933bafc56.
Just to test my suspicion, I deleted a few lines from that code,
and the panic went away.  Hope that helps you guys figure out
what the problem might be.
Well it disables LRO but we knew this :( I'd help if we knew
where does it panic, all we see it the warning which is
related for sure but not the immediate rootcause ...

--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -2978,11 +2978,6 @@
   }
   if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_CSUM))
      dev->features |= NETIF_F_RXCSUM;
- if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO4) ||
-    virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO6))
-    dev->features |= NETIF_F_LRO;
- if (virtio_has_feature(vdev, VIRTIO_NET_F_CTRL_GUEST_OFFLOADS))
-    dev->hw_features |= NETIF_F_LRO;

   dev->vlan_features = dev->features;
Just FYI, Google turned up two similar bug reposts...
Apr 14, 2020 -- https://github.com/containers/podman/issues/5815
Oct 09. 2020 -- https://bugzilla.kernel.org/show_bug.cgi?id=209593

Is there any sensible thing I could do, temporarily, until this
problem is sorted out?
Or am I simply stuck to kernels 4.19 on these machines for now?

Something like this I guess:


diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 8a58a2f013af..cc5982193a40 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -3063,6 +3063,8 @@ static int virtnet_validate(struct virtio_device *vdev)
                         __virtio_clear_bit(vdev, VIRTIO_NET_F_MTU);
         }

+       __virtio_clear_bit(vdev, VIRTIO_NET_F_GUEST_TSO4);
+       __virtio_clear_bit(vdev, VIRTIO_NET_F_GUEST_TSO6);
         return 0;
  }
When I apply your patch, then I see drastic (more than half)
reductions in speed. (confirmed with iperf).

But if instead I just remove a few lines from commit
a02e8964eaf9271a8a5fcc0c55bd13f933bafc56
as in my earlier post, then I'm back to full speed

I understand that this is just temporary workaround, until we figure this out.

Oh weird. So it's not about getting some weird LRO packet. We will get it with
VIRTIO_NET_F_GUEST_TSO4 anyway. It's about the LRO flag being set in
features.

How about this then? Just pretend to Linux that we disabled LRO.


diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 8a58a2f013af..8e7e4cea176b 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -2651,8 +2651,9 @@ static int virtnet_set_features(struct net_device *dev,
                                    ~GUEST_OFFLOAD_LRO_MASK;

                 err = virtnet_set_guest_offloads(vi, offloads);
-               if (err)
-                       return err;
+               WARN_ON(err);
+               //if (err)
+               //      return err;
                 vi->guest_offloads = offloads;
         }
No. With this applied, the problem persists:

# echo "1" > /proc/sys/net/ipv4/ip_forward

kernel: ------------[ cut here ]------------
kernel: netdevice: eth0: failed to disable LRO!
kernel: WARNING: CPU: 0 PID: 452 at net/core/dev.c:1768
dev_disable_lro+0x108/0x150
kernel: Modules linked in: sg nls_iso8859_1 nls_cp437 vfat fat
hid_generic usbhid hid virtio_net net_failover failover aesni_intel
libaes crypto_simd ohci_pci ahci libahci cryptd rapl ehci_pci ohci_hcd
ehci_hcd usbcore usb_common libata evdev lpc_ich mfd_core rng_core
i2c_piix4 i2c_core virtio_pci virtio_pci_modern_dev virtio_ring virtio
rtc_cmos atkbd libps2 i8042 serio battery ac button loop unix
kernel: CPU: 0 PID: 452 Comm: bash Not tainted 5.13.7-gnu.1-NuMini #1
kernel: Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS
VirtualBox 12/01/2006
kernel: RIP: 0010:dev_disable_lro+0x108/0x150
Again the warning isn't a big deal. I agree we should address - Jason
any update?
I still think using NETIF_F_LRO might not be correct. Since we're
basically receiving GSO packets.

And it might cause a lot of issues if the device doesn't have
VIRTIO_NET_F_CTRL_GUEST_OFFLOADS.

I see two possible fixes:

1) using NETIF_F_GRO_HW instead (the patch is attached)
It's unfortunate you didn't inline. Anyway.
Ivan could you test the patch and report?

or
Hmm. I am not sure we always preserve the GRO_HW requirement that
packets can be re-segmented to reconstruct the original packet stream.
Do all backends guarantee this?


I think we can't.


Could you explain why?


Or we probably need another new netdev feature like rx-gso?





2) set NETIF_F_LRO only if the device has CTRL_GUEST_OFFLOADS

Thanks

This one would slow guests on old hosts down significantly.


Actually, it's not this proposal but see below.



I am not sure why this didn't trigger previously


It looks to me it was caused by a02e8964eaf9271a8a5fcc0c55bd13f933bafc56 ("virtio-net: ethtool configurable LRO").

Before this commit we won't even advertise NETIF_F_LRO, so dev_disable_lro() won't warn.

After this commit, we advertise LRO and dev_disable_lro() will try to disable all guest offloads which will:

1) slow the traffic

and

2) warn if "lro" can't be disabled on the device without ctrl guest offloads (e.g the virtualbox host)

Thanks


btw -
we used not to have CTRL_GUEST_OFFLOADS after all.



But the main issue is you lose connectivity. That still
persists with this? Can't you get a serial connection
out? I know qemu Did the kernel oops afterwards?

--
MST




_______________________________________________
Virtualization mailing list
Virtualization@xxxxxxxxxxxxxxxxxxxxxxxxxx
https://lists.linuxfoundation.org/mailman/listinfo/virtualization




[Index of Archives]     [KVM Development]     [Libvirt Development]     [Libvirt Users]     [CentOS Virtualization]     [Netdev]     [Ethernet Bridging]     [Linux Wireless]     [Kernel Newbies]     [Security]     [Linux for Hams]     [Netfilter]     [Bugtraq]     [Yosemite Forum]     [MIPS Linux]     [ARM Linux]     [Linux RAID]     [Linux Admin]     [Samba]

  Powered by Linux