Re: PROBLEM: virtio_net LRO kernel panics

On Mon, Jul 26, 2021 at 07:44:43PM -0500, Ivan wrote:
> On Sat, Jul 24, 2021 at 11:18 PM Ivan <ivan@xxxxxxxxxxxxxxxxxxxxxxxxxx> wrote:
> >
> > On Sat, Jul 24, 2021 at 7:17 PM Ivan <ivan@xxxxxxxxxxxxxxxxxxxxxxxxxx> wrote:
> > >
> > > On Fri, Jul 23, 2021 at 7:33 AM Ivan <ivan@xxxxxxxxxxxxxxxxxxxxxxxxxx> wrote:
> > >>
> > >> On Fri, Jul 23, 2021 at 7:10 AM Michael S. Tsirkin <mst@xxxxxxxxxx> wrote:
> > >>>
> > >>> On Fri, Jul 23, 2021 at 03:06:04AM -0500, Ivan wrote:
> > >>> > On Fri, Jul 23, 2021 at 2:59 AM Michael S. Tsirkin <mst@xxxxxxxxxx> wrote:
> > >>> > >
> > >>> > > On Thu, Jul 22, 2021 at 11:50:11PM -0500, Ivan wrote:
> > >>> > > > On Thu, Jul 22, 2021 at 11:25 PM Jason Wang <jasowang@xxxxxxxxxx> wrote:
> > >>> > > > > On 2021/7/23 at 10:54 AM, Ivan wrote:
> > >>> > > > > > On Thu, Jul 22, 2021 at 9:37 PM Jason Wang <jasowang@xxxxxxxxxx> wrote:
> > >>> > > > > >> Does it work if you turn off lro before enabling the forwarding?
> > >>> > > > > > 0 root@NuRaid:~# ethtool -K eth0 lro off
> > >>> > > > > > Actual changes:
> > >>> > > > > > rx-lro: on [requested off]
> > >>> > > > > > Could not change any device features
> > >>> > > > >
> > >>> > > > > Ok, it looks like the device misses the VIRTIO_NET_F_CTRL_GUEST_OFFLOADS
> > >>> > > > > which makes it impossible to change the LRO setting.
> > >>> > > > >
> > >>> > > > > Did you use qemu? If yes, what's the qemu version you've used?
> > >>> > > >
> > >>> > > > These are VirtualBox machines, which I've been using for years with
> > >>> > > > longterm kernels 4.19, and I never had such a problem.  But now that I
> > >>> > > > tried upgrading to kernels 5.10 or 5.13 -- the panics started.  These
> > >>> > > > are just generic kernel builds, and a minimalistic userspace.
> > >>> > >
> > >>> > > It would be useful to see the features your VirtualBox instance provides:
> > >>> > >
> > >>> > > cat /sys/class/net/eth0/device/features
> > >>> >
> > >>> > # cat /sys/class/net/eth0/device/features
> > >>> > 1100010110111011111100000000000000000000000000000000000000000000
> > >>>
> > >>> I was able to reproduce the warning but not the panic.
> > >>> OTOH if LRO stays on when enabling forwarding that
> > >>> is already a problem. Any chance you can bisect to
> > >>> find out which change introduced the panic?
> > >>
> > >>
> > >> Any kernels up to 4.19.198 don't panic.
> > >> Any kernels 5.10+ panic immediately upon starting forwarding.
> > >> I have not tested any kernels between 4.19 and 5.10.
> > >> I guess I can build a few kernels in between, and try to pinpoint where it starts.
> > >> That may take a day or so.  I'll get on with it now, and report my findings.
> > >
> > > So, I narrowed it down: the panics start with kernel 5.0-rc.
> >
> > More narrowly, the problem seems to be coming from commit
> > a02e8964eaf9271a8a5fcc0c55bd13f933bafc56.
> > Just to test my suspicion, I deleted a few lines from that code,
> > and the panic went away.  Hope that helps you guys figure out
> > what the problem might be.

Well, it disables LRO, but we knew this :( It'd help if we knew
where it panics; all we see is the warning, which is
related for sure but not the immediate root cause ...

> >
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -2978,11 +2978,6 @@
> >   }
> >   if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_CSUM))
> >      dev->features |= NETIF_F_RXCSUM;
> > - if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO4) ||
> > -    virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO6))
> > -    dev->features |= NETIF_F_LRO;
> > - if (virtio_has_feature(vdev, VIRTIO_NET_F_CTRL_GUEST_OFFLOADS))
> > -    dev->hw_features |= NETIF_F_LRO;
> >
> >   dev->vlan_features = dev->features;
> 
> Just FYI, Google turned up two similar bug reports...
> Apr 14, 2020 -- https://github.com/containers/podman/issues/5815
> Oct 09, 2020 -- https://bugzilla.kernel.org/show_bug.cgi?id=209593
> 
> Is there any sensible thing I could do, temporarily, until this
> problem is sorted out?
> Or am I simply stuck to kernels 4.19 on these machines for now?


Something like this I guess:


diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 8a58a2f013af..cc5982193a40 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -3063,6 +3063,8 @@ static int virtnet_validate(struct virtio_device *vdev)
 			__virtio_clear_bit(vdev, VIRTIO_NET_F_MTU);
 	}
 
+	__virtio_clear_bit(vdev, VIRTIO_NET_F_GUEST_TSO4);
+	__virtio_clear_bit(vdev, VIRTIO_NET_F_GUEST_TSO6);
 	return 0;
 }
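
For anyone who wants to check whether their own device is in the same
situation, here is a rough userspace sketch (not kernel code) that reads
the string printed by /sys/class/net/eth0/device/features. It assumes
sysfs prints one character per feature bit with bit 0 first, and it uses
the bit numbers from include/uapi/linux/virtio_net.h. The helper names
virtio_feature_set() and lro_stuck_on() are made up for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Bit numbers as defined in include/uapi/linux/virtio_net.h. */
#define VIRTIO_NET_F_CTRL_GUEST_OFFLOADS 2
#define VIRTIO_NET_F_GUEST_TSO4          7
#define VIRTIO_NET_F_GUEST_TSO6          8

/*
 * Hypothetical helper: test one bit in the sysfs features string.
 * Assumes character i of the string is '1' when feature bit i is set.
 */
static bool virtio_feature_set(const char *features, size_t bit)
{
	return bit < strlen(features) && features[bit] == '1';
}

/*
 * Hypothetical helper mirroring the probe logic quoted above:
 * GUEST_TSO4 or GUEST_TSO6 turns NETIF_F_LRO on by default, but only
 * CTRL_GUEST_OFFLOADS puts it in hw_features so that ethtool can
 * change it. Returns true when LRO is on and cannot be turned off.
 */
static bool lro_stuck_on(const char *features)
{
	bool lro_default = virtio_feature_set(features, VIRTIO_NET_F_GUEST_TSO4) ||
			   virtio_feature_set(features, VIRTIO_NET_F_GUEST_TSO6);
	bool lro_changeable =
		virtio_feature_set(features, VIRTIO_NET_F_CTRL_GUEST_OFFLOADS);

	return lro_default && !lro_changeable;
}
```

With the string Ivan posted, bits 7 and 8 are set and bit 2 is clear, so
lro_stuck_on() returns true, which matches the "rx-lro: on [requested off]"
output from ethtool.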
 




