Re: Crash for i40e on net-next (was: [PATCH v8 bpf-next 00/14] mvneta: introduce XDP multi-buffer support)

On Fri, Apr 23, 2021 at 6:43 PM Alexander Duyck
<alexander.duyck@xxxxxxxxx> wrote:
>
> On Thu, Apr 22, 2021 at 10:28 PM Magnus Karlsson
> <magnus.karlsson@xxxxxxxxx> wrote:
> >
> > On Thu, Apr 22, 2021 at 5:05 PM Jesper Dangaard Brouer
> > <brouer@xxxxxxxxxx> wrote:
> > >
> > > On Thu, 22 Apr 2021 16:42:23 +0200
> > > Jesper Dangaard Brouer <brouer@xxxxxxxxxx> wrote:
> > >
> > > > On Thu, 22 Apr 2021 12:24:32 +0200
> > > > Magnus Karlsson <magnus.karlsson@xxxxxxxxx> wrote:
> > > >
> > > > > On Wed, Apr 21, 2021 at 5:39 PM Jesper Dangaard Brouer
> > > > > <brouer@xxxxxxxxxx> wrote:
> > > > > >
> > > > > > On Wed, 21 Apr 2021 16:12:32 +0200
> > > > > > Magnus Karlsson <magnus.karlsson@xxxxxxxxx> wrote:
> > > > > >
> > > > [...]
> > > > > > > more than I get.
> > > > > >
> > > > > > I clearly have a bug in the i40e driver.  As I wrote later, I don't see
> > > > > > > any packets transmitted for XDP_TX.  Hmm, I am using Mel Gorman's tree,
> > > > > > which contains the i40e/ice/ixgbe bug we fixed earlier.
> > > >
> > > > Something is wrong with i40e, I changed git-tree to net-next (at
> > > > commit 5d869070569a) and XDP seems to have stopped working on i40e :-(
> >
> > Found this out too when switching to the net tree yesterday to work on
> > proper packet drop tracing as you spotted/requested yesterday. The
> > commit below completely broke XDP support on i40e (if you do not run
> > with a zero-copy AF_XDP socket because that path still works). I am
> > working on a fix that does not just revert the patch, but fixes the
> > original problem without breaking XDP. Will post it and the tracing
> > fixes as soon as I can.
> >
> > commit 12738ac4754ec92a6a45bf3677d8da780a1412b3
> > Author: Arkadiusz Kubalewski <arkadiusz.kubalewski@xxxxxxxxx>
> > Date:   Fri Mar 26 19:43:40 2021 +0100
> >
> >     i40e: Fix sparse errors in i40e_txrx.c
> >
> >     Remove error handling through pointers. Instead use plain int
> >     to return value from i40e_run_xdp(...).
> >
> >     Previously:
> >     - sparse errors were produced during compilation:
> >     i40e_txrx.c:2338 i40e_run_xdp() error: (-2147483647) too low for ERR_PTR
> >     i40e_txrx.c:2558 i40e_clean_rx_irq() error: 'skb' dereferencing possible ERR_PTR()
> >
> >     - sk_buff* was used to return value, but it has never had valid
> >     pointer to sk_buff. Returned value was always int handled as
> >     a pointer.
> >
> >     Fixes: 0c8493d90b6b ("i40e: add XDP support for pass and drop actions")
> >     Fixes: 2e6893123830 ("i40e: split XDP_TX tail and XDP_REDIRECT map flushing")
> >     Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@xxxxxxxxx>
> >     Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@xxxxxxxxx>
> >     Tested-by: Dave Switzer <david.switzer@xxxxxxxxx>
> >     Signed-off-by: Tony Nguyen <anthony.l.nguyen@xxxxxxxxx>
>
> Yeah, this patch would horribly break things, especially in the
> multi-buffer case. The idea behind using the skb pointer to indicate
> the error is that it is persistent until we hit the EOP descriptor.
> With that removed you end up mangling the entire list of frames since
> it will start trying to process the next frame in the middle of a
> packet.
>
> >
> > > Renamed subj as this is without this patchset applied.
> > >
> > > > $ uname -a
> > > > Linux broadwell 5.12.0-rc7-net-next+ #600 SMP PREEMPT Thu Apr 22 15:13:15 CEST 2021 x86_64 x86_64 x86_64 GNU/Linux
> > > >
> > > > When I load any XDP prog almost no packets are let through:
> > > >
> > > >  [kernel-bpf-samples]$ sudo ./xdp1 i40e2
> > > >  libbpf: elf: skipping unrecognized data section(16) .eh_frame
> > > >  libbpf: elf: skipping relo section(17) .rel.eh_frame for section(16) .eh_frame
> > > >  proto 17:          1 pkt/s
> > > >  proto 0:          0 pkt/s
> > > >  proto 17:          0 pkt/s
> > > >  proto 0:          0 pkt/s
> > > >  proto 17:          1 pkt/s
> > >
> > > Trying out xdp_redirect:
> > >
> > >  [kernel-bpf-samples]$ sudo ./xdp_redirect i40e2 i40e2
> > >  input: 7 output: 7
> > >  libbpf: elf: skipping unrecognized data section(20) .eh_frame
> > >  libbpf: elf: skipping relo section(21) .rel.eh_frame for section(20) .eh_frame
> > >  libbpf: Kernel error message: XDP program already attached
> > >  WARN: link set xdp fd failed on 7
> > >  ifindex 7:       7357 pkt/s
> > >  ifindex 7:       7909 pkt/s
> > >  ifindex 7:       7909 pkt/s
> > >  ifindex 7:       7909 pkt/s
> > >  ifindex 7:       7909 pkt/s
> > >  ifindex 7:       7909 pkt/s
> > >  ifindex 7:       6357 pkt/s
> > >
> > > And then it crashes (see below) at page_frag_free+0x31 which calls
> > > virt_to_head_page() with a wrong addr (I guess).  This is called by
> > > i40e_clean_tx_irq+0xc9.
> >
> > Did not see a crash myself, just 4 Kpps. But the rings and DMA
> > mappings got completely mangled by the patch above, so could be the
> > same cause.
>
> Are you running with jumbo frames enabled? I would think this change
> would really blow things up in the jumbo enabled case.

I did not. Just using XDP_DROP or XDP_TX would crash the system just fine.
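
To make Alexander's point above about the skb pointer being persistent
until the EOP descriptor a bit more concrete, here is a minimal
user-space sketch. It is not the i40e code: struct sim_desc, struct
sim_skb, SIM_CONSUMED and count_frame_starts() are made-up names, and
the "XDP program" is assumed to consume every frame (as with XDP_DROP).
It only models the multi-buffer aspect, i.e. what happens when the
"already handled by XDP" state is not carried over to the next
descriptor of the same frame.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration only, not kernel types. */
struct sim_skb { int unused; };
struct sim_desc { bool eop; /* true on the last buffer of a frame */ };

/* Sentinel playing the role of an ERR_PTR-style "consumed by XDP" marker. */
static struct sim_skb consumed_marker;
#define SIM_CONSUMED (&consumed_marker)

/*
 * Walk the ring and count how many descriptors get treated as the first
 * buffer of a new frame. The XDP verdict is assumed to be "consumed" for
 * every frame.
 *
 * persistent == true:  the verdict is stored in the skb pointer, which
 *                      survives across descriptors until EOP.
 * persistent == false: the verdict lives in a per-iteration int and the
 *                      skb pointer stays NULL.
 */
static int count_frame_starts(const struct sim_desc *ring, size_t n,
                              bool persistent)
{
    struct sim_skb *skb = NULL; /* state carried between descriptors */
    int starts = 0;

    assert(ring != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        int xdp_res = 0; /* forgotten again on the next iteration */

        if (!skb) {
            /* No frame in progress: run "XDP" on this buffer. */
            starts++;
            if (persistent)
                skb = SIM_CONSUMED;
            else
                xdp_res = 1;
        }
        (void)xdp_res;

        if (!ring[i].eop)
            continue; /* more buffers belong to this frame */

        skb = NULL; /* frame complete, next descriptor starts a new one */
    }
    return starts;
}
```

With a ring holding one three-buffer frame followed by one single-buffer
frame, the persistent variant sees two frame starts, while the
non-persistent variant treats all four descriptors as frame starts, so
it begins processing a new frame in the middle of a packet.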


