Re: ixgbe: xdpsock txonly hangs and does not complete

On Wed, Oct 11, 2023 at 1:56 PM Magnus Karlsson
<magnus.karlsson@xxxxxxxxx> wrote:
>
> On Wed, 11 Oct 2023 at 08:06, Srivats P <pstavirs@xxxxxxxxx> wrote:
> >
> > Hi,
> >
> > While debugging a TX problem with my AF_XDP app it seems that there
> > might be a bug in the ixgbe driver (see thread:
> > https://www.spinics.net/lists/xdp-newbies/msg02406.html)
> >
> > So I decided to try xdpsock txonly and I see a similar behaviour as my
> > app with xdpsock as well.
> >
> > Essentially, after sending 'n' packets on Tx ring, the app (or xdpsock
> > for that matter) expects to free 'n' packets from the completion ring,
> > but it either gets less or sometimes more packets/descriptors from the
> > completion ring.
> >
> > Please see the log below.
> >
> > <log>
> > root@tditwtga002:~# ./xdpsock -t -i eno49
> >
> >  sock0@eno49:0 txonly xdp-drv
> >                    pps            pkts           1.00
> > rx                 0              0
> > tx                 7,388,073      7,390,400
> >
> >  sock0@eno49:0 txonly xdp-drv
> >                    pps            pkts           1.00
> > rx                 0              0
> > tx                 9,689,548      17,082,880
> >
> >  sock0@eno49:0 txonly xdp-drv
> >                    pps            pkts           1.00
> > rx                 0              0
> > tx                 9,660,591      26,745,152
> >
> >  sock0@eno49:0 txonly xdp-drv
> >                    pps            pkts           1.00
> > rx                 0              0
> > tx                 9,690,513      36,437,184
> >
> >  sock0@eno49:0 txonly xdp-drv
> >                    pps            pkts           1.00
> > rx                 0              0
> > tx                 9,684,898      46,123,840
> >
> >  sock0@eno49:0 txonly xdp-drv
> >                    pps            pkts           1.00
> > rx                 0              0
> > tx                 9,688,898      55,815,168
> >
> >  sock0@eno49:0 txonly xdp-drv
> >                    pps            pkts           1.00
> > rx                 0              0
> > tx                 9,654,194      65,471,488
> >
> >  sock0@eno49:0 txonly xdp-drv
> >                    pps            pkts           1.00
> > rx                 0              0
> > tx                 9,681,793      75,154,880
> >
> >  sock0@eno49:0 txonly xdp-drv
> >                    pps            pkts           1.00
> > rx                 0              0
> > tx                 9,664,164      84,821,376
> >
> >  sock0@eno49:0 txonly xdp-drv
> >                    pps            pkts           1.00
> > rx                 0              0
> > tx                 9,681,688      94,504,768
> >
> >  sock0@eno49:0 txonly xdp-drv
> >                    pps            pkts           1.00
> > rx                 0              0
> > tx                 9,637,019      104,143,552
> >
> >  sock0@eno49:0 txonly xdp-drv
> >                    pps            pkts           1.00
> > rx                 0              0
> > tx                 9,684,780      113,830,720
> > ^Coutstanding 2466 (-64)
> > outstanding 2402 (-64)
> > outstanding 2338 (-64)
> > outstanding 2274 (-64)
> > outstanding 2210 (-64)
> > outstanding 2146 (-64)
> > outstanding 2082 (-64)
> > outstanding 2020 (-62)
> > outstanding 1956 (-64)
> > outstanding 1892 (-64)
> > outstanding 1828 (-64)
> > outstanding 1764 (-64)
> > outstanding 1700 (-64)
> > outstanding 1636 (-64)
> > outstanding 1572 (-64)
> > outstanding 1510 (-62)
> > outstanding 1446 (-64)
> > outstanding 1382 (-64)
> > outstanding 1318 (-64)
> > outstanding 1254 (-64)
> > outstanding 1190 (-64)
> > outstanding 1126 (-64)
> > outstanding 1062 (-64)
> > outstanding 1000 (-62)
> > outstanding 936 (-64)
> > outstanding 872 (-64)
> > outstanding 808 (-64)
> > outstanding 744 (-64)
> > outstanding 680 (-64)
> > outstanding 616 (-64)
> > outstanding 552 (-64)
> > outstanding 490 (-62)
> > outstanding 426 (-64)
> > outstanding 362 (-64)
> > outstanding 298 (-64)
> > outstanding 234 (-64)
> > outstanding 170 (-64)
> > outstanding 106 (-64)
> > outstanding 42 (-64)
> > outstanding 1 (-41)
> >
> >  sock0@eno49:0 txonly xdp-drv
> >                    pps            pkts           1.00
> > rx                 0              0
> > tx                 5,625,834      119,457,536
> >
> > ^C
> >
> > ^C
> >
> > ^C
> > ^[[A^H^H^H^H^C
> > ^C
> > ^C
> > ^C^Z
> > [1]+  Stopped                 ./xdpsock -t -i eno49
> > root@tditwtga002:~#
> > </log>
> >
> > As you can see, the code gets into an infinite loop waiting for 1
> > descriptor which never appears on the completion ring.
> >
> > Note that I added code to print the outstanding count and to call
> > complete_tx_only_all() on Ctrl-C as well, not just when a packet
> > count is specified. Here's the exact diff (the sample code was taken
> > from the 5.15 Linux kernel tree):
> >
> > <diff>
> > --- xdpsock_user.c.orig 2023-10-11 11:20:47.553580604 +0530
> > +++ xdpsock_user.c      2023-10-07 12:18:33.849399960 +0530
> > @@ -1174,7 +1174,7 @@ static inline void complete_tx_l2fwd(str
> >  }
> >
> >  static inline void complete_tx_only(struct xsk_socket_info *xsk,
> > -                                   int batch_size)
> > +                                   int batch_size, bool print)
> >  {
> >         unsigned int rcvd;
> >         u32 idx;
> > @@ -1191,6 +1191,9 @@ static inline void complete_tx_only(stru
> >         if (rcvd > 0) {
> >                 xsk_ring_cons__release(&xsk->umem->cq, rcvd);
> >                 xsk->outstanding_tx -= rcvd;
> > +                if (print)
> > +                    fprintf(stderr, "outstanding %u (-%02u) \n",
> > +                            xsk->outstanding_tx, rcvd);
> >         }
> >  }
> >
> > @@ -1271,7 +1274,7 @@ static void tx_only(struct xsk_socket_in
> >
> >         while (xsk_ring_prod__reserve(&xsk->tx, batch_size, &idx) <
> >                                       batch_size) {
> > -               complete_tx_only(xsk, batch_size);
> > +               complete_tx_only(xsk, batch_size, false);
> >                 if (benchmark_done)
> >                         return;
> >         }
> > @@ -1288,7 +1291,7 @@ static void tx_only(struct xsk_socket_in
> >         xsk->outstanding_tx += batch_size;
> >         *frame_nb += batch_size;
> >         *frame_nb %= NUM_FRAMES;
> > -       complete_tx_only(xsk, batch_size);
> > +       complete_tx_only(xsk, batch_size, false);
> >  }
> >
> >  static inline int get_batch_size(int pkt_cnt)
> > @@ -1311,7 +1314,7 @@ static void complete_tx_only_all(void)
> >                 pending = false;
> >                 for (i = 0; i < num_socks; i++) {
> >                         if (xsks[i]->outstanding_tx) {
> > -                               complete_tx_only(xsks[i], opt_batch_size);
> > +                               complete_tx_only(xsks[i], opt_batch_size, true);
> >                                 pending = !!xsks[i]->outstanding_tx;
> >                         }
> >                 }
> > @@ -1353,7 +1356,9 @@ static void tx_only_all(void)
> >                         break;
> >         }
> >
> > +#if 0
> >         if (opt_pkt_count)
> > +#endif
> >                 complete_tx_only_all();
> >  }
> > </diff>
> >
> > Distro/Kernel:
> > Ubuntu 22.04.3 LTS (GNU/Linux 5.15.0-86-generic x86_64)
>
> Thanks for reporting. Could you please try a bleeding-edge kernel,
> 6.5 for example, and check whether the problem is still there?

Unfortunately, I don't have an ixgbe NIC. The problem was reported by
my app's customer. They have rolled back to an older driver version
(the one that ships with the native Ubuntu 22.04 kernel). They had
upgraded the driver because the native driver ran into the ixgbe
limitation of not supporting more than 64 cores, which they have now
resolved by disabling hyper-threading. After disabling hyper-threading
and rolling back to the native Ubuntu 22.04 ixgbe driver, both xdpsock
and our app are working fine.
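
For anyone who still hits this on an affected driver: the hang is in
the drain loop, which spins until outstanding_tx reaches zero. A
minimal sketch of bounding that wait follows. The names fake_cq,
fake_cq_peek() and drain_bounded() are made up for illustration, and
the fake ring only stands in for the real xsk_ring_cons__peek() and
xsk_ring_cons__release() calls on the completion ring. It is a
workaround sketch under those assumptions, not a fix for the
underlying driver problem.

```c
#include <assert.h>

/* Hypothetical stand-in for the completion ring (not a libxdp/libbpf
 * type): it tracks how many completions the driver will ever post. */
struct fake_cq {
	unsigned int available;
};

/* Hypothetical stand-in for xsk_ring_cons__peek() followed by
 * xsk_ring_cons__release(): hands back at most `batch` completions. */
static unsigned int fake_cq_peek(struct fake_cq *cq, unsigned int batch)
{
	unsigned int n = cq->available < batch ? cq->available : batch;

	cq->available -= n;
	return n;
}

/* Drain completions until nothing is outstanding, or until max_idle
 * consecutive polls return nothing. Returns the number of descriptors
 * still outstanding, so the caller can report them instead of hanging. */
static unsigned int drain_bounded(struct fake_cq *cq,
				  unsigned int outstanding,
				  unsigned int batch,
				  unsigned int max_idle)
{
	unsigned int idle = 0;

	assert(batch > 0);

	while (outstanding && idle < max_idle) {
		unsigned int rcvd = fake_cq_peek(cq, batch);

		/* Guard against the ring returning more than we sent. */
		if (rcvd > outstanding)
			rcvd = outstanding;

		if (rcvd == 0) {
			idle++;
			continue;
		}

		idle = 0;
		outstanding -= rcvd;
	}

	return outstanding;
}
```

With a fake ring that only ever posts 2465 completions,
drain_bounded(&cq, 2466, 64, 1000) returns 1 rather than spinning
forever, which matches the "outstanding 1" state in the log above.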

Srivats

>
> > Driver:
> > # ethtool -i eno49
> > driver: ixgbe
> > version: 5.19.6
> > firmware-version: 0x80000887, 1.2688.0
> > expansion-rom-version:
> > bus-info: 0000:04:00.0
> > supports-statistics: yes
> > supports-test: yes
> > supports-eeprom-access: yes
> > supports-register-dump: yes
> > supports-priv-flags: yes
> >
> > I would be grateful if someone can try the same and let me know if
> > they see a similar behaviour.
> >
> > Thanks in advance,
> > Srivats



