On Sat, Jul 2, 2022 at 12:47 AM Daniel Borkmann <daniel@xxxxxxxxxxxxx> wrote:
>
> On 7/1/22 5:12 PM, Johan Almbladh wrote:
> > The byte queue limits (BQL) mechanism is intended to move queuing from
> > the driver to the network stack in order to reduce latency caused by
> > excessive queuing in hardware. However, when transmitting or redirecting
> > a packet using generic XDP, the qdisc layer is bypassed and there are no
> > additional queues. Since netif_xmit_stopped() also takes BQL limits into
> > account, but without having any alternative queuing, packets are
> > silently dropped.
> >
> > This patch modifies the drop condition to only consider cases when the
> > driver itself cannot accept any more packets. This is analogous to the
> > condition in __dev_direct_xmit(). Dropped packets are also counted on
> > the device.
> >
> > Bypassing the qdisc layer in the generic XDP TX path means that XDP
> > packets are able to starve other packets going through a qdisc, and
> > DDOS attacks will be more effective. In-driver-XDP use dedicated TX
> > queues, so they do not have this starvation issue.
> >
> > Signed-off-by: Johan Almbladh <johan.almbladh@xxxxxxxxxxxxxxxxx>
> > ---
> >  net/core/dev.c | 9 +++++++--
> >  1 file changed, 7 insertions(+), 2 deletions(-)
> >
> > diff --git a/net/core/dev.c b/net/core/dev.c
> > index 8e6f22961206..00fb9249357f 100644
> > --- a/net/core/dev.c
> > +++ b/net/core/dev.c
> > @@ -4863,7 +4863,10 @@ static u32 netif_receive_generic_xdp(struct sk_buff *skb,
> >  }
> >
> >  /* When doing generic XDP we have to bypass the qdisc layer and the
> > - * network taps in order to match in-driver-XDP behavior.
> > + * network taps in order to match in-driver-XDP behavior. This also means
> > + * that XDP packets are able to starve other packets going through a qdisc,
> > + * and DDOS attacks will be more effective. In-driver-XDP use dedicated TX
> > + * queues, so they do not have this starvation issue.
> >   */
> >  void generic_xdp_tx(struct sk_buff *skb, struct bpf_prog *xdp_prog)
> >  {
> > @@ -4875,10 +4878,12 @@ void generic_xdp_tx(struct sk_buff *skb, struct bpf_prog *xdp_prog)
> >          txq = netdev_core_pick_tx(dev, skb, NULL);
> >          cpu = smp_processor_id();
> >          HARD_TX_LOCK(dev, txq, cpu);
> > -        if (!netif_xmit_stopped(txq)) {
> > +        if (!netif_xmit_frozen_or_drv_stopped(txq)) {
> >                  rc = netdev_start_xmit(skb, dev, txq, 0);
> >                  if (dev_xmit_complete(rc))
> >                          free_skb = false;
> > +        } else {
> > +                dev_core_stats_tx_dropped_inc(dev);
> >          }
> >          HARD_TX_UNLOCK(dev, txq);
> >          if (free_skb) {
>
> Small q: Shouldn't the drop counter go into the free_skb branch?

This was done on purpose to avoid incrementing the counter twice, but I
think you are right. The driver updates the tx_dropped counter if the
packet is dropped, but I see that it also consumes the skb in those
cases. Looking again at the driver tree, I cannot find any examples
where a driver updates the counter *without* consuming the skb. This
logic makes sense: whoever consumes the skb is also responsible for
updating the counters on the netdev.
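To illustrate that convention, a minimal sketch of a hypothetical driver
ndo_start_xmit is below (the foo_* names and the ring-full helper are made
up for the example). When the driver decides to drop, it both bumps its own
tx_dropped counter and consumes the skb, returning NETDEV_TX_OK, so
dev_xmit_complete() reports success, free_skb stays false in
generic_xdp_tx(), and the core never counts the same drop a second time.

#include <linux/netdevice.h>
#include <linux/skbuff.h>

/* Hypothetical helper: returns true when the hardware TX ring is full. */
static bool foo_tx_ring_full(struct net_device *dev)
{
        return false; /* stub, just for the sketch */
}

static netdev_tx_t foo_start_xmit(struct sk_buff *skb, struct net_device *dev)
{
        if (unlikely(foo_tx_ring_full(dev))) {
                dev->stats.tx_dropped++; /* driver owns the accounting ... */
                dev_kfree_skb_any(skb);  /* ... because it consumes the skb */
                return NETDEV_TX_OK;     /* "complete" from the core's view */
        }

        /* ... hand the frame to the hardware here ... */
        return NETDEV_TX_OK;
}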
>
> diff --git a/net/core/dev.c
> index 00fb9249357f..17e2c39477c5
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -4882,11 +4882,10 @@ void generic_xdp_tx(struct sk_buff *skb, struct bpf_prog *xdp_prog)
>                  rc = netdev_start_xmit(skb, dev, txq, 0);
>                  if (dev_xmit_complete(rc))
>                          free_skb = false;
> -        } else {
> -                dev_core_stats_tx_dropped_inc(dev);
>          }
>          HARD_TX_UNLOCK(dev, txq);
>          if (free_skb) {
> +                dev_core_stats_tx_dropped_inc(dev);
>                  trace_xdp_exception(dev, xdp_prog, XDP_TX);
>                  kfree_skb(skb);
>          }
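For reference, with that change folded in, generic_xdp_tx() would read
roughly as follows. This is only a sketch assembled from the two diffs
above, not a quote from a tree; the declarations at the top are taken from
the current upstream function and may differ in a future kernel.

void generic_xdp_tx(struct sk_buff *skb, struct bpf_prog *xdp_prog)
{
        struct net_device *dev = skb->dev;
        struct netdev_queue *txq;
        bool free_skb = true;
        int cpu, rc;

        txq = netdev_core_pick_tx(dev, skb, NULL);
        cpu = smp_processor_id();
        HARD_TX_LOCK(dev, txq, cpu);
        if (!netif_xmit_frozen_or_drv_stopped(txq)) {
                rc = netdev_start_xmit(skb, dev, txq, 0);
                if (dev_xmit_complete(rc))
                        free_skb = false;
        }
        HARD_TX_UNLOCK(dev, txq);
        if (free_skb) {
                /* Reached when the queue was frozen or driver-stopped, or
                 * when the driver returned NETDEV_TX_BUSY without consuming
                 * the skb, so the drop is counted exactly once here.
                 */
                dev_core_stats_tx_dropped_inc(dev);
                trace_xdp_exception(dev, xdp_prog, XDP_TX);
                kfree_skb(skb);
        }
}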