On Sat, Jul 2, 2022 at 12:47 AM Daniel Borkmann <daniel@xxxxxxxxxxxxx> wrote:
>
> On 7/1/22 5:12 PM, Johan Almbladh wrote:
> > The byte queue limits (BQL) mechanism is intended to move queuing from
> > the driver to the network stack in order to reduce latency caused by
> > excessive queuing in hardware. However, when transmitting or redirecting
> > a packet using generic XDP, the qdisc layer is bypassed and there are no
> > additional queues. Since netif_xmit_stopped() also takes BQL limits into
> > account, but without having any alternative queuing, packets are
> > silently dropped.
> >
> > This patch modifies the drop condition to only consider cases when the
> > driver itself cannot accept any more packets. This is analogous to the
> > condition in __dev_direct_xmit(). Dropped packets are also counted on
> > the device.
> >
> > Bypassing the qdisc layer in the generic XDP TX path means that XDP
> > packets are able to starve other packets going through a qdisc, and
> > DDOS attacks will be more effective. In-driver-XDP use dedicated TX
> > queues, so they do not have this starvation issue.
> >
> > Signed-off-by: Johan Almbladh <johan.almbladh@xxxxxxxxxxxxxxxxx>
> > ---
> >  net/core/dev.c | 9 +++++++--
> >  1 file changed, 7 insertions(+), 2 deletions(-)
> >
> > diff --git a/net/core/dev.c b/net/core/dev.c
> > index 8e6f22961206..00fb9249357f 100644
> > --- a/net/core/dev.c
> > +++ b/net/core/dev.c
> > @@ -4863,7 +4863,10 @@ static u32 netif_receive_generic_xdp(struct sk_buff *skb,
> >  }
> >
> >  /* When doing generic XDP we have to bypass the qdisc layer and the
> > - * network taps in order to match in-driver-XDP behavior.
> > + * network taps in order to match in-driver-XDP behavior. This also means
> > + * that XDP packets are able to starve other packets going through a qdisc,
> > + * and DDOS attacks will be more effective. In-driver-XDP use dedicated TX
> > + * queues, so they do not have this starvation issue.
> >   */
> >  void generic_xdp_tx(struct sk_buff *skb, struct bpf_prog *xdp_prog)
> >  {
> > @@ -4875,10 +4878,12 @@ void generic_xdp_tx(struct sk_buff *skb, struct bpf_prog *xdp_prog)
> >          txq = netdev_core_pick_tx(dev, skb, NULL);
> >          cpu = smp_processor_id();
> >          HARD_TX_LOCK(dev, txq, cpu);
> > -        if (!netif_xmit_stopped(txq)) {
> > +        if (!netif_xmit_frozen_or_drv_stopped(txq)) {
> >                  rc = netdev_start_xmit(skb, dev, txq, 0);
> >                  if (dev_xmit_complete(rc))
> >                          free_skb = false;
> > +        } else {
> > +                dev_core_stats_tx_dropped_inc(dev);
> >          }
> >          HARD_TX_UNLOCK(dev, txq);
> >          if (free_skb) {
>
> Small q: Shouldn't the drop counter go into the free_skb branch?

This was done on purpose to avoid incrementing the counter twice, but I
think you are right. The driver updates the tx_dropped counter if the
packet is dropped, but I see that it also consumes the skb in those
cases. Looking again at the driver tree, I cannot find any examples
where a driver updates the counter *without* consuming the skb. This
logic makes sense: whoever consumes the skb is also responsible for
updating the counters on the netdev.
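To illustrate that convention, a minimal sketch of a hypothetical driver
ndo_start_xmit is below (the foo_* names and the ring-full helper are made
up for the example). When the driver decides to drop, it both bumps its own
tx_dropped counter and consumes the skb, returning NETDEV_TX_OK, so
dev_xmit_complete() reports success, free_skb stays false in
generic_xdp_tx(), and the core never counts the same drop a second time.

#include <linux/netdevice.h>
#include <linux/skbuff.h>

/* Hypothetical helper: returns true when the hardware TX ring is full. */
static bool foo_tx_ring_full(struct net_device *dev)
{
        return false; /* stub, just for the sketch */
}

static netdev_tx_t foo_start_xmit(struct sk_buff *skb, struct net_device *dev)
{
        if (unlikely(foo_tx_ring_full(dev))) {
                dev->stats.tx_dropped++; /* driver owns the accounting ... */
                dev_kfree_skb_any(skb);  /* ... because it consumes the skb */
                return NETDEV_TX_OK;     /* "complete" from the core's view */
        }

        /* ... hand the frame to the hardware here ... */
        return NETDEV_TX_OK;
}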
>
> diff --git a/net/core/dev.c
> index 00fb9249357f..17e2c39477c5
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -4882,11 +4882,10 @@ void generic_xdp_tx(struct sk_buff *skb, struct bpf_prog *xdp_prog)
>                  rc = netdev_start_xmit(skb, dev, txq, 0);
>                  if (dev_xmit_complete(rc))
>                          free_skb = false;
> -        } else {
> -                dev_core_stats_tx_dropped_inc(dev);
>          }
>          HARD_TX_UNLOCK(dev, txq);
>          if (free_skb) {
> +                dev_core_stats_tx_dropped_inc(dev);
>                  trace_xdp_exception(dev, xdp_prog, XDP_TX);
>                  kfree_skb(skb);
>          }
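For reference, with that change folded in, generic_xdp_tx() would read
roughly as follows. This is only a sketch assembled from the two diffs
above, not a quote from a tree; the declarations at the top are taken from
the current upstream function and may differ in a future kernel.

void generic_xdp_tx(struct sk_buff *skb, struct bpf_prog *xdp_prog)
{
        struct net_device *dev = skb->dev;
        struct netdev_queue *txq;
        bool free_skb = true;
        int cpu, rc;

        txq = netdev_core_pick_tx(dev, skb, NULL);
        cpu = smp_processor_id();
        HARD_TX_LOCK(dev, txq, cpu);
        if (!netif_xmit_frozen_or_drv_stopped(txq)) {
                rc = netdev_start_xmit(skb, dev, txq, 0);
                if (dev_xmit_complete(rc))
                        free_skb = false;
        }
        HARD_TX_UNLOCK(dev, txq);
        if (free_skb) {
                /* Reached when the queue was frozen or driver-stopped, or
                 * when the driver returned NETDEV_TX_BUSY without consuming
                 * the skb, so the drop is counted exactly once here.
                 */
                dev_core_stats_tx_dropped_inc(dev);
                trace_xdp_exception(dev, xdp_prog, XDP_TX);
                kfree_skb(skb);
        }
}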