On Mon, Aug 22, 2022 at 09:22:39AM -0700, Eric Dumazet wrote:
> On Mon, Aug 22, 2022 at 2:10 AM Peilin Ye <yepeilin.cs@xxxxxxxxx> wrote:
> >
> > From: Peilin Ye <peilin.ye@xxxxxxxxxxxxx>
> >
> > Hi all,
> >
> > Currently sockets (especially UDP ones) can drop a lot of packets at TC
> > egress when rate limited by shaper Qdiscs like HTB. This patchset series
> > tries to solve this by introducing a Qdisc backpressure mechanism.
> >
> > RFC v1 [1] used a throttle & unthrottle approach, which introduced several
> > issues, including a thundering herd problem and a socket reference count
> > issue [2]. This RFC v2 uses a different approach to avoid those issues:
> >
> >   1. When a shaper Qdisc drops a packet that belongs to a local socket due
> >      to TC egress congestion, we make part of the socket's sndbuf
> >      temporarily unavailable, so it sends slower.
> >
> >   2. Later, when TC egress becomes idle again, we gradually recover the
> >      socket's sndbuf back to normal. Patch 2 implements this step using a
> >      timer for UDP sockets.
> >
> > The thundering herd problem is avoided, since we no longer wake up all
> > throttled sockets at the same time in qdisc_watchdog(). The socket
> > reference count issue is also avoided, since we no longer maintain socket
> > list on Qdisc.
> >
> > Performance is better than RFC v1. There is one concern about fairness
> > between flows for TBF Qdisc, which could be solved by using a SFQ inner
> > Qdisc.
> >
> > Please see the individual patches for details and numbers. Any comments,
> > suggestions would be much appreciated. Thanks!
> >
> > [1] https://lore.kernel.org/netdev/cover.1651800598.git.peilin.ye@xxxxxxxxxxxxx/
> > [2] https://lore.kernel.org/netdev/20220506133111.1d4bebf3@hermes.local/
> >
> > Peilin Ye (5):
> >   net: Introduce Qdisc backpressure infrastructure
> >   net/udp: Implement Qdisc backpressure algorithm
> >   net/sched: sch_tbf: Use Qdisc backpressure infrastructure
> >   net/sched: sch_htb: Use Qdisc backpressure infrastructure
> >   net/sched: sch_cbq: Use Qdisc backpressure infrastructure
> >
>
> I think the whole idea is wrong.
>

Could you be more specific?

> Packet schedulers can be remote (offloaded, or on another box)

This is not the case we are dealing with (yet).

>
> The idea of going back to socket level from a packet scheduler should
> really be a last resort.

I think it should be the first resort, as we should backpressure to the
source, rather than anything in the middle.

>
> Issue of having UDP sockets being able to flood a network is tough, I
> am not sure the core networking stack
> should pretend it can solve the issue.

It seems there is a misunderstanding here: we are not dealing with UDP on
the network, just on an end host. The backpressure we are dealing with is
from Qdisc to socket on the _TX side_, on one single host.

>
> Note that FQ based packet schedulers can also help already.

It only helps TCP pacing.

Thanks.
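
To make the two steps from the cover letter concrete, here is a minimal
userspace sketch of the idea, not code from the patches. The struct bp_sock,
the helpers bp_usable(), bp_on_drop() and bp_on_timer(), and the policy of
withholding one packet's worth of bytes per drop while keeping a floor of
usable space are all assumptions made for illustration; the actual
algorithm and numbers are in the individual patches.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of one socket's send-buffer state (not a kernel struct). */
struct bp_sock {
	size_t sndbuf;      /* configured send buffer size, in bytes */
	size_t unavailable; /* bytes temporarily withheld by backpressure */
};

/* Bytes the socket may currently use for queued TX data. */
static inline size_t bp_usable(const struct bp_sock *sk)
{
	assert(sk->unavailable <= sk->sndbuf);
	return sk->sndbuf - sk->unavailable;
}

/*
 * Step 1: a shaper Qdisc dropped one of this socket's packets.
 * Assumed policy: withhold pkt_len more bytes, but always leave at
 * least min_usable bytes so the socket can still make progress.
 */
static inline void bp_on_drop(struct bp_sock *sk, size_t pkt_len,
			      size_t min_usable)
{
	size_t cap = sk->sndbuf > min_usable ? sk->sndbuf - min_usable : 0;
	size_t want = sk->unavailable + pkt_len;

	sk->unavailable = want < cap ? want : cap;
}

/*
 * Step 2: periodic timer tick while TC egress is idle.
 * Give back up to step bytes, so recovery is gradual rather than
 * all at once (no mass wakeup of throttled sockets).
 */
static inline void bp_on_timer(struct bp_sock *sk, size_t step)
{
	sk->unavailable -= step < sk->unavailable ? step : sk->unavailable;
}
```

With a 4096-byte sndbuf, a 1024-byte floor and 1500-byte packets, three
drops in a row shrink the usable space to the floor, and each later timer
tick hands back at most one step until the full sndbuf is available again.
The state lives entirely in the socket, so the Qdisc does not need to keep
a list of sockets.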