On 2024-02-16 01:23, Eric Dumazet wrote:
> On Fri, Feb 16, 2024 at 6:31 AM Tejun Heo <tj@xxxxxxxxxx> wrote:
>>
>> Hello,
>>
>> On Mon, Jan 29, 2024 at 11:11:53PM -1000, Tejun Heo wrote:
>>> The only generic interface to execute asynchronously in the BH context is
>>> tasklet; however, it's marked deprecated and has some design flaws. To
>>> replace tasklets, BH workqueue support was recently added. A BH workqueue
>>> behaves similarly to regular workqueues except that the queued work items
>>> are executed in the BH context.
>>>
>>> This patch converts the TCP Small Queues implementation from tasklet to BH
>>> workqueue.
>>>
>>> Semantically, this is an equivalent conversion and there shouldn't be any
>>> user-visible behavior changes. While workqueue's queueing and execution
>>> paths are a bit heavier than tasklet's, unless the work item is being queued
>>> for every packet, the difference hopefully shouldn't matter.
>>>
>>> My experience with the networking stack is very limited and this patch
>>> definitely needs attention from someone who actually understands networking.
>>
>> On Jakub's recommendation, I asked David Wei to run a production memcache
>> benchmark on the backported conversion patch. There was no discernible
>> difference before and after. Given that this is likely as hot as it gets for
>> the path on a real workload, the conversion hopefully shouldn't be
>> noticeable in terms of performance impact.
>>
>> Jakub, I'd really appreciate it if you could ack. David, would it be okay if
>> I add your Tested-by?

Yes, that's fine.

> I presume the memcache benchmark is using small RPCs?

It is not a benchmark but a prod shadow, but yes, the requests are small.

> TSQ matters for high BDP, and is very time sensitive.
>
> Things like slow TX completions (firing from napi poll, BH context)
> can hurt TSQ.
>
> If we add an additional workqueue overhead on top of these slow TX
> completions, I am really not sure...
>
> I would recommend tests with the pfifo_fast qdisc (not FQ, which has a
> special override for TSQ limits).
>
> Eventually we could add in TCP a measure of the time lost because of
> TSQ, regardless of the kick implementation (tasklet or workqueue).
> Measuring the delay between when a TCP socket got tcp_wfree approval
> to deliver more packets and when it finally delivered these packets
> could be implemented with a bpftrace program.
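
For that last measurement, something along the lines of the bpftrace sketch
below might be a starting point. It is only a rough approximation under a few
assumptions: tcp_wfree() and tcp_write_xmit() need to be visible in kallsyms
(not inlined) on the running kernel, BTF is needed for the skb->sk access, and
it times every tcp_wfree() call rather than only the ones that actually clear
a TSQ throttle, so a finer version would also check the TSQ_THROTTLED flag.

#!/usr/bin/env bpftrace
// Rough sketch: time from tcp_wfree() (socket may be released from the
// TSQ limit) until the same socket next pushes packets in tcp_write_xmit().
// Probe points are assumptions; both must be probeable symbols here.

kprobe:tcp_wfree
{
	// arg0 is the skb being freed; its owning socket is the one
	// that may now be allowed to deliver more packets.
	@release[((struct sk_buff *)arg0)->sk] = nsecs;
}

kprobe:tcp_write_xmit
{
	// arg0 is the struct sock * that is finally delivering packets.
	if (@release[arg0]) {
		@tsq_delay_us = hist((nsecs - @release[arg0]) / 1000);
		delete(@release[arg0]);
	}
}

END
{
	clear(@release);
}

Running this with the root qdisc set to pfifo_fast (rather than FQ, as you
suggest), once on the tasklet code and once on the BH workqueue conversion,
should show whether the extra queueing latency is visible in the histogram.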