On Fri, Dec 18, 2020 at 11:49 AM Sebastian Andrzej Siewior <bigeasy@xxxxxxxxxxxxx> wrote:
>
> On 2020-12-18 08:18:05 [-0800], Alison Chaiken wrote:
> > > Having a thread to run the tick-timer would avoid that scenario.
> > >
> >
> > Didn't ktimersoftd used to be such a thread? It's still not entirely
> > clear to me at least why it was removed.
>
> Yes, ktimersoftd was such a thread that is why I am suggesting it. It
> would be probably just a quick duct tape.
>
> All of the reasons why it has been introduced disappeared in the
> previous softirq rework. The NAPI handover works, posixtimer need no
> additional love and so on (the original motivation to have it).
>
> The problem, that Paul reported, should also exist for !RT with the
> `threadirqs' switch (untested but it is the same code).
> It is worth noting that in his report the latency was increased because
> the timer-tick woke the ksoftirqd thread. Rightfully you could say that
> this would not have happen with the timer thread.
> However, the usb-storage driver (just to pick an easy to trigger
> scenario for my case) also wakes the ksoftirqd if a transfer completes.
> If the ethernet interrupt fires before ksoftirqd completes its task then
> we have the same situation without the involvement of the timer :)
>
> > -- Alison Chaiken
> > Aurora Innovation
>
> Sebastian

Hi Everyone,

Thanks for taking the time to look at this, it's appreciated!

For now, setting the ksoftirqd priority high on the same core as the
interrupt seems to greatly improve things. Thanks for the suggestion,
Grygorii.

A few other notes on items that seem to affect latency in a related
use case.

First, if a sibling thread of an application calls clone (e.g. via a
system() call), this seems to temporarily prevent all of the
application's threads from being scheduled.

Second, I saw a couple of instances where one thread seemed to get
migrated to another core, alternating with the migration thread
(~40 times), and then ultimately ended up running on a different core.
Using taskset to set the CPU affinity of the offending thread helped
with this.

Third, the PHC tx timestamping (but not the rx) can cause latency
issues (using the macb driver), but this is the least investigated of
the group.

thanks,
Paul