On Tue, Jan 9, 2018 at 9:27 AM, Eric Dumazet <edumazet@xxxxxxxxxx> wrote: > > So yes, commit 4cd13c21b207 ("softirq: Let ksoftirqd do its job") has > shown up multiple times in various 'regressions' > simply because it could surface the problem more often. > But even if you revert it, you can still make the faulty > driver/subsystem misbehave by adding more stress to the cpu handling > the IRQ. ..but that's always true. People sometimes live on the edge - often by design (ie hardware has been designed/selected to be the crappiest possible that still work). That doesn't change anything. A patch that takes "bad things can happen" to "bad things DO happen" is a bad patch. > Maybe the answer is to tune the kernel for small latencies at the > price of small throughput (situation before the patch) Generally we always want to tune for latency. Throughput is "easy", but almost never interesting. Sure, people do batch jobs. And yes, people often _benchmark_ throughput, because it's easy to benchmark. It's much harder to benchmark latency, even when it's often much more important. A prime example is the SSD benchmarks in the last few years - they improved _dramatically_ when people noticed that the real problem was latency, not the idiotic maximum big-block bandwidth numbers that have almost zero impact on most people. Put another way: we already have a very strong implicit bias towards bandwidth just because it's easier to see and measure. That means that we generally should strive to have a explicit bias towards optimizing for latency when that choice comes up. Just to balance things out (and just to not take the easy way out: bandwidth can often be improved by adding more layers of buffering and bigger buffers, and that often ends up really hurting latency). > 1) Revert the patch Well, we can revert it only partially - limiting it to just networking for example. Just saying "act the way you used to for tasklets" already seems to have fixed the issue in DVB. > 2) get rid of ksoftirqd since it adds unexpected latencies. We can't get rid of it entirely, since the synchronous softirq code can cause problems too. It's why we have that "maximum of ten synchronous events" in __do_softirq(). And we don't *want* to get rid of it. We've _always_ had that small-scale "at some point we can't do it synchronously any more". That is a small-scale "don't have horrible latency for _other_ things" protection. So it's about latency too, it's just about protecting latency of the rest of the system. The problem with commit 4cd13c21b207 is that it turns the small-scale latency issues in softirq handling (they get larger latencies for lots of hardware interrupts or even from non-preemptible kernel code) into the _huge_ scale latency of scheduling, and does so in a racy way too. > 3) Let applications that expect to have high throughput make sure to > pin their threads on cpus that are not processing IRQ. > (And make sure to not use irqbalance, and setup IRQ cpu affinities) The only people that really deal in "thoughput only" tend to be the HPC people, and they already do things like this. (The other end of the spectrum is the realtime people that have extreme latency requirements, who do things like that for the reverse reason: keeping one or more CPU's reserved for the particular low-latency realtime job). Linus