On Tue, Sep 03, 2019 at 10:09:57AM +0200, Thomas Gleixner wrote:
> On Tue, 3 Sep 2019, Ming Lei wrote:
> > Scheduler can do nothing if the CPU is taken completely by handling
> > interrupt & softirq, so seems not a scheduler problem, IMO.
>
> Well, but thinking more about it, the solution you are proposing is more a
> bandaid than anything else.
>
> If you look at the networking NAPI mechanism. It handles that situation
> gracefully by:
>
>   - Disabling the interrupt at the device level

I guess you mean we disable the interrupt in the softirq context.

IO performance could be affected by the extra action of disabling and
enabling the interrupt every time. IOPS for the device under discussion
is several million.

>
>   - Polling the device in softirq context until empty and then reenabling
>     interrupts

blk-mq switched to completing requests in interrupt context to avoid
extra performance loss, so switching back to softirq context every time
may cause a performance regression.

>
>   - In case the softirq handles more packets than a defined budget it
>     forces the softirq into the softirqd thread context which also
>     allows rescheduling once the budget is completed.

It can be hard to figure out one perfect budget.

In V2 of the patchset[1], IRQF_ONESHOT is applied to the irq thread, and
the interrupt isn't re-enabled until it has been handled in the irq
thread context.

[1] https://github.com/ming1/linux/commits/v5.3-genirq-for-5.4

The approach in this patchset is actually very similar to the above NAPI
based way. The difference is that softirq is avoided, and the interrupt
is always handled in interrupt context as long as the CPU won't be
stalled, so performance won't be affected. We only switch to handling
the interrupt in thread context if a CPU stall is about to happen.

>
> With your adhoc workaround you handle one specific case. But it does not
> work at all when an overload situation occurs in a case where the queues
> are truly per cpu simply.

There is no such CPU stall issue in the single submission vs. single
completion case, because the submission side and the completion side
share the same CPU, and the submission side slows down if the completion
side takes all the CPU.

> Because then the interrupt and the thread
> affinity are the same and single CPU targets and you replace the interrupt
> with a threaded handler which runs by default with RT priority.

Even though the threaded handler runs at RT priority and on the same CPU
as the interrupt, the CPU/RCU stall can still be avoided.

Also, we can switch to using the irq affinity for the irq thread instead
of the effective affinity.

>
> So instead of hacking something half baken into the hard/softirq code, why
> can't block do a budget limitation and once that is reached switch to
> something NAPI like as a general solution?

Another big reason is that multiple submission vs. single completion
isn't a common case. As far as I know there are only a small number of
such devices, so re-inventing a NAPI based approach may take lots of
effort while only a small number of devices would benefit. I am not sure
the block community would like to consider that.

IMO, it might be the simplest generic way to solve the problem from genirq.

Thanks,
Ming
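
For reference, the budget scheme discussed above (mask the device
interrupt, reap completions up to a budget, and hand the remainder to
thread context when the budget runs out) could be modelled roughly as
below. This is only a userspace sketch under assumptions, not code from
the patchset: struct sim_dev, sim_poll() and sim_thread_drain() are
hypothetical names, and the device is simulated by a plain counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a device completion queue; not a kernel structure. */
struct sim_dev {
	size_t pending;     /* completions waiting to be reaped */
	bool irq_enabled;   /* models the device-level interrupt enable bit */
};

enum poll_result {
	POLL_DRAINED,       /* queue empty, interrupt re-enabled */
	POLL_DEFERRED,      /* budget exhausted, rest is left for a thread */
};

/*
 * Reap at most 'budget' completions with the device interrupt masked.
 * The interrupt is only re-enabled once the queue is empty; otherwise
 * it stays masked so the remaining work can run in thread context.
 */
static enum poll_result sim_poll(struct sim_dev *dev, size_t budget,
				 size_t *done)
{
	assert(budget > 0);

	dev->irq_enabled = false;
	*done = 0;
	while (dev->pending > 0 && *done < budget) {
		dev->pending--;
		(*done)++;
	}

	if (dev->pending == 0) {
		dev->irq_enabled = true;
		return POLL_DRAINED;
	}
	return POLL_DEFERRED;
}

/* Thread-context continuation: keep polling in budget-sized batches. */
static void sim_thread_drain(struct sim_dev *dev, size_t budget)
{
	size_t done;

	while (sim_poll(dev, budget, &done) == POLL_DEFERRED)
		; /* a real thread could be rescheduled between batches */
}
```

With 10 pending completions and a budget of 4, the first sim_poll()
call reaps 4 and returns POLL_DEFERRED with the interrupt still masked.
sim_thread_drain() then finishes the remaining 6 and re-enables the
interrupt. How to pick the budget is exactly the open question
mentioned above.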