Re: [PATCH 1/4] softirq: implement IRQ flood detection mechanism

Thomas Gleixner <tglx@xxxxxxxxxxxxx> · Tue, 3 Sep 2019 10:09:57 +0200 (CEST)

On Tue, 3 Sep 2019, Ming Lei wrote:
> Scheduler can do nothing if the CPU is taken completely by handling
> interrupt & softirq, so seems not a scheduler problem, IMO.

Well, but thinking more about it, the solution you are proposing is more a
bandaid than anything else.

If you look at the networking NAPI mechanism. It handles that situation
gracefully by:

  - Disabling the interrupt at the device level

  - Polling the device in softirq context until empty and then reenabling
    interrupts

  - In case the softirq handles more packets than a defined budget it
    forces the softirq into the softirqd thread context which also
    allows rescheduling once the budget is completed.

With your adhoc workaround you handle one specific case. But it does not
work at all when an overload situation occurs in a case where the queues
are truly per cpu simply. Because then the interrupt and the thread
affinity are the same and single CPU targets and you replace the interrupt
with a threaded handler which runs by default with RT priority.

So instead of hacking something half baken into the hard/softirq code, why
can't block do a budget limitation and once that is reached switch to
something NAPI like as a general solution?

Thanks,

	tglx