On Tue, Feb 5, 2019 at 11:30 PM Hannes Reinecke <hare@xxxxxxx> wrote:
>
> Hi all,
>
> this came up during discussion on the mailing list (cf thread "Question
> on handling managed IRQs when hotplugging CPUs").
> The problem is that with managed IRQs and block-mq, I/O will be routed to
> individual CPUs, and the response will be sent to the IRQ assigned to
> that CPU.
>
> If a CPU hotplug event now occurs while I/O is still in flight, the IRQ
> will _still_ be assigned to that CPU, causing any pending interrupt to be
> lost.
> Hence the driver will never notice that an interrupt has happened, and
> an I/O timeout occurs.

Many drivers' timeout handlers only return BLK_EH_RESET_TIMER, so this
situation can't be recovered via the I/O timeout on those devices. For
example, we have seen I/O hang issues on HPSA and megaraid_sas before,
when the wrong MSI vector was set on an I/O command, and one such issue
on aacraid isn't fixed yet. (A minimal sketch of this timeout-handler
pattern is appended below.)

>
> One proposal was to quiesce the device when a CPU hotplug event occurs,
> and only allow for CPU hotplugging once it's fully quiesced.

That was the original solution, but the big problem is that queue
dependencies exist: for example, a loop or DM queue depends on the
underlying device's queue, and an NVMe I/O queue depends on its admin
queue. (A sketch of the quiesce-on-hotplug idea is also appended below.)

>
> While this would work, it would introduce quite some system stall, and
> it actually has a rather big impact on the system.
> Another possibility would be to have the driver abort the requests
> itself, but this requires specific callbacks into the driver, and, of
> course, the driver having the ability to actually do so.
>
> I would like to discuss at LSF/MM how these issues can be addressed best.

One related topic is that the current static queue mapping, which does
not involve any CPU hotplug handler, may waste lots of IRQ vectors [1].
How should that problem be dealt with?

[1] http://lists.infradead.org/pipermail/linux-nvme/2019-January/021961.html

Thanks,
Ming Lei
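
Appendix: a minimal sketch of the timeout-handler pattern referred to
above, for a hypothetical driver. The names example_timeout and
example_mq_ops are illustrative only and not taken from any specific
driver; .queue_rq and the other mandatory blk_mq_ops callbacks are
omitted, and the .timeout signature follows the blk_mq_ops of the
kernels around the time of this discussion.

#include <linux/blk-mq.h>

/*
 * Hypothetical driver whose timeout handler only re-arms the timer.
 * No abort or controller reset is attempted, so if the completion
 * interrupt was lost on a hot-removed CPU, the request simply stays
 * in flight forever.
 */
static enum blk_eh_timer_return example_timeout(struct request *rq,
                                                bool reserved)
{
        return BLK_EH_RESET_TIMER;
}

static const struct blk_mq_ops example_mq_ops = {
        /* .queue_rq and the other mandatory callbacks omitted */
        .timeout        = example_timeout,
};

Because the handler never aborts or resets anything, blk-mq keeps
restarting the request's timer, and an I/O whose completion interrupt
was lost when its CPU went offline hangs indefinitely.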
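
And one possible shape of the quiesce-on-hotplug idea, using
blk_mq_freeze_queue() and a dynamic CPU hotplug state. This is only an
illustration, not necessarily the mechanism Hannes had in mind: it
assumes a hypothetical driver with a single request_queue (example_q
and the callback names are made up), and a real implementation would
have to make sure its hotplug state runs before the dying CPU's managed
vectors are shut down, and would also have to deal with the queue
dependencies noted above.

#include <linux/blk-mq.h>
#include <linux/cpuhotplug.h>

/* Assumed to point at the driver's request queue; set up elsewhere. */
static struct request_queue *example_q;

static int example_cpu_offline(unsigned int cpu)
{
        /*
         * Freeze the queue: new submissions block, and the call only
         * returns once every in-flight request has completed, so
         * nothing can still be waiting on the managed IRQ of the CPU
         * that is about to go away.
         */
        blk_mq_freeze_queue(example_q);
        return 0;
}

static int example_cpu_online(unsigned int cpu)
{
        /* Let I/O flow again once the CPU (and its vector) is back. */
        blk_mq_unfreeze_queue(example_q);
        return 0;
}

static int __init example_init(void)
{
        int ret;

        /*
         * Dynamic hotplug state: the teardown callback runs while the
         * outgoing CPU is still online, before its interrupts are
         * migrated or shut down.
         */
        ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN,
                                "block/example:online",
                                example_cpu_online,
                                example_cpu_offline);
        return ret < 0 ? ret : 0;
}

Note that blk_mq_freeze_queue() waits for all in-flight requests to
drain, which is exactly the stall Hannes mentions, and for stacked
devices (loop/DM on top, NVMe I/O queues depending on the admin queue)
the freeze would have to propagate through the dependent queues, which
is the problem pointed out above.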