On Tue, Feb 5, 2019 at 11:30 PM Hannes Reinecke <hare@xxxxxxx> wrote:
>
> Hi all,
>
> this came up during discussion on the mailing list (cf thread "Question
> on handling managed IRQs when hotplugging CPUs").
> The problem is that with managed IRQs and block-mq, I/O will be routed to
> individual CPUs, and the response will be sent to the IRQ assigned to
> that CPU.
>
> If a CPU hotplug event now occurs while I/O is still in flight, the IRQ
> will _still_ be assigned to that CPU, causing any pending interrupt to be
> lost.
> Hence the driver will never notice that an interrupt has happened, and
> an I/O timeout occurs.

Many drivers' timeout handlers only return BLK_EH_RESET_TIMER, so this
situation can't be recovered via the I/O timeout on those devices. For
example, we have seen I/O hang issues on HPSA and megaraid_sas before,
when the wrong MSI vector was set on an I/O command, and one such issue
on aacraid isn't fixed yet. (A minimal sketch of this timeout-handler
pattern is appended below.)

>
> One proposal was to quiesce the device when a CPU hotplug event occurs,
> and only allow for CPU hotplugging once it's fully quiesced.

That was the original solution, but the big problem is that queue
dependencies exist: for example, a loop or DM queue depends on the
underlying device's queue, and an NVMe I/O queue depends on its admin
queue. (A sketch of the quiesce-on-hotplug idea is also appended below.)

>
> While this would work, it would introduce quite some system stall, and
> it actually has a rather big impact on the system.
> Another possibility would be to have the driver abort the requests
> itself, but this requires specific callbacks into the driver, and, of
> course, the driver having the ability to actually do so.
>
> I would like to discuss at LSF/MM how these issues can be addressed best.

One related topic is that the current static queue mapping, which does
not involve any CPU hotplug handler, may waste lots of IRQ vectors [1].
How should that problem be dealt with?

[1] http://lists.infradead.org/pipermail/linux-nvme/2019-January/021961.html

Thanks,
Ming Lei
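
Appendix: a minimal sketch of the timeout-handler pattern referred to
above, for a hypothetical driver. The names example_timeout and
example_mq_ops are illustrative only and not taken from any specific
driver; .queue_rq and the other mandatory blk_mq_ops callbacks are
omitted, and the .timeout signature follows the blk_mq_ops of the
kernels around the time of this discussion.

#include <linux/blk-mq.h>

/*
 * Hypothetical driver whose timeout handler only re-arms the timer.
 * No abort or controller reset is attempted, so if the completion
 * interrupt was lost on a hot-removed CPU, the request simply stays
 * in flight forever.
 */
static enum blk_eh_timer_return example_timeout(struct request *rq,
                                                bool reserved)
{
        return BLK_EH_RESET_TIMER;
}

static const struct blk_mq_ops example_mq_ops = {
        /* .queue_rq and the other mandatory callbacks omitted */
        .timeout        = example_timeout,
};

Because the handler never aborts or resets anything, blk-mq keeps
restarting the request's timer, and an I/O whose completion interrupt
was lost when its CPU went offline hangs indefinitely.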
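
And one possible shape of the quiesce-on-hotplug idea, using
blk_mq_freeze_queue() and a dynamic CPU hotplug state. This is only an
illustration, not necessarily the mechanism Hannes had in mind: it
assumes a hypothetical driver with a single request_queue (example_q
and the callback names are made up), and a real implementation would
have to make sure its hotplug state runs before the dying CPU's managed
vectors are shut down, and would also have to deal with the queue
dependencies noted above.

#include <linux/blk-mq.h>
#include <linux/cpuhotplug.h>

/* Assumed to point at the driver's request queue; set up elsewhere. */
static struct request_queue *example_q;

static int example_cpu_offline(unsigned int cpu)
{
        /*
         * Freeze the queue: new submissions block, and the call only
         * returns once every in-flight request has completed, so
         * nothing can still be waiting on the managed IRQ of the CPU
         * that is about to go away.
         */
        blk_mq_freeze_queue(example_q);
        return 0;
}

static int example_cpu_online(unsigned int cpu)
{
        /* Let I/O flow again once the CPU (and its vector) is back. */
        blk_mq_unfreeze_queue(example_q);
        return 0;
}

static int __init example_init(void)
{
        int ret;

        /*
         * Dynamic hotplug state: the teardown callback runs while the
         * outgoing CPU is still online, before its interrupts are
         * migrated or shut down.
         */
        ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN,
                                "block/example:online",
                                example_cpu_online,
                                example_cpu_offline);
        return ret < 0 ? ret : 0;
}

Note that blk_mq_freeze_queue() waits for all in-flight requests to
drain, which is exactly the stall Hannes mentions, and for stacked
devices (loop/DM on top, NVMe I/O queues depending on the admin queue)
the freeze would have to propagate through the dependent queues, which
is the problem pointed out above.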