Hi all,
this came up during a discussion on the mailing list (cf. the thread
"Question on handling managed IRQs when hotplugging CPUs").
The problem is that with managed IRQs and block-mq, I/O will be routed
to individual CPUs, and the response will be sent to the IRQ assigned to
that CPU.
If a CPU hotplug event now occurs while I/O is still in flight, the IRQ
will _still_ be assigned to that CPU, causing any pending interrupt to
be lost.
Hence the driver will never notice that an interrupt has happened, and
an I/O timeout occurs.
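
To make the failure mode concrete, here is a minimal userspace sketch in
C. It is only a toy model under stated assumptions: the struct, the
counters and all function names (submit_io, device_complete,
cpu_offline, lost_interrupt_demo) are hypothetical and are not kernel
APIs. It tracks the driver's view of outstanding requests separately
from the device's view, so that a completion raised for an offlined CPU
leaves the driver waiting.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4

/* Hypothetical model of one CPU and the managed IRQ bound to it. */
struct cpu_model {
	bool online;         /* can this CPU still service its IRQ?     */
	int driver_inflight; /* requests the driver is still waiting on  */
	int device_pending;  /* requests the device has not finished yet */
	int lost;            /* completions raised with nobody listening */
};

static void model_init(struct cpu_model cpus[NR_CPUS])
{
	for (int i = 0; i < NR_CPUS; i++)
		cpus[i] = (struct cpu_model){ .online = true };
}

/* blk-mq style routing: the request is tied to the submitting CPU. */
static void submit_io(struct cpu_model *cpu)
{
	assert(cpu->online);
	cpu->driver_inflight++;
	cpu->device_pending++;
}

/* The device finishes one request and raises the IRQ for that CPU. */
static void device_complete(struct cpu_model *cpu)
{
	if (cpu->device_pending == 0)
		return;
	cpu->device_pending--;
	if (cpu->online)
		cpu->driver_inflight--; /* handler ran, driver notices  */
	else
		cpu->lost++;            /* IRQ still targets a dead CPU */
}

/* Hotplug without any draining: the IRQ assignment is left as-is. */
static void cpu_offline(struct cpu_model *cpu)
{
	cpu->online = false;
}

/* Returns how many requests the driver would see time out. */
static int lost_interrupt_demo(void)
{
	struct cpu_model cpus[NR_CPUS];

	model_init(cpus);
	submit_io(&cpus[1]);
	submit_io(&cpus[1]);
	device_complete(&cpus[1]); /* handled while CPU 1 is online */
	cpu_offline(&cpus[1]);     /* hotplug with one request open */
	device_complete(&cpus[1]); /* completion is lost            */
	return cpus[1].driver_inflight;
}
```

In this model lost_interrupt_demo() leaves one request outstanding from
the driver's point of view even though the device is done with it, which
is the timeout case described above.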
One proposal was to quiesce the device when a CPU hotplug event occurs,
and only allow for CPU hotplugging once it's fully quiesced.
While this would work, it would stall the system for quite some time,
and it actually has a rather big impact on the system.
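
The quiesce proposal can be sketched the same way. Again this is a
hypothetical userspace model in C, not kernel code: struct hw_queue,
queue_submit, queue_complete, try_offline and quiesce_demo are names
made up for illustration. The point it tries to show is that the
offline request has to be retried until in-flight I/O has drained, and
that new submissions are refused in the meantime, which is where the
stall comes from.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one hardware queue and its CPU. */
struct hw_queue {
	bool cpu_online; /* CPU behind this queue is still up       */
	bool accepting;  /* false once quiescing has started        */
	int inflight;    /* submitted, completion not yet seen      */
};

/* Returns false if the queue is quiesced and the I/O must wait. */
static bool queue_submit(struct hw_queue *q)
{
	if (!q->accepting)
		return false;
	q->inflight++;
	return true;
}

/* Completion interrupt, handled only while the CPU is online. */
static void queue_complete(struct hw_queue *q)
{
	if (q->inflight > 0 && q->cpu_online)
		q->inflight--;
}

/*
 * Quiesce, then offline only once nothing is in flight.
 * Returns true if the CPU went offline, false if the caller
 * has to wait and retry.
 */
static bool try_offline(struct hw_queue *q)
{
	q->accepting = false;     /* stop new submissions             */
	if (q->inflight > 0)
		return false;     /* still draining: hotplug stalls   */
	q->cpu_online = false;
	return true;
}

/* Returns the number of rounds the hotplug had to wait. */
static int quiesce_demo(void)
{
	struct hw_queue q = { .cpu_online = true, .accepting = true };
	int waits = 0;

	queue_submit(&q);
	queue_submit(&q);
	while (!try_offline(&q)) {
		waits++;
		assert(!queue_submit(&q)); /* new I/O is held back   */
		queue_complete(&q);        /* one request finishes   */
	}
	assert(q.inflight == 0 && !q.cpu_online);
	return waits;
}
```

No interrupt is lost here, but the hotplug operation waits one round per
outstanding request while all new I/O on that queue is blocked.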
Another possibility would be to have the driver abort the requests
itself, but this requires specific callbacks into the driver, and, of
course, the driver having the ability to actually do so.
I would like to discuss at LSF/MM how these issues can best be addressed.
Cheers,
Hannes
--
Dr. Hannes Reinecke Teamlead Storage & Networking
hare@xxxxxxx +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)