On 11/01/2017 01:47 AM, Thomas Gleixner wrote:
> On Mon, 30 Oct 2017, Shivasharan Srikanteshwara wrote:
>
>> In managed-interrupts case, interrupts which were affine to the offlined
>> CPU is not getting migrated to another available CPU. But the
>> documentation at below link says that "all interrupts" are migrated to a
>> new CPU. So not all interrupts are getting migrated to a new CPU then.
>
> Correct.
>
>> https://www.kernel.org/doc/html/v4.11/core-api/cpu_hotplug.html#the-offline-case
>> "- All interrupts targeted to this CPU are migrated to a new CPU"
>
> Well, documentation is not always up to date :)
>
>> Once the last CPU in the affinity mask is offlined and a particular IRQ
>> is shutdown, is there a way currently for the device driver to get
>> callback to complete all outstanding requests on that queue?
>
> No and I have no idea how the other drivers deal with that.
>
> The way you can do that is to have your own hotplug callback which is
> invoked when the cpu goes down, but way before the interrupt is shut down,
> which is one of the last steps. Ideally this would be a callback in the
> generic block code which then calls out to all instances like its done for
> the cpu dead state.
>
In principle, yes, that would be (and, in fact, might already be) moved
to the block layer for blk-mq, as this has full control over the
individual queues and hence can ensure that the queues with dead/removed
CPUs are properly handled.
Here, OTOH, we are dealing with the legacy sq implementation (or, to be
precise, a blk-mq implementation utilizing only a single queue), so any
of this handling needs to be implemented in the driver.

So what would need to be done here is to implement a hotplug callback in
the driver, which would remove the CPU from the list/bitmap of valid CPUs.
Then the driver could validate the CPU number with this bitmap upon I/O
submission (instead of just using raw_smp_cpu_number()), and could set
the queue ID to '0' if an invalid CPU was found.
With that the driver should be able to ensure that no new I/O will be
submitted which will hit the dead CPU, so with a bit of luck this might
already solve the problem.

Alternatively I could resurrect my patchset converting the driver to
blk-mq, which got vetoed the last time ...

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Teamlead Storage & Networking
hare@xxxxxxx                                   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)
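
For illustration only, the bitmap idea described above might look roughly
like the following userspace C sketch. It is not driver code: the
`sketch_*` names, the 64-CPU limit, and the one-queue-per-CPU mapping are
all assumptions made for the example. A real driver would presumably
register its callbacks through the kernel's CPU hotplug state machine
(e.g. cpuhp_setup_state()), keep the mask in a struct cpumask, and read
the current CPU with raw_smp_processor_id(); locking and in-flight
requests are ignored here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical limit so the whole mask fits in one 64-bit word. */
#define SKETCH_MAX_CPUS 64u

/* Hypothetical per-adapter state: bit N set means CPU N may be used
 * to pick a reply queue. */
struct sketch_adapter {
	uint64_t valid_cpus;
};

/* Stand-in for the "CPU comes up" hotplug callback. */
static inline void sketch_cpu_online(struct sketch_adapter *a,
				     unsigned int cpu)
{
	assert(cpu < SKETCH_MAX_CPUS);
	a->valid_cpus |= UINT64_C(1) << cpu;
}

/* Stand-in for the "CPU goes down" hotplug callback; it has to run
 * before the interrupt for that CPU is shut down. */
static inline void sketch_cpu_offline(struct sketch_adapter *a,
				      unsigned int cpu)
{
	assert(cpu < SKETCH_MAX_CPUS);
	a->valid_cpus &= ~(UINT64_C(1) << cpu);
}

/* True if the CPU is in range and still marked valid. */
static inline bool sketch_cpu_valid(const struct sketch_adapter *a,
				    unsigned int cpu)
{
	return cpu < SKETCH_MAX_CPUS && ((a->valid_cpus >> cpu) & 1u);
}

/* Called at I/O submission: use the submitting CPU's queue if that CPU
 * is still valid, otherwise fall back to queue 0.
 * Assumes (for the sketch) that queue ID == CPU number. */
static inline unsigned int sketch_select_queue(const struct sketch_adapter *a,
					       unsigned int cpu)
{
	return sketch_cpu_valid(a, cpu) ? cpu : 0;
}
```

With CPUs 0 and 3 marked online, sketch_select_queue() returns 3 for a
submission from CPU 3; once sketch_cpu_offline() has run for CPU 3, the
same submission is redirected to queue 0. Requests already outstanding
on the dead CPU's queue are a separate problem this does not address.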