On Fri, May 22, 2020 at 10:39:23AM +0800, Ming Lei wrote:
> On Thu, May 21, 2020 at 12:15:52PM -0700, Bart Van Assche wrote:
> > On 2020-05-20 21:33, Ming Lei wrote:
> > > No.
> > >
> > > If vector 3 is for covering hw queue 12 ~ 15, the vector shouldn't be
> > > shutdown when cpu 14 is offline.
> > >
> > > Also I am pretty sure that we don't do this way with managed IRQ. And
> > > non-managed IRQ will be migrated to other online cpus during cpu offline,
> > > so not an issue at all. See migrate_one_irq().
> >
> > Thanks for the pointer to migrate_one_irq().
> >
> > However, I'm not convinced the above statement is correct. My
> > understanding is that the block driver knows which interrupt vector has
> > been associated with which hardware queue but the blk-mq core not. It
> > seems to me that patch 6/6 of this series is based on the following
> > assumptions:
> > (a) That the interrupt that is associated with a hardware queue is
> >     processed by one of the CPU's in hctx->cpumask.
> > (b) That hardware queues do not share interrupt vectors.
> >
> > I don't think that either assumption is correct.
>
> What the patch tries to do is just:
>
> - when the last cpu of hctx->cpumask is going to become offline, mark
>   this hctx as inactive, then drain any inflight IO requests originated
>   from this hctx
>
> The correctness is that once we stops to produce request, we can drain
> any in-flight requests before shutdown the last cpu of hctx. Then finally
> this hctx becomes quiesced completely. Do you think this way is wrong?
> If yes, please prove it.

I don't think this applies to what Bart is saying, but there is a
pathological case where things break down: if a driver uses managed
irq's, but doesn't use the same affinity for the hctx's, an offline cpu
may have been the only one providing irq handling for an online hctx.

I feel like that should be a driver bug if it were to set itself up
that way, but I don't find anything enforces that.
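
To make the drain scheme Ming describes concrete, here is a minimal sketch of
the kind of cpumask check involved. It is not the actual patch:
EXAMPLE_HCTX_INACTIVE and example_drain_inflight_rqs() are made-up stand-ins
for whatever flag and drain helper the series uses; only the "last online CPU
of hctx->cpumask" logic is the point.

	#include <linux/blk-mq.h>
	#include <linux/cpumask.h>
	#include <linux/bitops.h>

	/* Illustrative stand-ins for the flag and drain helper in the series. */
	#define EXAMPLE_HCTX_INACTIVE	31
	void example_drain_inflight_rqs(struct blk_mq_hw_ctx *hctx);

	/*
	 * Sketch of an offline-prepare step for one hctx: only when the CPU
	 * going down is the last online CPU in hctx->cpumask do we stop
	 * producing requests on the hctx and wait for in-flight ones to finish.
	 */
	static int example_hctx_cpu_offline(struct blk_mq_hw_ctx *hctx,
					    unsigned int cpu)
	{
		unsigned int other;

		if (!cpumask_test_cpu(cpu, hctx->cpumask))
			return 0;

		/* Some other online CPU can still submit to this hctx. */
		for_each_cpu_and(other, hctx->cpumask, cpu_online_mask)
			if (other != cpu)
				return 0;

		/*
		 * 'cpu' is the last online CPU of this hctx: mark it inactive
		 * so no new requests are queued, then drain what is already in
		 * flight before the CPU (and any managed vector tied to it)
		 * goes away.
		 */
		set_bit(EXAMPLE_HCTX_INACTIVE, &hctx->state);
		example_drain_inflight_rqs(hctx);
		return 0;
	}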
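
For the pathological case above, the sanity check that nothing currently
enforces could look roughly like the following; the irq argument and the
helper name are hypothetical, the point is only that hctx->cpumask has to
stay covered by the affinity of the managed vector servicing it, otherwise
every CPU in that affinity mask can go offline (shutting the vector down)
while some CPU in hctx->cpumask is still online and submitting.

	#include <linux/blk-mq.h>
	#include <linux/cpumask.h>
	#include <linux/irq.h>
	#include <linux/kernel.h>

	/*
	 * Illustrative check only; nothing like this exists in blk-mq today.
	 * A driver could verify that every CPU that may submit to a hctx is
	 * also covered by the affinity of the managed vector it completes
	 * that hctx's requests on.
	 */
	static void example_check_hctx_irq_affinity(struct blk_mq_hw_ctx *hctx,
						    unsigned int irq)
	{
		const struct cpumask *aff = irq_get_affinity_mask(irq);

		if (!aff || !cpumask_subset(hctx->cpumask, aff))
			pr_warn("hctx %u: cpumask not covered by irq %u affinity\n",
				hctx->queue_num, irq);
	}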