Hi Jianchao,

On Wed, Jan 17, 2018 at 04:09:11PM +0800, jianchao.wang wrote:
> Hi ming
>
> Thanks for your kindly response.
>
> On 01/17/2018 02:22 PM, Ming Lei wrote:
> > This warning can't be removed completely, for example, the CPU figured
> > in blk_mq_hctx_next_cpu(hctx) can be put on again just after the
> > following call returns and before __blk_mq_run_hw_queue() is scheduled
> > to run.
> >
> >   kblockd_mod_delayed_work_on(blk_mq_hctx_next_cpu(hctx), &hctx->run_work, msecs_to_jiffies(msecs))
> We could use cpu_active in __blk_mq_run_hw_queue() to narrow the window.
> There is a big gap between cpu_online and cpu_active. rebind_workers is also between them.

This warning is harmless, and I guess you can't reproduce it without the
help of your special patch, :-) So the window shouldn't be a big deal.

But the delay (msecs_to_jiffies(msecs)) passed to
kblockd_mod_delayed_work_on() can be a problem, because during that period:

1) hctx->next_cpu can go from offline to online before
__blk_mq_run_hw_queue() is run. Your warning is triggered, but it is
harmless.

2) hctx->next_cpu can go from online to offline before
__blk_mq_run_hw_queue() is run. There is no warning, but once the IO is
submitted to hardware and completes, how does the HBA/hw queue notify a
CPU, since the CPUs assigned to this hw queue (irq vector) are offline?
blk-mq's timeout handler may cover that, but it looks too tricky.

> >
> > Just be curious how you trigger this issue? And is it triggered in CPU
> > hotplug stress test? Or in a normal use case?
>
> In fact, this is my own investigation about whether the .queue_rq to one hardware queue could be executed on
> the cpu where it is not mapped. Finally, found this hole when cpu hotplug.
> I did the test on NVMe device which has 1-to-1 mapping between cpu and hctx.
> - A special patch that could hold some requests on ctx->rq_list though .get_budget
> - A script issues IOs with fio
> - A script online/offline the cpus continuously

Thanks for sharing your reproduction approach.

Without a handler for CPU hotplug, it isn't easy to avoid the warning
completely in __blk_mq_run_hw_queue().

> At first, just the warning above. Then after this patch was introduced, panic came up.

We have to fix the panic, so I will post the patch you tested in this thread.

Thanks,
Ming
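
For readers following the two cases above, here is a minimal user-space sketch of the race window. It is not kernel code. The struct and every `model_*` name are hypothetical, CPU masks are modeled as plain bitmasks, and the warning condition is an assumption inferred from the discussion: the work runs on a CPU that is not mapped to the hctx while hctx->next_cpu is online.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NR_CPUS 8

/* Hypothetical stand-in for the two hctx fields discussed in the thread. */
struct model_hctx {
	unsigned long cpumask;	/* bit n set => CPU n is mapped to this hw queue */
	int next_cpu;		/* CPU picked when the delayed work was queued */
};

/* Test one CPU bit in a modeled mask. */
static bool model_cpu_in_mask(unsigned long mask, int cpu)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	return (mask >> cpu) & 1UL;
}

/*
 * Assumed warning condition, evaluated when the work finally runs:
 * the running CPU is not mapped to the hctx, yet next_cpu is online.
 * online_mask is the set of online CPUs at run time, which may differ
 * from the set seen when the work was queued.
 */
static bool model_warn_fires(const struct model_hctx *hctx, int running_cpu,
			     unsigned long online_mask)
{
	return !model_cpu_in_mask(hctx->cpumask, running_cpu) &&
	       model_cpu_in_mask(online_mask, hctx->next_cpu);
}

/*
 * Concern in case 2: no CPU mapped to the hw queue is online, so nothing
 * is left to take the completion interrupt for that queue.
 */
static bool model_no_online_mapped_cpu(const struct model_hctx *hctx,
				       unsigned long online_mask)
{
	return (hctx->cpumask & online_mask) == 0;
}
```

With a 1-to-1 mapping (hctx mapped only to CPU 2) and the work ending up on CPU 0, `model_warn_fires()` returns true if CPU 2 is back online by run time (case 1) and false if CPU 2 has gone offline (case 2). In the second situation `model_no_online_mapped_cpu()` returns true, which is the silent and more worrying outcome.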