Ming Lei <ming.lei@xxxxxxxxxx> writes:
> On Thu, May 21, 2020 at 12:14:18AM +0200, Thomas Gleixner wrote:
>> When the CPU is finally offlined, i.e. the CPU cleared the online bit in
>> the online mask is definitely too late simply because it still runs on
>> that outgoing CPU _after_ the hardware queue is shut down and drained.
>
> IMO, the patch in Christoph's blk-mq-hotplug.2 still works for percpu
> kthread.
>
> It is just not optimal in the retrying, but it should be fine. When the
> percpu kthread is scheduled on the CPU to be offlined:
>
> - if the kthread doesn't observe the INACTIVE flag, the allocated request
> will be drained.
>
> - otherwise, the kthread just retries and retries to allocate & release,
> and sooner or later, its time slice is consumed, and migrated out, and the
> cpu hotplug handler will get chance to run and move on, then the cpu is
> shutdown.

1) This is based on the assumption that the kthread is in the SCHED_OTHER
   scheduling class. Is that really a valid assumption?

2) What happens in the following scenario:

   unplug

     mq_offline
       set_ctx_inactive()
       drain_io()

   io_kthread()
     try_queue()
     wait_on_ctx()

   Can this happen and if so what will wake up that thread?

I'm not familiar enough with that code to answer #2, but this really wants
to be properly described and documented.

Thanks,

        tglx