On Tue, May 19, 2020 at 09:54:20AM +0800, Ming Lei wrote:
> As Thomas clarified, workqueue hasn't such issue any more, and only other
> per CPU kthreads can run until the CPU clears the online bit.
>
> So the question is if IO can be submitted from such kernel context?

What other per-CPU kthreads even exist?

> > INACTIVE is set to the hctx, and it is set by the last CPU to be
> > offlined that is mapped to the hctx. once the bit is set the barrier
> > ensured it is seen everywhere before we start waiting for the requests
> > to finish. What is missing?:
>
> memory barrier should always be used as pair, and you should have mentioned
> that the implied barrier in test_and_set_bit_lock pair from sbitmap_get()
> is pair of smp_mb__after_atomic() in blk_mq_hctx_notify_offline().

Documentation/core-api/atomic_ops.rst makes it pretty clear that the
special smp_mb__before_atomic and smp_mb__after_atomic barriers are only
used around the set_bit/clear_bit/change_bit operations, and not on the
test_bit side. That is also how they are used in all the callsites I
checked.

> Then setting tag bit and checking INACTIVE in blk_mq_get_tag() can be ordered,
> same with setting INACTIVE and checking tag bit in blk_mq_hctx_notify_offline().

But yes, even if not, that would take care of it.

> BTW, smp_mb__before_atomic() in blk_mq_hctx_notify_offline() isn't needed.

True.
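
For illustration only, the ordering discussed above (set the tag bit, then
check INACTIVE on one side; set INACTIVE, then check the tag bits on the
other) can be sketched in user space with C11 atomics. This is not the
kernel code: tag_bits, hctx_inactive, try_get_tag(), put_tag() and
mark_inactive_and_snapshot() are hypothetical names, and the explicit
seq_cst fences are only an assumed stand-in for whatever barriers the real
blk_mq_get_tag() and blk_mq_hctx_notify_offline() paths rely on.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NR_TAGS 8u

/* Hypothetical stand-ins for the tag bitmap and the hctx INACTIVE flag. */
static atomic_uint tag_bits;
static atomic_bool hctx_inactive;

/* Allocation side: set the tag bit, full fence, then check INACTIVE. */
static bool try_get_tag(unsigned int tag)
{
	assert(tag < NR_TAGS);
	unsigned int bit = 1u << tag;

	/* Relaxed RMW; ordering comes from the explicit fence below. */
	if (atomic_fetch_or_explicit(&tag_bits, bit, memory_order_relaxed) & bit)
		return false;	/* tag already in use */

	atomic_thread_fence(memory_order_seq_cst);

	if (atomic_load_explicit(&hctx_inactive, memory_order_relaxed)) {
		/* Offline already started: give the tag back and fail. */
		atomic_fetch_and_explicit(&tag_bits, ~bit, memory_order_release);
		return false;
	}
	return true;
}

/* Release a tag previously obtained with try_get_tag(). */
static void put_tag(unsigned int tag)
{
	assert(tag < NR_TAGS);
	atomic_fetch_and_explicit(&tag_bits, ~(1u << tag), memory_order_release);
}

/* Offline side: set INACTIVE, full fence, then look at the tag bits. */
static unsigned int mark_inactive_and_snapshot(void)
{
	atomic_store_explicit(&hctx_inactive, true, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);
	return atomic_load_explicit(&tag_bits, memory_order_relaxed);
}
```

With a full fence between the store and the load on both sides, the C11
model rules out the case where the allocator misses INACTIVE and the
offline side also misses the tag bit, so at least one of them notices the
other. An offline path would then wait until the snapshot reads zero.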