Re: [PATCH V10 07/11] blk-mq: stop to handle IO and drain IO before hctx becomes inactive

Ming Lei <ming.lei@xxxxxxxxxx> · Mon, 11 May 2020 11:48:27 +0800

On Sun, May 10, 2020 at 08:20:24PM -0700, Bart Van Assche wrote:
> On 2020-05-10 18:45, Ming Lei wrote:
> > On Sat, May 09, 2020 at 07:18:46AM -0700, Bart Van Assche wrote:
> >> On 2020-05-08 21:10, Ming Lei wrote:
> >>> queue freezing can only be applied on the request queue level, and not
> >>> hctx level. When requests can't be completed, wait freezing just hangs
> >>> for-ever.
> >>
> >> That's indeed what I meant: freeze the entire queue instead of
> >> introducing a new mechanism that freezes only one hardware queue at a time.
> > 
> > No, the issue is exactly that one single hctx becomes inactive, and
> > other hctx are still active and workable.
> > 
> > If one entire queue is frozen because of some of CPUs are offline, how
> > can userspace submit IO to this disk? You suggestion justs makes the
> > disk not usable, that won't be accepted.
> 
> What I meant is to freeze a request queue temporarily (until hot
> unplugging of a CPU has finished). I would never suggest to freeze a
> request queue forever and I think that you already knew that.

But what is your motivation to freeze queue temporarily?

I don's see any help of freezing queue for this issue. Also even though
it is temporary, IO effect still can be observed for other online CPUs.

If you want to block new allocation from the inactive hctx, that isn't
necessary cause no new allocation is basically possible because all
cpus of this hctx will be offline.

If you want to wait completion of in-flight requests, that isn't doable
because requests may not be completed at all when one hctx becomes
inactive and the managed interrupt is shutdown.

> 
> >> Please clarify what "when requests can't be completed" means. Are you
> >> referring to requests that take longer than expected due to e.g. a
> >> controller lockup or to requests that take a long time intentionally?
> > 
> > If all CPUs in one hctx->cpumask are offline, the managed irq of this hw
> > queue will be shutdown by genirq code, so any in-flight IO won't be
> > completed or timedout after the managed irq is shutdown because of cpu
> > offline.
> > 
> > Some drivers may implement timeout handler, so these in-flight requests
> > will be timed out, but still not friendly behaviour given the default
> > timeout is too long.
> > 
> > Some drivers don't implement timeout handler at all, so these IO won't
> > be completed.
> 
> I think that the block layer needs to be notified after the decision has

I have added new cpuhp state of CPUHP_AP_BLK_MQ_ONLINE for getting the
notification and blk_mq_hctx_notify_online() will be called before this
cpu is put offline.

> been taken to offline a CPU and before the interrupts associated with
> that CPU are disabled. That would allow the block layer to freeze a
> request queue without triggering any timeouts (ignoring block driver and
> hardware bugs). I'm not familiar with CPU hotplugging so I don't know
> whether or not such a mechanism already exists.

How can freezing queue avoid to triggering timeout?

Freezing queue basically blocks new request allocation, and follows wait
for completion of all in-flight request. As I explained, either no new
allocation on this inactive hctx, or in-flight request won't be completed
without this patch's solution.

> 
> >> The former case is handled by the block layer timeout handler. I propose
> >> to handle the latter case by introducing a new callback function pointer
> >> in struct blk_mq_ops that aborts all outstanding requests.
> > 
> > As I mentioned, timeout isn't a friendly behavior. Or not every driver
> > implements timeout handler or well enough.
> 
> What I propose is to fix those block drivers instead of complicating the
> block layer core further and instead of introducing potential deadlocks
> in the block layer core.

The deadlock you mentioned can be fixed with help of BLK_MQ_REQ_PREEMPT.

> 
> >> Request queue
> >> freezing is such an important block layer mechanism that I think we
> >> should require that all block drivers support freezing a request queue
> >> in a short time.
> > 
> > Firstly, we just need to drain in-flight requests and re-submit queued
> > requests from one single hctx, and queue wide freezing causes whole
> > userspace IOs blocked unnecessarily.
> 
> Freezing a request queue for a short time is acceptable. As you know we
> already do that when the queue depth is modified, when the write-back
> throttling latency is modified and also when the I/O scheduler is changed.

Again, how can freeze queue help the issue addressed by this patchset?

> 
> > Secondly, some requests may not be completed at all, so freezing can't
> > work because freeze_wait may hang forever.
> 
> If a request neither can be aborted nor completes then that's a severe
> bug in the block driver that submitted the request to the block device.

It is hard to implement timeout handler for every driver, or remove all
BLK_EH_RESET_TIMER returning from driver.

Even for drivers which implementing timeout handler elegantly, it isn't
friendly to wait several dozens of seconds or more than one hundred seconds
to wait IO completion during cpu hotplug. Who said that IO timeout has
to be triggered during cpu hotplug? At least there isn't such issue with
non-managed interrupt.

Thanks, 
Ming