On 2020-05-10 18:45, Ming Lei wrote: > On Sat, May 09, 2020 at 07:18:46AM -0700, Bart Van Assche wrote: >> On 2020-05-08 21:10, Ming Lei wrote: >>> queue freezing can only be applied on the request queue level, and not >>> hctx level. When requests can't be completed, wait freezing just hangs >>> for-ever. >> >> That's indeed what I meant: freeze the entire queue instead of >> introducing a new mechanism that freezes only one hardware queue at a time. > > No, the issue is exactly that one single hctx becomes inactive, and > other hctx are still active and workable. > > If one entire queue is frozen because of some of CPUs are offline, how > can userspace submit IO to this disk? You suggestion justs makes the > disk not usable, that won't be accepted. What I meant is to freeze a request queue temporarily (until hot unplugging of a CPU has finished). I would never suggest to freeze a request queue forever and I think that you already knew that. >> Please clarify what "when requests can't be completed" means. Are you >> referring to requests that take longer than expected due to e.g. a >> controller lockup or to requests that take a long time intentionally? > > If all CPUs in one hctx->cpumask are offline, the managed irq of this hw > queue will be shutdown by genirq code, so any in-flight IO won't be > completed or timedout after the managed irq is shutdown because of cpu > offline. > > Some drivers may implement timeout handler, so these in-flight requests > will be timed out, but still not friendly behaviour given the default > timeout is too long. > > Some drivers don't implement timeout handler at all, so these IO won't > be completed. I think that the block layer needs to be notified after the decision has been taken to offline a CPU and before the interrupts associated with that CPU are disabled. That would allow the block layer to freeze a request queue without triggering any timeouts (ignoring block driver and hardware bugs). I'm not familiar with CPU hotplugging so I don't know whether or not such a mechanism already exists. >> The former case is handled by the block layer timeout handler. I propose >> to handle the latter case by introducing a new callback function pointer >> in struct blk_mq_ops that aborts all outstanding requests. > > As I mentioned, timeout isn't a friendly behavior. Or not every driver > implements timeout handler or well enough. What I propose is to fix those block drivers instead of complicating the block layer core further and instead of introducing potential deadlocks in the block layer core. >> Request queue >> freezing is such an important block layer mechanism that I think we >> should require that all block drivers support freezing a request queue >> in a short time. > > Firstly, we just need to drain in-flight requests and re-submit queued > requests from one single hctx, and queue wide freezing causes whole > userspace IOs blocked unnecessarily. Freezing a request queue for a short time is acceptable. As you know we already do that when the queue depth is modified, when the write-back throttling latency is modified and also when the I/O scheduler is changed. > Secondly, some requests may not be completed at all, so freezing can't > work because freeze_wait may hang forever. If a request neither can be aborted nor completes then that's a severe bug in the block driver that submitted the request to the block device. Bart.