On Sat, May 09, 2020 at 07:18:46AM -0700, Bart Van Assche wrote:
> On 2020-05-08 21:10, Ming Lei wrote:
> > queue freezing can only be applied on the request queue level, and not
> > hctx level. When requests can't be completed, wait freezing just hangs
> > for-ever.
>
> That's indeed what I meant: freeze the entire queue instead of
> introducing a new mechanism that freezes only one hardware queue at a time.

No, the issue is exactly that one single hctx becomes inactive, while the
other hctxs are still active and workable.

If one entire queue is frozen because some CPUs are offline, how can
userspace submit IO to this disk? Your suggestion just makes the disk
unusable, and that won't be accepted.

>
> Please clarify what "when requests can't be completed" means. Are you
> referring to requests that take longer than expected due to e.g. a
> controller lockup or to requests that take a long time intentionally?

If all CPUs in one hctx->cpumask are offline, the managed irq of this hw
queue will be shut down by genirq code, so any in-flight IO won't be
completed or timed out after the managed irq is shut down because of CPU
offline.

Some drivers may implement a timeout handler, so these in-flight requests
will be timed out, but that is still not friendly behaviour given that the
default timeout is too long.

Some drivers don't implement a timeout handler at all, so these IOs won't
be completed.

> The former case is handled by the block layer timeout handler. I propose
> to handle the latter case by introducing a new callback function pointer
> in struct blk_mq_ops that aborts all outstanding requests.

As I mentioned, timeout isn't a friendly behavior. And not every driver
implements a timeout handler, or implements it well enough.

> Request queue
> freezing is such an important block layer mechanism that I think we
> should require that all block drivers support freezing a request queue
> in a short time.

Firstly, we just need to drain in-flight requests and re-submit queued
requests from one single hctx; queue-wide freezing blocks all userspace
IO unnecessarily.

Secondly, some requests may not be completed at all, so freezing can't
work because freeze_wait may hang forever.

Thanks,
Ming
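
For illustration only, the condition discussed above (one hctx becoming
inactive because every CPU in its cpumask is offline) could be sketched
in plain userspace C roughly as follows. This is a minimal sketch, not
kernel code: cpu_mask_t and hctx_goes_inactive() are hypothetical names
invented here, and a 64-bit integer stands in for a real cpumask.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a cpumask: bit n set means CPU n is in the mask. */
typedef uint64_t cpu_mask_t;

/*
 * Hypothetical helper: returns true when, with 'cpu' taken offline, the
 * hardware queue whose CPUs are 'hctx_mask' has no online CPU left.
 * Only in that case would the queue's managed irq be shut down; other
 * hardware queues of the same request queue are unaffected.
 */
static bool hctx_goes_inactive(cpu_mask_t hctx_mask, cpu_mask_t online_mask,
                               unsigned int cpu)
{
    cpu_mask_t bit;

    assert(cpu < 64);               /* the toy mask only holds 64 CPUs */
    bit = (cpu_mask_t)1 << cpu;

    if (!(hctx_mask & bit))
        return false;               /* 'cpu' is not mapped to this hctx */

    /* Online CPUs of this hctx, not counting the one going away. */
    return (hctx_mask & online_mask & ~bit) == 0;
}
```

The point of the per-hctx check is that only the queue for which it
returns true needs its in-flight requests drained; the whole request
queue does not have to be frozen.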