On Mon, Aug 17, 2020 at 12:15:39PM +0200, Christoph Hellwig wrote:
> On Mon, Aug 17, 2020 at 06:01:15PM +0800, Ming Lei wrote:
> > SCHED_RESTART code path is relied to re-run queue for dispatch requests
> > in hctx->dispatch. Meantime the SCHED_RESTART flag is checked when adding
> > requests to hctx->dispatch.
> >
> > memory barriers have to be used for ordering the following two pair of OPs:
> >
> > 1) adding requests to hctx->dispatch and checking SCHED_RESTART in
> > blk_mq_dispatch_rq_list()
> >
> > 2) clearing SCHED_RESTART and checking if there is request in hctx->dispatch
> > in blk_mq_sched_restart().
> >
> > Without the added memory barrier, either:
> >
> > 1) blk_mq_sched_restart() may miss requests added to hctx->dispatch meantime
> > blk_mq_dispatch_rq_list() observes SCHED_RESTART, and not run queue in
> > dispatch side
> >
> > or
> >
> > 2) blk_mq_dispatch_rq_list still sees SCHED_RESTART, and not run queue
> > in dispatch side, meantime checking if there is request in
> > hctx->dispatch from blk_mq_sched_restart() is missed.
> >
> > IO hang in ltp/fs_fill test is reported by kernel test robot:
> >
> > https://lkml.org/lkml/2020/7/26/77
> >
> > Turns out it is caused by the above out-of-order OPs. And the IO hang
> > can't be observed any more after applying this patch.
> >
> > Cc: Bart Van Assche <bvanassche@xxxxxxx>
> > Cc: Christoph Hellwig <hch@xxxxxx>
> > Cc: David Jeffery <djeffery@xxxxxxxxxx>
> > Reported-by: kernel test robot <rong.a.chen@xxxxxxxxx>
> > Cc: <stable@xxxxxxxxxxxxxxx>
> > Signed-off-by: Ming Lei <ming.lei@xxxxxxxxxx>
>
> Can you add a Fixes: tag so that the commit gets backported?

Fixes: bd166ef183c2 ("blk-mq-sched: add framework for MQ capable IO schedulers")

Thanks,
Ming
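
For readers following the thread, the ordering requirement described in the
quoted commit message can be sketched as a small userspace model in C11. This
is only an illustration under stated assumptions, not the patch itself:
struct model_hctx, model_dispatch_add() and model_sched_restart() are
hypothetical names, a counter stands in for the hctx->dispatch list, and
atomic_thread_fence(memory_order_seq_cst) stands in for the full memory
barrier the patch adds on each side.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of the two pieces of state; not the kernel's structs. */
struct model_hctx {
	atomic_int dispatch_len;    /* > 0 models "hctx->dispatch is non-empty" */
	atomic_bool sched_restart;  /* models the SCHED_RESTART flag */
};

static void model_init(struct model_hctx *h, bool restart_set)
{
	atomic_init(&h->dispatch_len, 0);
	atomic_init(&h->sched_restart, restart_set);
}

/*
 * Dispatch side: add a request, then check the flag.
 * Returns true when this side has to re-run the queue itself
 * because no restart is pending.
 */
static bool model_dispatch_add(struct model_hctx *h)
{
	atomic_fetch_add_explicit(&h->dispatch_len, 1, memory_order_relaxed);
	/* Full barrier: order the store above before the load below. */
	atomic_thread_fence(memory_order_seq_cst);
	return !atomic_load_explicit(&h->sched_restart, memory_order_relaxed);
}

/*
 * Restart side: clear the flag, then check for pending requests.
 * Returns true when this side sees work in the dispatch list.
 */
static bool model_sched_restart(struct model_hctx *h)
{
	atomic_store_explicit(&h->sched_restart, false, memory_order_relaxed);
	/* Full barrier: order the store above before the load below. */
	atomic_thread_fence(memory_order_seq_cst);
	return atomic_load_explicit(&h->dispatch_len, memory_order_relaxed) > 0;
}

/* Single-threaded walk through both interleavings, flag initially set. */
static void model_selfcheck(void)
{
	struct model_hctx a, b;

	model_init(&a, true);
	assert(!model_dispatch_add(&a));   /* flag still set: leave it to restart */
	assert(model_sched_restart(&a));   /* restart side sees the request */

	model_init(&b, true);
	assert(!model_sched_restart(&b));  /* nothing queued yet */
	assert(model_dispatch_add(&b));    /* flag cleared: dispatch side runs queue */
}
```

With a fence between the store and the load on both sides, at least one side
is expected to observe the other's store when the two run concurrently, so the
request is not left stranded. Without the fences each side's load may be
reordered before its own store, which corresponds to the two failure cases
listed in the commit message.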