On 2020-02-03 12:59, Salman Qazi wrote: > We observed that it is possible for a flush to bypass the I/O > scheduler and get added to hctx->dispatch in blk_mq_sched_bypass_insert. > This can happen while a kworker is running blk_mq_do_dispatch_sched call > in blk_mq_sched_dispatch_requests. > > However, the blk_mq_do_dispatch_sched call doesn't end in bounded time. > As a result, the flush can sit there indefinitely, as the I/O scheduler > feeds an arbitrary number of requests to the hardware. > > The solution is to periodically poll hctx->dispatch in > blk_mq_do_dispatch_sched, to put a bound on the latency of the commands > sitting there. (added Christoph, Ming and Hannes to the Cc-list) Thank you for having posted a patch; that really helps. I see that my name occurs first in the "To:" list. Since Jens is the block layer maintainer I think Jens should have been mentioned first. In version v4.20 of the Linux kernel I found the following in the legacy block layer code: * From blk_insert_flush(): list_add_tail(&rq->queuelist, &q->queue_head); * From elv_next_request(): list_for_each_entry(rq, &q->queue_head, queuelist) I think this means that the legacy block layer sent flush requests to the scheduler instead of directly to the block driver. How about modifying the blk-mq code such that it mimics that approach? I'm asking this because this patch, although the code looks clean, doesn't seem the best solution to me. > + if (count > 0 && count % q->max_sched_batch == 0 && > + !list_empty_careful(&hctx->dispatch)) > + break; A modulo operation in the hot path? Please don't do that. > +static ssize_t queue_max_sched_batch_store(struct request_queue *q, > + const char *page, > + size_t count) > +{ > + int err, val; > + > + if (!q->mq_ops) > + return -EINVAL; > + > + err = kstrtoint(page, 10, &val); > + if (err < 0) > + return err; > + > + if (val <= 0) > + return -EINVAL; > + > + q->max_sched_batch = val; > + > + return count; > +} Has it been considered to use kstrtouint() instead of checking whether the value returned by kstrtoint() is positive? > + int max_sched_batch; unsigned int? Thanks, Bart.