On 1/25/19 4:49 AM, jianchao.wang wrote:
> It doesn't sound that easy to trigger:
>
> blk_mq_dispatch_rq_list
>   scsi_queue_rq
>     if (atomic_read(&sdev->device_busy) ||
>         scsi_device_blocked(sdev))
>       ret = BLK_STS_DEV_RESOURCE;        scsi_end_request
>                                            __blk_mq_end_request
>                                              blk_mq_sched_restart // clear RESTART
>                                                blk_mq_run_hw_queue
>                                            blk_mq_run_hw_queues
>   list_splice_init(list, &hctx->dispatch)
>   needs_restart = blk_mq_sched_needs_restart(hctx)
>
> The 'needs_restart' will be false, so the queue will be rerun.
>
> Thanks
> Jianchao
Good point. So the RESTART flag is supposed to protect against this? Now
I see; this is also roughly what the lengthy comment in
blk_mq_dispatch_rq_list says.
May I complain that this is very unintuitive (the queue gets rerun when
the RESTART flag is _not_ set)? It also seems unreliable: not every
caller of blk_mq_dispatch_rq_list appears to set the flag, and the flag
does not always get cleared in __blk_mq_end_request.
__blk_mq_end_request does the following:
	if (rq->end_io) {
		rq_qos_done(rq->q, rq);
		rq->end_io(rq, error);
	} else {
		if (unlikely(blk_bidi_rq(rq)))
			blk_mq_free_request(rq->next_rq);
		blk_mq_free_request(rq);
	}
and blk_mq_free_request then calls blk_mq_sched_restart, which clears
the flag. But in my case, rq->end_io != 0, so blk_mq_free_request is
never called.
On 1/25/19 5:05 AM, Bart Van Assche wrote:
>
> Can you have a look at
> https://bugzilla.kernel.org/show_bug.cgi?id=202353 and see whether that
> issue is related to what you encountered?
>
> Thanks,
>
> Bart.
I don't know. My hangs last at most 30 seconds (but that's because BTRFS
commits a transaction every 30 seconds; I don't know what would happen
with ext4), and for me only one process blocks while everything else
keeps working flawlessly. In particular, programs that do not call fsync
are not affected at all. If I find some time, I can also try downgrading
my kernel to 4.18 and see whether the problem persists.