Re: fsync hangs after scsi rejected a request

On 1/25/19 4:49 AM, jianchao.wang wrote:

It sounds like this is not so easy to trigger.

blk_mq_dispatch_rq_list
   scsi_queue_rq
      if (atomic_read(&sdev->device_busy) ||
        scsi_device_blocked(sdev))
        ret = BLK_STS_DEV_RESOURCE;             scsi_end_request
                                                  __blk_mq_end_request
                                                    blk_mq_sched_restart // clear RESTART
                                                      blk_mq_run_hw_queue
                                                  blk_mq_run_hw_queues
list_splice_init(list, &hctx->dispatch)
   needs_restart = blk_mq_sched_needs_restart(hctx)

The 'needs_restart' will be false, so the queue would be rerun.

Thanks
Jianchao
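
The interleaving above can be modelled in a few lines of userspace C. This is only a sketch under stated assumptions: the toy_* names and the struct are hypothetical and not kernel API, the model is single-threaded (the two orderings are simply played back in sequence), and it ignores everything else the real dispatch path does, such as tag waits and delayed reruns.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a hardware queue; NOT the kernel's struct blk_mq_hw_ctx. */
struct toy_hctx {
	bool restart_flag;  /* models the RESTART state bit */
	int dispatch_len;   /* requests parked on the dispatch list */
	int reruns;         /* number of queue runs triggered */
};

/* Models a queue run: counts it and drains whatever is parked. */
static void toy_run_queue(struct toy_hctx *h)
{
	h->reruns++;
	h->dispatch_len = 0;
}

/* Completion side: if RESTART is set, clear it and rerun the queue. */
static void toy_sched_restart(struct toy_hctx *h)
{
	if (h->restart_flag) {
		h->restart_flag = false;
		toy_run_queue(h);
	}
}

/* Dispatch side, after the driver rejected 'rejected' requests. */
static void toy_dispatch_tail(struct toy_hctx *h, int rejected)
{
	bool needs_restart;

	assert(rejected >= 0);
	h->dispatch_len += rejected;      /* stands in for list_splice_init() */
	needs_restart = h->restart_flag;  /* stands in for blk_mq_sched_needs_restart() */
	if (!needs_restart)
		toy_run_queue(h);         /* flag already cleared: rerun from here */
}

/* Returns the number of requests left parked after one rejected dispatch. */
static int toy_race(bool completion_first)
{
	struct toy_hctx h = { .restart_flag = true, .dispatch_len = 0, .reruns = 0 };

	if (completion_first) {
		toy_sched_restart(&h);    /* clears RESTART before the splice */
		toy_dispatch_tail(&h, 1);
	} else {
		toy_dispatch_tail(&h, 1);
		toy_sched_restart(&h);    /* completion reruns the queue */
	}
	return h.dispatch_len;
}
```

In this model toy_race() returns 0 for both orderings, so nothing stays parked. A request is only left behind if RESTART stays set after the splice and no later completion calls toy_sched_restart().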

Good point. So the RESTART flag is supposed to protect against this? Now I see that this is also roughly what the lengthy comment in blk_mq_dispatch_rq_list says.

May I complain that this is very unintuitive (the queue gets rerun when the RESTART flag is _not_ set)? It also looks unreliable: not every caller of blk_mq_dispatch_rq_list seems to set the flag, and it does not always get cleared in __blk_mq_end_request.

__blk_mq_end_request does the following:

	if (rq->end_io) {
		rq_qos_done(rq->q, rq);
		rq->end_io(rq, error);
	} else {
		if (unlikely(blk_bidi_rq(rq)))
			blk_mq_free_request(rq->next_rq);
		blk_mq_free_request(rq);
	}

and blk_mq_free_request then calls blk_mq_sched_restart, which clears the flag. But in my case rq->end_io is set (non-NULL), so blk_mq_free_request is never called.
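
To make the two branches concrete, here is a toy userspace model of that completion path. It is a sketch of the behaviour as I describe it above, not kernel code: all toy_* names are hypothetical, the bidi case is left out, and toy_noop_end_io stands in for an end_io callback that does not free the request itself (an assumption; a real callback may well free it later).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy queue state; NOT a kernel structure. */
struct toy_queue {
	bool restart_flag;  /* models the RESTART bit */
	int reruns;         /* queue runs triggered by completions */
};

struct toy_request {
	struct toy_queue *q;
	void (*end_io)(struct toy_request *rq, int error);
	bool freed;
};

/* Models freeing a request, which is where the restart check happens. */
static void toy_free_request(struct toy_request *rq)
{
	rq->freed = true;
	if (rq->q->restart_flag) {
		rq->q->restart_flag = false;  /* clear RESTART ... */
		rq->q->reruns++;              /* ... and rerun the queue */
	}
}

/* Models the branch quoted above: a set end_io bypasses the free path. */
static void toy_end_request(struct toy_request *rq, int error)
{
	assert(rq != NULL && rq->q != NULL);
	if (rq->end_io)
		rq->end_io(rq, error);
	else
		toy_free_request(rq);
}

/* Hypothetical callback that completes the request without freeing it. */
static void toy_noop_end_io(struct toy_request *rq, int error)
{
	(void)rq;
	(void)error;
}
```

With end_io left NULL, toy_end_request() clears the flag and counts one rerun. With toy_noop_end_io installed, the flag stays set and no rerun happens, which is the situation I believe I am hitting.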

On 1/25/19 5:05 AM, Bart Van Assche wrote:
>
> Can you have a look at
> https://bugzilla.kernel.org/show_bug.cgi?id=202353 and see whether that
> issue is related to what you encountered?
>
> Thanks,
>
> Bart.

I don't know. My hangs last at most 30 seconds (but that's because BTRFS does a transaction every 30 s; I don't know what would happen with ext4), and for me only one process blocks while everything else still works flawlessly. In particular, programs which do not fsync are not affected at all. If I find some time, I can also try downgrading my kernel to 4.18 and see if the problem persists.



