Re: [PATCH] blk-mq: Fix several SCSI request queue lockups

On Mon, Dec 04, 2017 at 09:30:32AM -0800, Bart Van Assche wrote:
> Commit 0df21c86bdbf introduced several bugs:
> * A SCSI queue stall for queue depths > 1, addressed by commit
>   88022d7201e9 ("blk-mq: don't handle failure in .get_budget")

This one is committed already.

> * A systematic lockup for SCSI queues with queue depth 1. The
>   following test reproduces that bug systematically:
>   - Change the SRP initiator such that SCSI target queue depth is
>     limited to 1.
>   - Run the following command:
>       srp-test/run_tests -f xfs -d -e none -r 60 -t 01
>   See also "[PATCH 4/7] blk-mq: Avoid that request processing
>   stalls when sharing tags"
>   (https://marc.info/?l=linux-block&m=151208695316857). Note:
>   reverting commit 0df21c86bdbf also fixes a sporadic SCSI request
>   queue lockup while inserting a blk_mq_sched_mark_restart_hctx()
>   before all blk_mq_dispatch_rq_list() calls only fixes the
>   systematic lockup for queue depth 1.

You are the only one who can reproduce this, and you don't want to provide
any kernel log for the issue, so how can we help you fix it?

You said that your patch fixes 'commit b347689ffbca ("blk-mq-sched:
improve dispatching from sw queue")', but you don't mention any issue
with that commit, and your patch actually has nothing to do with
commit b347689ffbca. It seems your work style is just try and guess.

Also, both Jens and I have run tests on null_blk and scsi_debug with
queue_depth set to one, and neither of us can see an IO hang with current
blk-mq.

> * A scsi_debug lockup - see also "[PATCH] SCSI: delay run queue if
>   device is blocked in scsi_dev_queue_ready()"
>   (https://marc.info/?l=linux-block&m=151223233407154).

This issue is clearly explained in theory, and can be reproduced/verified
with scsi_debug, so why can't we apply that patch to fix it? The fix is
simple and can also be thought of as a cleanup, since the handling for this
case now becomes the same as in the non-mq path.

> 
> I think the above means that it is too risky to try to fix all bugs
> introduced by commit 0df21c86bdbf before kernel v4.15 is released.
> Hence revert that commit.

What is the risk?

> 
> Fixes: commit 0df21c86bdbf ("scsi: implement .get_budget and .put_budget for blk-mq")
> Signed-off-by: Bart Van Assche <bart.vanassche@xxxxxxx>
> Cc: Ming Lei <ming.lei@xxxxxxxxxx>
> Cc: Christoph Hellwig <hch@xxxxxx>
> Cc: Hannes Reinecke <hare@xxxxxxxx>
> Cc: Johannes Thumshirn <jthumshirn@xxxxxxx>
> Cc: James E.J. Bottomley <jejb@xxxxxxxxxxxxxxxxxx>
> Cc: Martin K. Petersen <martin.petersen@xxxxxxxxxx>
> Cc: linux-scsi@xxxxxxxxxxxxxxx

This commit fixes an important SCSI_MQ performance issue; we can't
simply revert it just because of one unconfirmed report from you alone
(without any kernel log provided).

So Nak.

-- 
Ming


