On Tue, Nov 07, 2017 at 10:34:35PM +0000, Bart Van Assche wrote:
> On Tue, 2017-11-07 at 15:06 -0700, Jens Axboe wrote:
> > Just to keep everyone in the loop, this bug is not new to
> > for-4.15/block, nor is it new to the current 4.14-rc or 4.13. So it's
> > probably different to what Bart is hitting, but it's a bug none the
> > less...
>
> Hello Jens,
>
> There are several reasons why I think that patch "blk-mq: don't handle
> TAG_SHARED in restart" really should be reverted:
> * That patch is based on the assumption that only the SCSI driver uses shared
>   tags. That assumption is not correct. null_blk and nvme also use shared tags.

No, both null_blk and nvme should be handled by BLK_MQ_S_TAG_WAITING; there
is no need to waste CPU checking all shared tags.

> * As my tests have shown, the algorithm for restarting queues based on the

Your test doesn't show that it is related to RESTART, since there is no
pending request in the output of 'tags'.

>   SCSI starved list is flawed. So using that mechanism instead of the blk-mq
>   shared queue restarting algorithm is wrong.

The algorithm based on the starved list has been used for dozens of years
for SCSI; I don't think it is that flawed.

> * We are close to the merge window. It is too late for trying to fix the
>   "blk-mq: don't handle TAG_SHARED in restart" patch.

If you can provide us with a way to reproduce it, there is enough time to
fix it before the V4.15 release.

>
> My proposal is to make sure that what will be sent to Linus during the v4.15
> merge window works reliably. That means using the v4.13/v4.14 algorithm for
> queue restarting which is an algorithm that is trusted by the community. If
> Roman Penyaev's patch could get applied that would be even better.

Frankly speaking, the algorithm for blk-mq's restarting won't be used by
SCSI at all, because scsi_end_request() restarts the queue before the
restart for TAG_SHARED.

For NVMe and null_blk, it is basically the same, since we cover that via
BLK_MQ_S_TAG_WAITING.

So Nak your proposal.

-- Ming