On Wed, Nov 08, 2017 at 04:41:35PM +0000, Bart Van Assche wrote:
> On Tue, 2017-11-07 at 20:06 -0700, Jens Axboe wrote:
> > At this point, I have no idea what Bart's setup looks like. Bart, it
> > would be REALLY helpful if you could tell us how you are reproducing
> > your hang. I don't know why this has to be dragged out.
>
> Hello Jens,
>
> It is a disappointment to me that you have allowed Ming to evaluate other
> approaches than reverting "blk-mq: don't handle TAG_SHARED in restart". That

I have mentioned in another email to Jens that I agree to revert that
patch because of the TAG_WAITING issue in Jens's test case.

> patch namely replaces an algorithm that is trusted by the community with an
> algorithm of which even Ming acknowledged that it is racy. A quote from [1]:
> "IO hang may be caused if all requests are completed just before the current
> SCSI device is added to shost->starved_list". I don't know of any way to fix
> that race other than serializing request submission and completion by adding
> locking around these actions, which is something we don't want. Hence my
> request to revert that patch.

That can't be the reason for this revert. This issue [1] is fixed by
'[PATCH] SCSI: don't get target/host busy_count in scsi_mq_get_budget()',
which follows this idea:

- we add sdev to shost->starved_list in scsi_target_queue_ready(), and
  the return value is set to BLK_STS_RESOURCE

- atomic_read(&sdev->device_busy) is then checked to see if there is a
  pending request; if it is zero the queue is run, otherwise we depend on
  scsi_end_request() of the pending request to restart the queue

- if sdev->device_busy becomes 0 just after the check, the completion
  still sees the sdev in shost->starved_list and does the restart, so
  there is no IO hang

If you think something above is wrong, please comment on it directly.

Without this patch, and without any out-of-tree patch, an IO hang can be
triggered in test 01 of srp-test.
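To make the ordering argument above concrete, here is a small user-space
model (not kernel code). It assumes a single sdev with one in-flight request
and sequentially consistent interleavings; struct model, submit_step(),
complete_step() and count_hangs() are hypothetical names. It enumerates every
interleaving of the submission path (add sdev to the starved list, then
optionally check device_busy) against the completion path (drop device_busy,
then restart if sdev is starved) and counts the schedules in which nobody
reruns the queue. It does not model memory ordering, which the real code
still has to guarantee.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model state: one sdev with one in-flight request. */
struct model {
	int device_busy;  /* stands in for atomic_read(&sdev->device_busy) */
	bool starved;     /* sdev is on shost->starved_list */
	int restarts;     /* number of queue runs triggered */
};

/*
 * Submission path. Step 0: put sdev on the starved list (BLK_STS_RESOURCE).
 * Step 1 (only if check_busy): run the queue when nothing is pending.
 */
static void submit_step(struct model *m, int step, bool check_busy)
{
	if (step == 0)
		m->starved = true;
	else if (check_busy && m->device_busy == 0)
		m->restarts++;
}

/*
 * Completion path. Step 0: the pending request finishes.
 * Step 1: if sdev is starved, take it off the list and restart the queue.
 */
static void complete_step(struct model *m, int step)
{
	if (step == 0) {
		m->device_busy--;
		assert(m->device_busy >= 0);
	} else if (m->starved) {
		m->starved = false;
		m->restarts++;
	}
}

/*
 * Try all 6 interleavings of the two 2-step paths and return how many end
 * with the queue never restarted (i.e. an IO hang in this model).
 */
static int count_hangs(bool check_busy)
{
	int hangs = 0;

	/* Bit i of mask set => slot i of the schedule runs a submitter step. */
	for (int mask = 0; mask < 16; mask++) {
		int nsub = 0, s = 0, c = 0;

		for (int i = 0; i < 4; i++)
			nsub += (mask >> i) & 1;
		if (nsub != 2)
			continue;

		struct model m = { .device_busy = 1, .starved = false, .restarts = 0 };

		for (int i = 0; i < 4; i++) {
			if ((mask >> i) & 1)
				submit_step(&m, s++, check_busy);
			else
				complete_step(&m, c++);
		}
		if (m.restarts == 0)
			hangs++;
	}
	return hangs;
}
```

Without the device_busy check, count_hangs(false) finds exactly one hanging
schedule, the race quoted from [1], where the request completes before sdev
is added to the starved list. With the check, count_hangs(true) finds none,
because whichever side runs last sees the other side's update.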
After this patch is applied on v4.14-rc4, no IO hang can be observed any more.

>
> Regarding the test I run, here is a summary of what I mentioned in previous
> e-mails:
> * I modified the SRP initiator such that the SCSI target queue depth is
>   reduced to one by setting starget->can_queue to 1 from inside
>   scsi_host_template.target_alloc.
> * With that modified SRP initiator I run the srp-test software as follows
>   until something breaks:
>   while ./run_tests -f xfs -d -e deadline -r 60; do :; done
>
> Today a system with at least one InfiniBand HCA is required to run that test.
> When I have the time I will post the SRP initiator and target patches on the
> linux-rdma mailing list that make it possible to run that test against the
> SoftRoCE driver (drivers/infiniband/sw/rxe). The only hardware required to
> use that driver is an Ethernet adapter.

The thing is that we still don't know the root cause of your issue, and
keeping the restart for TAG_SHARED can be thought of as a workaround. Maybe
it is the same issue as Jens's, maybe something else; we don't know,
especially since no logs (such as sched_tags or tags) have been provided.

It is easy to see an IOPS drop of more than 20% with the restart for
TAG_SHARED in an 8-LUN scsi_debug test.

-- 
Ming