On 11/08/2017 11:22 AM, Laurence Oberman wrote:
> On Wed, 2017-11-08 at 10:57 -0700, Jens Axboe wrote:
>> On 11/08/2017 09:41 AM, Bart Van Assche wrote:
>>> On Tue, 2017-11-07 at 20:06 -0700, Jens Axboe wrote:
>>>> At this point, I have no idea what Bart's setup looks like. Bart, it
>>>> would be REALLY helpful if you could tell us how you are reproducing
>>>> your hang. I don't know why this has to be dragged out.
>>>
>>> Hello Jens,
>>>
>>> It is a disappointment to me that you have allowed Ming to evaluate other
>>> approaches than reverting "blk-mq: don't handle TAG_SHARED in restart". That
>>> patch namely replaces an algorithm that is trusted by the community with an
>>> algorithm of which even Ming acknowledged that it is racy. A quote from [1]:
>>> "IO hang may be caused if all requests are completed just before the current
>>> SCSI device is added to shost->starved_list". I don't know of any way to fix
>>> that race other than serializing request submission and completion by adding
>>> locking around these actions, which is something we don't want. Hence my
>>> request to revert that patch.
>>
>> I was reluctant to revert it, in case we could work out a better way of
>> doing it. As I mentioned in the other replies, it's not exactly the
>> prettiest or most efficient. However, since we currently don't have
>> a good solution for the issue, I'm fine with reverting that patch.
>>
>>> Regarding the test I run, here is a summary of what I mentioned in previous
>>> e-mails:
>>> * I modified the SRP initiator such that the SCSI target queue depth is
>>>   reduced to one by setting starget->can_queue to 1 from inside
>>>   scsi_host_template.target_alloc.
>>> * With that modified SRP initiator I run the srp-test software as follows
>>>   until something breaks:
>>>   while ./run_tests -f xfs -d -e deadline -r 60; do :; done
>>
>> What kernel options are needed? Where do I download everything I need?
>>
>> In other words, would it be possible to do a fuller guide for getting
>> this setup and running?
>>
>> I'll run my simple test case as well, since it's currently breaking
>> basically everywhere.
>>
>>> Today a system with at least one InfiniBand HCA is required to run that test.
>>> When I have the time I will post the SRP initiator and target patches on the
>>> linux-rdma mailing list that make it possible to run that test against the
>>> SoftRoCE driver (drivers/infiniband/sw/rxe). The only hardware required to
>>> use that driver is an Ethernet adapter.
>>
>> OK, I guess I can't run it then... I'll have to rely on your testing.
>
> Hello
>
> I agree with Bart in this case, we should revert this.
> My test-bed is tied up and I have not been able to give it back to Ming
> so he could follow up on Bart's last update.
>
> Right now it's safer to revert.

I had already reverted it when sending out that email, so we should be all
set (hopefully).

--
Jens Axboe