On Wed, 2019-04-03 at 10:52 -0700, Bart Van Assche wrote:
> On Wed, 2019-03-27 at 18:00 -0400, Laurence Oberman wrote:
> > Hello Jens, Jianchao
> > Finally made it to this one.
> > I will see if I can revert and test
> >
> > 7f556a44e61d0b62d78db9a2662a5f0daef010f2 is the first bad commit
> > commit 7f556a44e61d0b62d78db9a2662a5f0daef010f2
> > Author: Jianchao Wang <jianchao.w.wang@xxxxxxxxxx>
> > Date:   Fri Dec 14 09:28:18 2018 +0800
> >
> >     blk-mq: refactor the code of issue request directly
> >
> >     Merge blk_mq_try_issue_directly and __blk_mq_try_issue_directly
> >     into one interface to unify the interfaces to issue requests
> >     directly. The merged interface takes over the requests totally,
> >     it could insert, end or do nothing based on the return value of
> >     .queue_rq and 'bypass' parameter. Then caller needn't any other
> >     handling any more and then code could be cleaned up.
> >
> >     And also the commit c616cbee ( blk-mq: punt failed direct issue
> >     to dispatch list ) always inserts requests to hctx dispatch list
> >     whenever get a BLK_STS_RESOURCE or BLK_STS_DEV_RESOURCE, this is
> >     overkill and will harm the merging. We just need to do that for
> >     the requests that has been through .queue_rq. This patch also
> >     could fix this.
> >
> >     Signed-off-by: Jianchao Wang <jianchao.w.wang@xxxxxxxxxx>
> >     Signed-off-by: Jens Axboe <axboe@xxxxxxxxx>
>
> Hi Laurence,
>
> I have not been able to reproduce this issue. But you may want to try
> the following patch (applies on top of v5.1-rc3):
>
>
> Subject: [PATCH] block: Fix blk_mq_try_issue_directly()
>
> If blk_mq_try_issue_directly() returns BLK_STS*_RESOURCE that means that
> the request has not been queued and that the caller should retry to submit
> the request. Both blk_mq_request_bypass_insert() and
> blk_mq_sched_insert_request() guarantee that a request will be processed.
> Hence return BLK_STS_OK if one of these functions is called. This patch
> avoids that blk_mq_dispatch_rq_list() crashes when using dm-mpath.
>
> Reported-by: Laurence Oberman <loberman@xxxxxxxxxx>
> Fixes: 7f556a44e61d ("blk-mq: refactor the code of issue request directly") # v5.0.
> Signed-off-by: Bart Van Assche <bvanassche@xxxxxxx>
> ---
>  block/blk-mq.c | 9 ++-------
>  1 file changed, 2 insertions(+), 7 deletions(-)
>
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 652d0c6d5945..b2c20dce8a30 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -1859,16 +1859,11 @@ blk_status_t blk_mq_try_issue_directly(struct blk_mq_hw_ctx *hctx,
>  	case BLK_STS_RESOURCE:
>  		if (force) {
>  			blk_mq_request_bypass_insert(rq, run_queue);
> -			/*
> -			 * We have to return BLK_STS_OK for the DM
> -			 * to avoid livelock. Otherwise, we return
> -			 * the real result to indicate whether the
> -			 * request is direct-issued successfully.
> -			 */
> -			ret = bypass ? BLK_STS_OK : ret;
> +			ret = BLK_STS_OK;
>  		} else if (!bypass) {
>  			blk_mq_sched_insert_request(rq, false,
>  						    run_queue, false);
> +			ret = BLK_STS_OK;
>  		}
>  		break;
>  	default:
>
>
> Thanks,
>
> Bart.

Hello Bart

For the above:

Reviewed-by: Laurence Oberman <loberman@xxxxxxxxxx>
Tested-by: Laurence Oberman <loberman@xxxxxxxxxx>

Thank you. Given that I know this issue very well, I can confirm your patch
fixes it. Against 5.1-rc3 the initiator no longer panics when I reboot
the ib_srpt target server. It continues to try to reconnect, as it should.

I would never have found this. The patch makes sense now, of course, so I
can review it.

[ 221.285919] device-mapper: multipath: Failing path 8:176.
[ 221.286182] device-mapper: multipath: Failing path 65:144.
[ 221.286266] device-mapper: multipath: Failing path 65:0.
[ 221.286625] device-mapper: multipath: Failing path 65:32.
[ 221.286708] device-mapper: multipath: Failing path 65:96.
[ 221.286965] device-mapper: multipath: Failing path 65:224.
[ 221.287115] device-mapper: multipath: Failing path 66:48.
[ 221.309589] sd 1:0:0:14: rejecting I/O to offline device
[ 221.309595] sd 1:0:0:6: rejecting I/O to offline device
[ 231.692106] scsi host2: ib_srp: Got failed path rec status -110
[ 231.722521] scsi host2: ib_srp: Path record query failed: sgid fe80:0000:0000:0000:7cfe:9003:0072:6ed3, dgid fe80:0000:0000:0000:7cfe:9003:0072:6e4f, pkey 0xffff, service_id 0x7cfe900300726e4e
[ 231.816709] scsi host2: reconnect attempt 2 failed (-110)
[ 236.684030] scsi host1: ib_srp: Got failed path rec status -110
[ 236.716132] scsi host1: ib_srp: Path record query failed: sgid fe80:0000:0000:0000:7cfe:9003:0072:6ed2, dgid fe80:0000:0000:0000:7cfe:9003:0072:6e4e, pkey 0xffff, service_id 0x7cfe900300726e4e
[ 236.814095] scsi host1: reconnect attempt 2 failed (-110)
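
For anyone following the thread, the return-value contract stated in the patch description (a *_RESOURCE status means "not queued, caller should retry"; once the request has been inserted somewhere, report BLK_STS_OK) can be modeled in user space roughly as below. This is only a sketch, not kernel code: the toy_* names, the toy_request struct, and the 'inserted' flag are hypothetical stand-ins, and only the switch statement touched by the diff is modeled. Error completion in the default case is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's blk_status_t values. */
enum toy_status {
	TOY_STS_OK,
	TOY_STS_RESOURCE,
	TOY_STS_DEV_RESOURCE,
	TOY_STS_IOERR
};

/* Hypothetical request: records only whether some queue now owns it. */
struct toy_request {
	bool inserted;
};

/*
 * Models the post-patch switch statement only.
 * 'ret'    : status the request arrived at the switch with
 * 'force'  : the request has already been through .queue_rq()
 * 'bypass' : the caller asked to bypass the scheduler insert
 */
enum toy_status toy_resolve_status(struct toy_request *rq,
				   enum toy_status ret,
				   bool force, bool bypass)
{
	switch (ret) {
	case TOY_STS_OK:
		break;
	case TOY_STS_DEV_RESOURCE:
	case TOY_STS_RESOURCE:
		if (force) {
			/* stands in for blk_mq_request_bypass_insert() */
			rq->inserted = true;
			ret = TOY_STS_OK;
		} else if (!bypass) {
			/* stands in for blk_mq_sched_insert_request() */
			rq->inserted = true;
			ret = TOY_STS_OK;
		}
		/* otherwise: nothing was inserted, keep the *_RESOURCE status */
		break;
	default:
		/* error handling is omitted in this sketch */
		break;
	}
	return ret;
}

/* Small usage example covering the inserted and not-inserted paths. */
void toy_self_check(void)
{
	struct toy_request a = { false };
	struct toy_request b = { false };

	/* Went through .queue_rq(): inserted, so the caller sees OK. */
	assert(toy_resolve_status(&a, TOY_STS_RESOURCE, true, false) == TOY_STS_OK);
	assert(a.inserted);

	/* Never issued and bypass set: not inserted, caller must retry. */
	assert(toy_resolve_status(&b, TOY_STS_RESOURCE, false, true) == TOY_STS_RESOURCE);
	assert(!b.inserted);
}
```

In this model a *_RESOURCE status is never returned for a request that has been inserted, which is the invariant the patch description relies on. Before the patch, per the removed line `ret = bypass ? BLK_STS_OK : ret;`, the force and !bypass combination could return a *_RESOURCE status for a request that had already been inserted.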