Re: Panic when rebooting target server testing srp on 5.0.0-rc2

On Wed, 2019-04-03 at 10:52 -0700, Bart Van Assche wrote:
> On Wed, 2019-03-27 at 18:00 -0400, Laurence Oberman wrote:
> > Hello Jens, Jianchao
> > Finally made it to this one.
> > I will see if I can revert and test
> > 
> > 7f556a44e61d0b62d78db9a2662a5f0daef010f2 is the first bad commit
> > commit 7f556a44e61d0b62d78db9a2662a5f0daef010f2
> > Author: Jianchao Wang <jianchao.w.wang@xxxxxxxxxx>
> > Date:   Fri Dec 14 09:28:18 2018 +0800
> > 
> >     blk-mq: refactor the code of issue request directly
> >     
> >     Merge blk_mq_try_issue_directly and __blk_mq_try_issue_directly
> >     into one interface to unify the interfaces to issue requests
> >     directly. The merged interface takes over the requests totally,
> >     it could insert, end or do nothing based on the return value of
> >     .queue_rq and 'bypass' parameter. Then caller needn't any other
> >     handling any more and then code could be cleaned up.
> >     
> >     And also the commit c616cbee ( blk-mq: punt failed direct issue
> >     to dispatch list ) always inserts requests to hctx dispatch list
> >     whenever get a BLK_STS_RESOURCE or BLK_STS_DEV_RESOURCE, this is
> >     overkill and will harm the merging. We just need to do that for
> >     the requests that has been through .queue_rq. This patch also
> >     could fix this.
> >     
> >     Signed-off-by: Jianchao Wang <jianchao.w.wang@xxxxxxxxxx>
> >     Signed-off-by: Jens Axboe <axboe@xxxxxxxxx>
> 
> Hi Laurence,
> 
> I have not been able to reproduce this issue. But you may want to try
> the following patch (applies on top of v5.1-rc3):
> 
> 
> Subject: [PATCH] block: Fix blk_mq_try_issue_directly()
> 
> If blk_mq_try_issue_directly() returns BLK_STS*_RESOURCE that means that
> the request has not been queued and that the caller should retry to submit
> the request. Both blk_mq_request_bypass_insert() and
> blk_mq_sched_insert_request() guarantee that a request will be processed.
> Hence return BLK_STS_OK if one of these functions is called. This patch
> avoids that blk_mq_dispatch_rq_list() crashes when using dm-mpath.
> 
> Reported-by: Laurence Oberman <loberman@xxxxxxxxxx>
> Fixes: 7f556a44e61d ("blk-mq: refactor the code of issue request directly") # v5.0.
> Signed-off-by: Bart Van Assche <bvanassche@xxxxxxx>
> ---
>  block/blk-mq.c | 9 ++-------
>  1 file changed, 2 insertions(+), 7 deletions(-)
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 652d0c6d5945..b2c20dce8a30 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -1859,16 +1859,11 @@ blk_status_t blk_mq_try_issue_directly(struct blk_mq_hw_ctx *hctx,
>  	case BLK_STS_RESOURCE:
>  		if (force) {
>  			blk_mq_request_bypass_insert(rq, run_queue);
> -			/*
> -			 * We have to return BLK_STS_OK for the DM
> -			 * to avoid livelock. Otherwise, we return
> -			 * the real result to indicate whether the
> -			 * request is direct-issued successfully.
> -			 */
> -			ret = bypass ? BLK_STS_OK : ret;
> +			ret = BLK_STS_OK;
>  		} else if (!bypass) {
>  			blk_mq_sched_insert_request(rq, false,
>  						    run_queue, false);
> +			ret = BLK_STS_OK;
>  		}
>  		break;
>  	default:
> 
> 
> Thanks,
> 
> Bart.

Hello Bart

For the above:

Reviewed-by: Laurence Oberman <loberman@xxxxxxxxxx>
Tested-by: Laurence Oberman <loberman@xxxxxxxxxx>


Thank you. Since I know this issue very well, I can confirm that your
patch fixes it.
Against 5.1-rc3 the initiator no longer panics when I reboot the
ib_srpt target server; it keeps trying to reconnect, as it should.
I would never have found this. The patch of course makes sense now, so
I can review it.

[  221.285919] device-mapper: multipath: Failing path 8:176.
[  221.286182] device-mapper: multipath: Failing path 65:144.
[  221.286266] device-mapper: multipath: Failing path 65:0.
[  221.286625] device-mapper: multipath: Failing path 65:32.
[  221.286708] device-mapper: multipath: Failing path 65:96.
[  221.286965] device-mapper: multipath: Failing path 65:224.
[  221.287115] device-mapper: multipath: Failing path 66:48.
[  221.309589] sd 1:0:0:14: rejecting I/O to offline device
[  221.309595] sd 1:0:0:6: rejecting I/O to offline device
[  231.692106] scsi host2: ib_srp: Got failed path rec status -110
[  231.722521] scsi host2: ib_srp: Path record query failed: sgid fe80:0000:0000:0000:7cfe:9003:0072:6ed3, dgid fe80:0000:0000:0000:7cfe:9003:0072:6e4f, pkey 0xffff, service_id 0x7cfe900300726e4e
[  231.816709] scsi host2: reconnect attempt 2 failed (-110)
[  236.684030] scsi host1: ib_srp: Got failed path rec status -110
[  236.716132] scsi host1: ib_srp: Path record query failed: sgid fe80:0000:0000:0000:7cfe:9003:0072:6ed2, dgid fe80:0000:0000:0000:7cfe:9003:0072:6e4e, pkey 0xffff, service_id 0x7cfe900300726e4e
[  236.814095] scsi host1: reconnect attempt 2 failed (-110)