Thanks Omar.

> -----Original Message-----
> From: Omar Sandoval [mailto:osandov@xxxxxxxxxxx]
> Sent: Tuesday, February 28, 2017 11:51 PM
> To: Parav Pandit <parav@xxxxxxxxxxxx>
> Cc: Jens Axboe <axboe@xxxxxxxxx>; axboe@xxxxxx;
> linux-block@xxxxxxxxxxxxxxx; Sagi Grimberg <sagi@xxxxxxxxxxx>;
> Christoph Hellwig <hch@xxxxxx>; Max Gurtovoy <maxg@xxxxxxxxxxxx>
> Subject: Re: connect cmd error for nvme-rdma with eventual kernel crash
>
> On Wed, Mar 01, 2017 at 04:55:23AM +0000, Parav Pandit wrote:
> > Hi Jens,
> >
> > > -----Original Message-----
> > > From: Jens Axboe [mailto:axboe@xxxxxxxxx]
> > > Subject: Re: connect cmd error for nvme-rdma with eventual kernel crash
> > >
> > > > On Feb 28, 2017, at 5:57 PM, Parav Pandit <parav@xxxxxxxxxxxx> wrote:
> > > >
> > > > Hi Jens,
> > > >
> > > > With your commit 2af8cbe30531eca73c8f3ba277f155fc0020b01a in
> > > > linux-block git tree, There are two requests tables. Static and
> > > > dynamic of same size.
> > > > However function blk_mq_tag_to_rq() always try to get the tag from
> > > > the dynamic table which doesn't seem to be always initialized.
> > > >
> > > > I am running nvme-rdma initiator and it fails to find the request
> > > > for the given tag when command completes.
> > > > Command triggers error recovery with "tag not found" error.
> > > > Eventually kernel is crashing in blk_mq_queue_tag_busy_iter() with
> > > > NULL pointer. Seems to be additional bug in error recovery.
> > > >
> > > > To debug, I added initializing dynamic tags as well.
> > > >
> > > > blk_mq_alloc_rqs() {
> > > >     tags->static_rqs[i] = rq;
> > > > +   tags->rqs[i] = rq;
> > > >
> > > > This appears to resolve the issue. But that's not the fix.
> > > > It appears to me that nvme stack is broken in certain conditions
> > > > with recent static and dynamic rq tables change.
> > >
> > > Can you try my for-linus branch?
> >
> > I tried for-linus branch and it works.
> >
> > Seems like ac6e0c2d633ab0411810fe6b15a40808309041db fixes it.
> > __blk_mq_alloc_request()
> >     data->hctx->tags->rqs[rq->tag] = rq;
> >
> > Commit says no functional difference but it is actually fixing this issue.
> >
> > Parav
>
> The fix is actually this one:
>
> commit f867f4804d55adeef42c68c89edad49cdf3058f7
> Author: Omar Sandoval <osandov@xxxxxx>
> Date:   Mon Feb 27 10:28:27 2017 -0800
>
>     blk-mq: make blk_mq_alloc_request_hctx() allocate a scheduler request
>
>     blk_mq_alloc_request_hctx() allocates a driver request directly, unlike
>     its blk_mq_alloc_request() counterpart. It also crashes because it
>     doesn't update the tags->rqs map.
>
>     Fix it by making it allocate a scheduler request.
>
>     Reported-by: Sagi Grimberg <sagi@xxxxxxxxxxx>
>     Signed-off-by: Omar Sandoval <osandov@xxxxxx>
>     Signed-off-by: Jens Axboe <axboe@xxxxxx>
>     Tested-by: Sagi Grimberg <sagi@xxxxxxxxxxx>
>
> The commit you pointed out would have also fixed it, but after this change
> it's a no-op.
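
For readers following along, the failure mode discussed in this thread can be illustrated with a small user-space sketch. This is not kernel code: the model_* functions, struct tag_set, NR_TAGS and the update_map flag are hypothetical names made up for illustration; only the field names static_rqs and rqs mirror the ones quoted above. It assumes, as the thread describes, that the lookup reads only the dynamic table, so an allocation path that skips the map update leaves the tag unresolvable.

```c
#include <assert.h>
#include <stddef.h>

#define NR_TAGS 4 /* hypothetical table size, for illustration only */

struct request {
    int tag;
};

/* Simplified stand-in for the two same-sized tables discussed above. */
struct tag_set {
    struct request *static_rqs[NR_TAGS]; /* populated once at setup */
    struct request *rqs[NR_TAGS];        /* populated when a tag is handed out */
};

static struct request pool[NR_TAGS];

/* Fill the static table; leave the dynamic lookup table empty. */
void model_init(struct tag_set *tags)
{
    for (int i = 0; i < NR_TAGS; i++) {
        pool[i].tag = -1;
        tags->static_rqs[i] = &pool[i];
        tags->rqs[i] = NULL;
    }
}

/*
 * Hand out the request for a tag. With update_map == 0 this models an
 * allocation path that forgets to record the request in the dynamic table.
 */
struct request *model_alloc(struct tag_set *tags, int tag, int update_map)
{
    assert(tag >= 0 && tag < NR_TAGS);
    struct request *rq = tags->static_rqs[tag];
    rq->tag = tag;
    if (update_map)
        tags->rqs[tag] = rq;
    return rq;
}

/* Lookup consults only the dynamic table, as described in the thread. */
struct request *model_tag_to_rq(const struct tag_set *tags, int tag)
{
    if (tag < 0 || tag >= NR_TAGS)
        return NULL;
    return tags->rqs[tag];
}
```

In this model, a request handed out with update_map set to 0 is not found by model_tag_to_rq() when its completion arrives, which corresponds to the "tag not found" symptom reported above. A request handed out with update_map set to 1 is found as expected.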