RE: connect cmd error for nvme-rdma with eventual kernel crash

Thanks Omar.

> -----Original Message-----
> From: Omar Sandoval [mailto:osandov@xxxxxxxxxxx]
> Sent: Tuesday, February 28, 2017 11:51 PM
> To: Parav Pandit <parav@xxxxxxxxxxxx>
> Cc: Jens Axboe <axboe@xxxxxxxxx>; axboe@xxxxxx;
> linux-block@xxxxxxxxxxxxxxx; Sagi Grimberg <sagi@xxxxxxxxxxx>;
> Christoph Hellwig <hch@xxxxxx>; Max Gurtovoy <maxg@xxxxxxxxxxxx>
> Subject: Re: connect cmd error for nvme-rdma with eventual kernel crash
> 
> On Wed, Mar 01, 2017 at 04:55:23AM +0000, Parav Pandit wrote:
> > Hi Jens,
> >
> > > -----Original Message-----
> > > From: Jens Axboe [mailto:axboe@xxxxxxxxx]
> > > Subject: Re: connect cmd error for nvme-rdma with eventual kernel crash
> > >
> > > > On Feb 28, 2017, at 5:57 PM, Parav Pandit <parav@xxxxxxxxxxxx> wrote:
> > > >
> > > > Hi Jens,
> > > >
> > > > With your commit 2af8cbe30531eca73c8f3ba277f155fc0020b01a in the
> > > > linux-block git tree, there are two request tables, static and
> > > > dynamic, of the same size. However, blk_mq_tag_to_rq() always
> > > > looks up the request for a tag in the dynamic table, which doesn't
> > > > always seem to be initialized.
> > > >
> > > > I am running the nvme-rdma initiator, and it fails to find the
> > > > request for the given tag when a command completes. The command
> > > > triggers error recovery with a "tag not found" error. Eventually
> > > > the kernel crashes in blk_mq_queue_tag_busy_iter() with a NULL
> > > > pointer dereference. This seems to be an additional bug in error
> > > > recovery.
> > > >
> > > > To debug, I also initialized the dynamic table:
> > > >
> > > > blk_mq_alloc_rqs() {
> > > >            tags->static_rqs[i] = rq;
> > > > +            tags->rqs[i] = rq;
> > > >
> > > > This appears to resolve the issue, but it is not the real fix.
> > > > It appears to me that the nvme stack breaks under certain
> > > > conditions with the recent static/dynamic rq table change.
> > >
> > > Can you try my for-linus branch?
> >
> > I tried the for-linus branch, and it works.
> >
> > It seems that ac6e0c2d633ab0411810fe6b15a40808309041db fixes it
> > through this line in __blk_mq_alloc_request():
> > data->hctx->tags->rqs[rq->tag] = rq;
> >
> > The commit message says there is no functional change, but the commit
> > actually fixes this issue.
> >
> > Parav
> 
> The fix is actually this one:
> 
> commit f867f4804d55adeef42c68c89edad49cdf3058f7
> Author: Omar Sandoval <osandov@xxxxxx>
> Date:   Mon Feb 27 10:28:27 2017 -0800
> 
>     blk-mq: make blk_mq_alloc_request_hctx() allocate a scheduler request
> 
>     blk_mq_alloc_request_hctx() allocates a driver request directly, unlike
>     its blk_mq_alloc_request() counterpart. It also crashes because it
>     doesn't update the tags->rqs map.
> 
>     Fix it by making it allocate a scheduler request.
> 
>     Reported-by: Sagi Grimberg <sagi@xxxxxxxxxxx>
>     Signed-off-by: Omar Sandoval <osandov@xxxxxx>
>     Signed-off-by: Jens Axboe <axboe@xxxxxx>
>     Tested-by: Sagi Grimberg <sagi@xxxxxxxxxxx>
> 
> The commit you pointed out would have also fixed it, but after this change
> it's a no-op.
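
For anyone following the thread, the failure mode can be illustrated with a minimal, hypothetical user-space model. This is not kernel code. The names `toy_tags`, `toy_alloc_rqs()`, `toy_tag_to_rq()` and `toy_get_request()` are invented for illustration. Requests are preallocated into `static_rqs[]`, and the completion-side lookup reads only `rqs[]`, which matches the report's description of `blk_mq_tag_to_rq()`. An allocation path that hands out a request without updating `rqs[]` leaves that tag unresolvable at completion time.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified model of the two blk-mq request tables
 * discussed in this thread; NOT the kernel's actual structures. */
#define TOY_NR_TAGS 4

struct toy_request {
	unsigned int tag;
};

struct toy_tags {
	unsigned int nr_tags;
	struct toy_request pool[TOY_NR_TAGS];        /* backing storage */
	struct toy_request *static_rqs[TOY_NR_TAGS]; /* filled once at setup */
	struct toy_request *rqs[TOY_NR_TAGS];        /* tag -> request map read at completion */
};

/* Setup: only static_rqs[] is populated; rqs[] starts out empty. */
static void toy_alloc_rqs(struct toy_tags *tags)
{
	tags->nr_tags = TOY_NR_TAGS;
	for (unsigned int i = 0; i < TOY_NR_TAGS; i++) {
		tags->pool[i].tag = i;
		tags->static_rqs[i] = &tags->pool[i];
		tags->rqs[i] = NULL;
	}
}

/* Completion-side lookup: consults only rqs[], never static_rqs[]. */
static struct toy_request *toy_tag_to_rq(const struct toy_tags *tags,
					 unsigned int tag)
{
	return tag < tags->nr_tags ? tags->rqs[tag] : NULL;
}

/* Hands out the preallocated request for a tag. update_map selects
 * whether the tag -> request map is also updated; skipping it models
 * the buggy allocation path. */
static struct toy_request *toy_get_request(struct toy_tags *tags,
					   unsigned int tag, int update_map)
{
	assert(tag < tags->nr_tags);
	struct toy_request *rq = tags->static_rqs[tag];
	if (update_map)
		tags->rqs[rq->tag] = rq; /* analogue of tags->rqs[rq->tag] = rq */
	return rq;
}
```

In this sketch, calling `toy_get_request(&t, 1, 0)` and then `toy_tag_to_rq(&t, 1)` returns NULL, which is the analogue of the "tag not found" error. Allocating with `update_map = 1` makes the lookup return the request. The actual fix in commit f867f4804d55adeef42c68c89edad49cdf3058f7 does not add such an assignment locally. Instead, it makes blk_mq_alloc_request_hctx() allocate a scheduler request, as its commit message says.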
