Re: connect cmd error for nvme-rdma with eventual kernel crash

On Wed, Mar 01, 2017 at 04:55:23AM +0000, Parav Pandit wrote:
> Hi Jens,
> 
> > -----Original Message-----
> > From: Jens Axboe [mailto:axboe@xxxxxxxxx]
> > Subject: Re: connect cmd error for nvme-rdma with eventual kernel crash
> > 
> > > On Feb 28, 2017, at 5:57 PM, Parav Pandit <parav@xxxxxxxxxxxx> wrote:
> > >
> > > Hi Jens,
> > >
> > > > With your commit 2af8cbe30531eca73c8f3ba277f155fc0020b01a in the
> > > > linux-block git tree, there are two request tables, static and
> > > > dynamic, of the same size.
> > > > However, blk_mq_tag_to_rq() always looks up the tag in the dynamic
> > > > table, which doesn't seem to be always initialized.
> > > >
> > > > I am running an nvme-rdma initiator, and it fails to find the request
> > > > for the given tag when a command completes.
> > > > The command triggers error recovery with a "tag not found" error.
> > > > Eventually the kernel crashes in blk_mq_queue_tag_busy_iter() with a
> > > > NULL pointer. That seems to be an additional bug in error recovery.
> > >
> > > > To debug, I also initialized the dynamic tags table:
> > >
> > > blk_mq_alloc_rqs() {
> > >            tags->static_rqs[i] = rq;
> > > +            tags->rqs[i] = rq;
> > >
> > > > This appears to resolve the issue, but it is not the proper fix.
> > > > It appears to me that the nvme stack is broken under certain
> > > > conditions by the recent change to static and dynamic rq tables.
> > 
> > Can you try my for-linus branch?
> 
> I tried the for-linus branch and it works.
> 
> It seems that ac6e0c2d633ab0411810fe6b15a40808309041db fixes it, with
> this assignment in __blk_mq_alloc_request():
> data->hctx->tags->rqs[rq->tag] = rq;
> 
> The commit message says there is no functional difference, but it
> actually fixes this issue.
> 
> Parav

The fix is actually this one:

commit f867f4804d55adeef42c68c89edad49cdf3058f7
Author: Omar Sandoval <osandov@xxxxxx>
Date:   Mon Feb 27 10:28:27 2017 -0800

    blk-mq: make blk_mq_alloc_request_hctx() allocate a scheduler request

    blk_mq_alloc_request_hctx() allocates a driver request directly, unlike
    its blk_mq_alloc_request() counterpart. It also crashes because it
    doesn't update the tags->rqs map.

    Fix it by making it allocate a scheduler request.

    Reported-by: Sagi Grimberg <sagi@xxxxxxxxxxx>
    Signed-off-by: Omar Sandoval <osandov@xxxxxx>
    Signed-off-by: Jens Axboe <axboe@xxxxxx>
    Tested-by: Sagi Grimberg <sagi@xxxxxxxxxxx>

The commit you pointed out would have also fixed it, but after this
change it's a no-op.
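
For anyone following along, here is a rough userspace model of the failure
mode discussed in this thread. It is only a sketch, not kernel code: the
struct layout is heavily simplified, and DEPTH, the model_* function names,
and the update_map flag are made up for illustration. It shows that a lookup
which consults only the dynamic table (as blk_mq_tag_to_rq() does with
tags->rqs) returns NULL for a request that was handed out by a path that
never updated that table.

```c
#include <assert.h>
#include <stddef.h>

#define DEPTH 4 /* hypothetical queue depth for this model */

struct request {
	int tag;
};

/* Simplified stand-in for the two same-sized tables in the tag set. */
struct tags {
	struct request *static_rqs[DEPTH]; /* filled once at preallocation */
	struct request *rqs[DEPTH];        /* filled when a tag is handed out */
};

static struct request pool[DEPTH];

/* Models preallocation: only the static table is populated. */
static void model_alloc_rqs(struct tags *t)
{
	for (int i = 0; i < DEPTH; i++) {
		pool[i].tag = -1;
		t->static_rqs[i] = &pool[i];
		t->rqs[i] = NULL;
	}
}

/*
 * Models handing out a request for a tag. update_map is a hypothetical
 * switch: 0 mimics a path that forgets to update the dynamic table,
 * 1 mimics a path that does update it.
 */
static struct request *model_get_request(struct tags *t, int tag,
					 int update_map)
{
	assert(tag >= 0 && tag < DEPTH);
	struct request *rq = t->static_rqs[tag];

	rq->tag = tag;
	if (update_map)
		t->rqs[tag] = rq;
	return rq;
}

/* Models the completion-side lookup: it reads the dynamic table only. */
static struct request *model_tag_to_rq(struct tags *t, int tag)
{
	assert(tag >= 0 && tag < DEPTH);
	return t->rqs[tag];
}
```

Calling model_alloc_rqs() and then model_get_request(&t, 1, 0) leaves
model_tag_to_rq(&t, 1) returning NULL, which corresponds to the "tag not
found" case above. With update_map set to 1 the lookup returns the request.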


