Re: [resend] Crash in nvmet_req_init() - null req->rsp pointer

On 8/20/2018 3:47 PM, Sagi Grimberg wrote:
> 
>> Resending in plain text...
>>
>> ----
>>
>> Hey guys,
>>
>> I'm debugging a nvmet_rdma crash on the linux-4.14.52 stable kernel
>> code.  Under heavy load, including 80 nvmf devices, after 13 hours of
>> running, I see an Oops [1] when the target is processing a new ingress
>> nvme command.  It crashes in nvmet_req_init() because req->rsp is NULL:
>>
>>    493   bool nvmet_req_init(struct nvmet_req *req, struct nvmet_cq *cq,
>>    494                   struct nvmet_sq *sq, struct nvmet_fabrics_ops *ops)
>>    495   {
>>    496           u8 flags = req->cmd->common.flags;
>>    497           u16 status;
>>    498
>>    499           req->cq = cq;
>>    500           req->sq = sq;
>>    501           req->ops = ops;
>>    502           req->sg = NULL;
>>    503           req->sg_cnt = 0;
>>    504           req->rsp->status = 0; <-- HERE
>>
>> The nvme command opcode is nvme_cmd_write.  The nvmet_rdma_queue state
>> is NVMET_RDMA_Q_LIVE.  The nvmet_req looks valid [2], i.e. not garbage.
>> But it seems very bad that req->rsp is NULL! :)
>>
>> Any thoughts?  I didn't see anything like this in recent nvmf fixes...
> 
> Is it possible that you ran out of rsps and got a corrupted rsp?

That is what it looks like: the nvmet_rdma_rsp pointer returned from
nvmet_rdma_get_rsp() was not within the bounds of the response struct
array allocated for that queue.  I wasn't sure how that could happen.
I had assumed list_first_entry() would return NULL if the list was
empty, but it doesn't -- it blindly computes an entry pointer from the
list head, which is exactly why list_first_entry_or_null() exists.


> 
> How about trying out this patch to add more information:

I'll try this out.  Thanks Sagi!

> -- 
> diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c
> index e7f43d1e1779..890d9c45ca33 100644
> --- a/drivers/nvme/target/rdma.c
> +++ b/drivers/nvme/target/rdma.c
> @@ -66,6 +66,7 @@ struct nvmet_rdma_rsp {
> 
>         struct nvmet_req        req;
> 
> +       bool                    allocated;
>         u8                      n_rdma;
>         u32                     flags;
>         u32                     invalidate_rkey;
> @@ -174,11 +175,20 @@ nvmet_rdma_get_rsp(struct nvmet_rdma_queue *queue)
>         unsigned long flags;
> 
>         spin_lock_irqsave(&queue->rsps_lock, flags);
> -       rsp = list_first_entry(&queue->free_rsps,
> +       rsp = list_first_entry_or_null(&queue->free_rsps,
>                                 struct nvmet_rdma_rsp, free_list);
> -       list_del(&rsp->free_list);
> +       if (rsp) {
> +               list_del(&rsp->free_list);
> +               rsp->allocated = false;
> +       }
>         spin_unlock_irqrestore(&queue->rsps_lock, flags);
> 
> +       if (!rsp) {
> +               pr_debug("dynamically allocated rsp\n");
> +               rsp = kmalloc(sizeof(*rsp), GFP_KERNEL);
> +               if (unlikely(!rsp))
> +                       return NULL;
> +               rsp->allocated = true;
> +       }
> +
>         return rsp;
>  }
> 
> @@ -187,6 +197,11 @@ nvmet_rdma_put_rsp(struct nvmet_rdma_rsp *rsp)
>  {
>         unsigned long flags;
> 
> +       if (rsp->allocated) {
> +               kfree(rsp);
> +               return;
> +       }
> +
>         spin_lock_irqsave(&rsp->queue->rsps_lock, flags);
>         list_add_tail(&rsp->free_list, &rsp->queue->free_rsps);
>         spin_unlock_irqrestore(&rsp->queue->rsps_lock, flags);
> -- 
> 


