Re: FastLinQ: possible duplicate flush of FastReg and LocalInv

-----"Chuck Lever III" <chuck.lever@xxxxxxxxxx> wrote: -----

>To: "linux-rdma" <linux-rdma@xxxxxxxxxxxxxxx>
>From: "Chuck Lever III" <chuck.lever@xxxxxxxxxx>
>Date: 03/16/2021 08:59PM
>Subject: [EXTERNAL] FastLinQ: possible duplicate flush of FastReg and
>LocalInv
>
>Hi-
>
>I've been trying to track down some crashes when running NFS/RDMA
>tests over FastLinQ devices in iWARP mode. To make it stressful,
>I've enabled disconnect injection, where rpcrdma injects a
>connection disconnect every so often.
>
>As part of a disconnect event, the Receive and Send queues are
>drained. Sometimes I see a duplicate flush for one or more of
>the memory registration ops. This is not a big deal for FastReg
>because its completion handler is basically a no-op.
>
>But for LocalInv this is a problem. On a flushed completion, the
>MR is destroyed. If the completion occurs again, of course, all
>kinds of badness happen because we're DMA-unmapping twice,
>touching memory that has already been freed, and deleting from a
>list_head that is poisonous.
>
>The last straw is that wc_localinv_done calls the generic RPC layer
>to indicate that an RPC Reply is ready. The duplicate flush
>dereferences one or more NULL pointers.
>
>Doesn't the verbs API contract stipulate that every posted WR gets
>exactly one completion? I don't see this behavior with other
>providers.
>
Indeed. The contract defines nothing other than exactly one
completion per posted WR, and applications obviously rely on
providers getting that right.
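
To make the invariant concrete, here is a minimal user-space sketch
of a consumer that tracks whether a LocalInv WR is still outstanding
and counts any second completion instead of tearing down the MR
again. All names here (sketch_mr, sketch_post_localinv,
sketch_localinv_done) are hypothetical and are not the rpcrdma or
verbs API. It also assumes the tracking state outlives the teardown,
which is not true once the real MR has been freed. It only
illustrates the check and might help as a debugging aid. It is not a
substitute for fixing the provider.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical completion status, loosely modelled on the verbs
 * success and flush-error statuses. */
enum sketch_wc_status { SKETCH_WC_SUCCESS, SKETCH_WC_FLUSH_ERR };

/* Hypothetical stand-in for the per-MR state a consumer keeps.
 * Assumption: this state stays valid after teardown, so a late
 * duplicate completion can still be detected. */
struct sketch_mr {
    bool wr_outstanding;   /* set on post, cleared by the completion */
    int  teardown_count;   /* how many times teardown ran */
    int  duplicate_count;  /* completions seen with no WR outstanding */
};

/* Record that a LocalInv WR has been posted for this MR. */
static void sketch_post_localinv(struct sketch_mr *mr)
{
    mr->wr_outstanding = true;
}

/* Completion handler. Returns true if the completion was consumed,
 * false if it was a duplicate for a WR that already completed. */
static bool sketch_localinv_done(struct sketch_mr *mr,
                                 enum sketch_wc_status status)
{
    if (!mr->wr_outstanding) {
        /* Violates "exactly one completion per posted WR". */
        mr->duplicate_count++;
        return false;
    }
    mr->wr_outstanding = false;

    if (status != SKETCH_WC_SUCCESS) {
        /* Stands in for DMA unmap, list removal and freeing the MR. */
        mr->teardown_count++;
    }
    return true;
}
```

With this guard, posting one LocalInv and delivering two flushed
completions runs the teardown once and bumps duplicate_count once,
rather than unmapping and freeing twice.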