Re: FastLinQ: possible duplicate flush of FastReg and LocalInv

On 3/17/2021 4:54 AM, Bernard Metzler wrote:
-----"Chuck Lever III" <chuck.lever@xxxxxxxxxx> wrote: -----

To: "linux-rdma" <linux-rdma@xxxxxxxxxxxxxxx>
From: "Chuck Lever III" <chuck.lever@xxxxxxxxxx>
Date: 03/16/2021 08:59PM
Subject: [EXTERNAL] FastLinQ: possible duplicate flush of FastReg and
LocalInv

Hi-

I've been trying to track down some crashes when running NFS/RDMA
tests over FastLinQ devices in iWARP mode. To make it stressful,
I've enabled disconnect injection, where rpcrdma injects a
connection disconnect every so often.

As part of a disconnect event, the Receive and Send queues are
drained. Sometimes I see a duplicate flush for one or more of
the memory registration ops. This is not a big deal for FastReg
because its completion handler is basically a no-op.

But for LocalInv this is a problem. On a flushed completion, the
MR is destroyed. If the completion occurs again, of course, all
kinds of badness happen because we're DMA-unmapping twice,
touching memory that has already been freed, and deleting from a
list_head that is poisonous.

The last straw is that wc_localinv_done calls the generic RPC layer
to indicate that an RPC Reply is ready. The duplicate flush
dereferences one or more NULL pointers.

Doesn't the verbs API contract stipulate that every posted WR gets
exactly one completion? I don't see this behavior with other
providers.

Indeed. Nothing else is defined and applications obviously
rely on correctness in that respect.

Totally agree - any WR successfully posted must be completed, exactly
once. A missing or multiple completion is a provider bug.
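
As an illustration of the exactly-once rule, here is a minimal user-space
sketch of a ledger that flags both kinds of violation: a completion for a
WR that is not currently posted (a duplicate), and a posted WR that never
completes (a missing completion). All names here (wr_ledger, ledger_post,
ledger_complete, ledger_outstanding) are hypothetical and are not part of
the verbs API or of rpcrdma; it only models the contract.

```c
#include <assert.h>   /* for the assert() checks in the usage note */
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
enum wr_state { WR_FREE = 0, WR_POSTED, WR_COMPLETED };

#define MAX_WRS 8

struct wr_ledger {
    enum wr_state state[MAX_WRS];
    unsigned int violations;   /* completions that break exactly-once */
};

/* Record that WR `id` was successfully posted. */
static bool ledger_post(struct wr_ledger *l, size_t id)
{
    if (id >= MAX_WRS || l->state[id] == WR_POSTED)
        return false;          /* out of range, or already in flight */
    l->state[id] = WR_POSTED;
    return true;
}

/* Record a completion. Returns true only for the first completion of a
 * posted WR; anything else is counted as a contract violation. */
static bool ledger_complete(struct wr_ledger *l, size_t id)
{
    if (id >= MAX_WRS || l->state[id] != WR_POSTED) {
        l->violations++;       /* duplicate or unexpected completion */
        return false;
    }
    l->state[id] = WR_COMPLETED;
    return true;
}

/* Count WRs that were posted but have not completed yet. */
static size_t ledger_outstanding(const struct wr_ledger *l)
{
    size_t n = 0;

    for (size_t i = 0; i < MAX_WRS; i++)
        if (l->state[i] == WR_POSTED)
            n++;
    return n;
}
```

With a zero-initialized ledger, posting WR 3 and completing it once leaves
ledger_outstanding() at 0 and violations at 0; a second ledger_complete()
for WR 3 returns false and bumps violations to 1, which is the
double-flush case described above.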

Chuck, you might verify that every ib_post_send() call return code
is being checked. If you missed an error, that would allow for a
missed completion. But never a double completion, that's on the
provider.
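
A minimal sketch of that return-code check, written as a user-space model
rather than real kernel code. mock_post_send() is a hypothetical stand-in
for ib_post_send() (0 on success, negative errno on failure), and
struct mock_qp, struct request and post_request() are likewise invented
for illustration. The point it shows: when the post fails, no completion
will ever arrive for that WR, so the caller has to unwind right there
instead of waiting for a completion handler.

```c
#include <assert.h>   /* for the assert() checks in the usage note */
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a send queue; not the kernel's struct ib_qp. */
struct mock_qp {
    bool connected;   /* posts fail once the connection is gone */
    int  pending;     /* WRs the provider owes a completion for */
};

/* Hypothetical stand-in for ib_post_send():
 * returns 0 on success, a negative errno on failure. */
static int mock_post_send(struct mock_qp *qp)
{
    if (!qp->connected)
        return -ENOTCONN;
    qp->pending++;
    return 0;
}

/* Hypothetical per-request bookkeeping. */
struct request {
    bool awaiting_completion;  /* a completion handler will run later */
    bool released;             /* resources were already cleaned up */
};

/* Post a request and always act on the return code. */
static int post_request(struct mock_qp *qp, struct request *req)
{
    int rc = mock_post_send(qp);

    if (rc) {
        /* Failed post: no completion is coming, so release here. */
        req->awaiting_completion = false;
        req->released = true;
        return rc;
    }
    /* Successful post: exactly one completion is owed for this WR. */
    req->awaiting_completion = true;
    return 0;
}
```

In this model, a post on a connected queue returns 0 and leaves the
request awaiting its completion; after connected is cleared, the next
post returns -ENOTCONN and the request is released immediately. Ignoring
that return value would leave the request waiting forever, which is the
missed-completion case, not the double-completion one.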

Tom.


