-----"Chuck Lever III" <chuck.lever@xxxxxxxxxx> wrote: -----

>To: "linux-rdma" <linux-rdma@xxxxxxxxxxxxxxx>
>From: "Chuck Lever III" <chuck.lever@xxxxxxxxxx>
>Date: 03/16/2021 08:59PM
>Subject: [EXTERNAL] FastLinQ: possible duplicate flush of FastReg and
>LocalInv
>
>Hi-
>
>I've been trying to track down some crashes when running NFS/RDMA
>tests over FastLinQ devices in iWARP mode. To make it stressful,
>I've enabled disconnect injection, where rpcrdma injects a
>connection disconnect every so often.
>
>As part of a disconnect event, the Receive and Send queues are
>drained. Sometimes I see a duplicate flush for one or more of the
>memory registration ops. This is not a big deal for FastReg
>because its completion handler is basically a no-op.
>
>But for LocalInv this is a problem. On a flushed completion, the
>MR is destroyed. If the completion occurs again, of course, all
>kinds of badness happens because we're DMA-unmapping twice,
>touching memory that has already been freed, and deleting from a
>list_head that is poisonous.
>
>The last straw is that wc_localinv_done calls the generic RPC layer
>to indicate that an RPC Reply is ready. The duplicate flush
>dereferences one or more NULL pointers.
>
>Doesn't the verbs API contract stipulate that every posted WR gets
>exactly one completion? I don't see this behavior with other
>providers.
>

Indeed. Nothing else is defined, and applications obviously rely on
correctness in that respect.
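
To make the "exactly one completion per posted WR" invariant concrete,
here is a minimal user-space sketch of a tracker that could be used as a
debugging aid to catch a duplicate flush before a handler frees anything
twice. It is only an illustration under stated assumptions: the names
wr_tracker, tracker_post, tracker_complete and demo_duplicate_flush are
hypothetical, it assumes small integer WR IDs, and it is not rpcrdma,
qedr, or verbs code. It would only detect the provider bug, not fix it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-WR states; not a kernel or verbs type. */
enum wr_state { WR_IDLE, WR_POSTED, WR_COMPLETED };

#define MAX_WRS 64  /* assumption: WR IDs are small integers below this */

struct wr_tracker {
    enum wr_state state[MAX_WRS];
    unsigned int unexpected;  /* completions with no outstanding WR */
};

static void tracker_init(struct wr_tracker *t)
{
    for (size_t i = 0; i < MAX_WRS; i++)
        t->state[i] = WR_IDLE;
    t->unexpected = 0;
}

/* Record that a WR was posted. Fails if the ID is out of range or
 * the WR is already outstanding. */
static bool tracker_post(struct wr_tracker *t, uint64_t wr_id)
{
    if (wr_id >= MAX_WRS || t->state[wr_id] == WR_POSTED)
        return false;
    t->state[wr_id] = WR_POSTED;
    return true;
}

/* Record a completion (successful or flushed). Returns true only for
 * the first completion of an outstanding WR; any other completion is
 * counted as unexpected and the caller should not touch the MR. */
static bool tracker_complete(struct wr_tracker *t, uint64_t wr_id)
{
    if (wr_id >= MAX_WRS || t->state[wr_id] != WR_POSTED) {
        t->unexpected++;
        return false;
    }
    t->state[wr_id] = WR_COMPLETED;
    return true;
}

/* Simulates the reported scenario: one LocalInv-style WR is flushed
 * twice. Returns the number of unexpected completions observed. */
unsigned int demo_duplicate_flush(void)
{
    struct wr_tracker t;

    tracker_init(&t);
    tracker_post(&t, 7);      /* WR posted before the disconnect */
    tracker_complete(&t, 7);  /* first flush: safe to release the MR */
    tracker_complete(&t, 7);  /* duplicate flush: rejected */
    return t.unexpected;
}
```

A completion handler guarded this way would skip the DMA unmap and list
removal when tracker_complete() returns false, which turns the crash
into a countable event that can be reported against the provider.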