On Jun 23, 2014, at 12:22 PM, Hefty, Sean <sean.hefty@xxxxxxxxx> wrote:

>> Steve Wise is helping me with a particular issue where QP re-use might
>> be helpful.
>>
>> When an RPC/RDMA transport connection is dropped (for example, the NFS
>> server crashes), xprtrdma destroys the transport's QP and creates a
>> new one for the next connection.
>
> If the remote side crashes, the local QP can transition into the error
> state, which would flush all posted receives. I believe that a WR that
> has completed in error only has the wr_id field valid.
>
> Note that calling rdma_disconnect() will also transition the QP into
> the error state.

So on remote disconnect there are two steps:

1. The QP is transitioned to the error state

2. Later, when xprtrdma attempts to reconnect, its transport connect
   worker destroys the old QP

I think you and Devesh are suggesting that when the QP is transitioned
to error state in step 1, the provider immediately flushes the send and
completion queues appropriately, leaving no possibility of a completed
WR with a dropped completion.

>
>> We're not quite sure what IB_WC_WR_FLUSH_ERR means in that instance. Our
>> theory is there is a gap when the old QP is destroyed:
>>
>> 1. If the HW reports a successful WR completion but the QP no longer
>> exists, the provider substitutes an IB_WC_WR_FLUSH_ERR completion
>>
>> 2. If the WR is dropped before the HW even saw it, the provider inserts
>> an IB_WC_WR_FLUSH_ERR completion
>>
>> So if xprtrdma is trying to submit a FAST_REG_MR WR and the completion
>> gets flushed, xprtrdma has no way to know whether the rkey was bumped in
>> the adapter. Thus it has no certainty which rkey to use to invalidate
>> that FRMR.
>
> I'm not familiar with the behavior of fast reg mr.

For the record, with both mlx4 and cxgb4, we see FRMRs left valid after
a FAST_REG_MR is flushed during a connection loss. More study needed,
obviously.
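To make the rkey ambiguity concrete, here is a rough sketch of how a
consumer might record "I don't know what the adapter did" when a
FAST_REG_MR completion comes back flushed. This is only an illustration
under stated assumptions, not what xprtrdma does today: every name
prefixed with sketch_ is hypothetical, the status enum is a stand-in
rather than the kernel's enum ib_wc_status, and the FRMR is assumed to
have been looked up from wr_id by the caller, since that is the only
field believed to be valid on a flushed completion.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the two completion statuses discussed
 * above; this is NOT the kernel's enum ib_wc_status. */
enum sketch_wc_status {
	SKETCH_WC_SUCCESS,
	SKETCH_WC_WR_FLUSH_ERR,
};

/* Hypothetical per-FRMR bookkeeping kept by the consumer. */
enum sketch_frmr_state {
	SKETCH_FRMR_INVALID,	/* known invalid, safe to register */
	SKETCH_FRMR_VALID,	/* FAST_REG_MR known to have completed */
	SKETCH_FRMR_UNKNOWN,	/* FAST_REG_MR flushed, HW state uncertain */
};

struct sketch_frmr {
	uint32_t rkey;			/* rkey the consumer believes is current */
	enum sketch_frmr_state state;
};

/* Record the outcome of a FAST_REG_MR completion. Assumption: the
 * caller found 'f' via wr_id. A flush leaves the rkey untouched and
 * only marks the FRMR as uncertain, because the consumer cannot tell
 * whether the adapter processed the WR before the flush. */
static void sketch_fastreg_done(struct sketch_frmr *f,
				enum sketch_wc_status status)
{
	if (status == SKETCH_WC_SUCCESS)
		f->state = SKETCH_FRMR_VALID;
	else
		f->state = SKETCH_FRMR_UNKNOWN;
}

/* True when the FRMR must be dealt with out of band (for example at
 * reconnect time) instead of being invalidated with a guessed rkey. */
static bool sketch_frmr_needs_recovery(const struct sketch_frmr *f)
{
	return f->state == SKETCH_FRMR_UNKNOWN;
}
```

The point of the third state is that the consumer never has to choose
between the old and the bumped rkey; how an UNKNOWN FRMR is actually
recovered is left open here.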
>> I was idly wondering whether re-using the QP during connection loss
>> would provide a guarantee that xprtrdma would never see case 1 above.
>> Then IB_WC_WR_FLUSH_ERR on a FAST_REG_MR WR would be a more certain
>> indication that the HW still has the old rkey.
>>
>> I suppose that xprtrdma can "hang onto" the QP without re-using it by
>> simply not destroying it until all WRs scheduled on the old QP are
>> completed. Is reference counting the QP the usual design pattern to deal
>> with this case?
>
> I _thought_ that destroying the QP would cleanup any completion entries
> in the CQ, but I'm not sure of this. Referencing counting should work
> though.

As a workaround, I can comment out the rdma_destroy_qp() call in
xprtrdma's connect worker to see if there's any change in behavior when
the old QP stays around. Given that the queues are flushed on
RTS->Error, I probably won't see any difference at all.

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com