> On Jun 20, 2017, at 1:35 PM, Jason Gunthorpe <jgunthorpe@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> On Tue, Jun 20, 2017 at 01:01:39PM -0400, Chuck Lever wrote:
>
>>>> Shouldn't this be protected somehow by the device?
>>>> Can someone explain why the above cannot happen? Jason? Liran? Anyone?
>>>> Say host register MR (a) and send (1) from that MR to a target,
>>>> send (1) ack got lost, and the target issues SEND_WITH_INVALIDATE
>>>> on MR (a) and the host HCA process it, then host HCA timeout on send (1)
>>>> so it retries, but ehh, it's already invalidated.
>
> I'm not sure I understand the example.. but...
>
> If you pass a MR key to a send, then that MR must remain valid until
> the send completion is implied by an observation on the CQ. The HCA is
> free to re-execute the SEND against the MR at any time up until the
> completion reaches the CQ.
>
> As I've explained before, a ULP must not use 'implied completion', eg
> a receive that could only have happened if the far side got the
> send. In particular this means it cannot use an incoming SEND_INV/etc
> to invalidate an MR associated with a local SEND, as that is a form
> of 'implied completion'
>
> For sanity a MR associated with a local send should not be remote
> accessible at all, and shouldn't even have a 'rkey', just a 'lkey'.
>
> Similarly, you cannot use a MR with SEND and remote access sanely, as
> the far end could corrupt or invalidate the MR while the local HCA is
> still using it.
>
>> So on occasion there is a Remote Access Error. That would
>> trigger connection loss, and the retransmitted Send request
>> is discarded (if there was externally exposed memory involved
>> with the original transaction that is now invalid).
>
> Once you get a connection loss I would think the state of all the MRs
> need to be resync'd. Running through the CQ should indicate which ones
> are invalidated and which ones are still good.
>
>> NFS has a duplicate replay cache. If it sees a repeated RPC
>> XID it will send a cached reply. I guess the trick there is
>> to squelch remote invalidation for such retransmits to avoid
>> spurious Remote Access Errors. Should be rare, though.
>
> .. and because of the above if a RPC is re-issued it must be re-issued
> with corrected, now-valid rkeys, and the sender must somehow detect
> that the far side dropped it for replay and tear down the MRs.

Yes, if the RPC-over-RDMA ULP is involved, any externally accessible
memory will be re-registered before an RPC retransmission.

The concern is whether a retransmitted Send will be exposed to the
receiving ULP. Below you imply that it will not be, so perhaps this
is not a concern after all.


>> RPC-over-RDMA uses persistent registration for its inline
>> buffers. The problem there is avoiding buffer reuse too soon.
>> Otherwise a garbled inline message is presented on retransmit.
>> Those would probably not be caught by the DRC.
>
> We've had this discussion on the list before. You can *never* re-use a
> SEND, or RDMA WRITE buffer until you observe the HCA is done with it
> via a CQ poll.

RPC-over-RDMA is careful to invalidate buffers that are the target of
RDMA Write before RPC completion, as we have discussed before.

Sends are assumed to be complete when a LocalInv completes.

When we had this discussion before, you explained the problem with
retransmitted Sends, but it appears that all the ULPs we have operate
without Send completion. Others whom I trust have suggested that
operating without that extra interrupt is preferred. The client has
operated this way since it was added to the kernel almost 10 years
ago.

So I took it as an "in a perfect world" kind of admonition. You are
making a stronger and more normative assertion here.


>> But the real problem is preventing retransmitted Sends from
>> causing a ULP request to be executed multiple times.
>
> IB RC guarantees single delivery for SEND, so that doesn't seem
> possible unless the ULP re-transmits the SEND on a new QP.
>
>>> Signalling all send completions and also finishing I/Os only after
>>> we got them will add latency, and that sucks...
>
> There is no choice, you *MUST* see the send completion before
> reclaiming any resources associated with the send. Only the
> completion guarantees that the HCA will not resend the packet or
> otherwise continue to use the resources.

On the NFS server side, I believe every Send is signaled.

On the NFS client side, we assume LocalInv completion is good enough.


>> With FRWR, won't subsequent WRs be delayed until the HCA is
>> done with the Send? I don't think a signal is necessary in
>> every case. Send Queue accounting currently relies on that.
>
> No. The SQ side is asynchronous to the CQ side, the HCA will pipeline
> send packets on the wire up to some internal limit.

So if my ULP issues FastReg followed by Send followed by LocalInv
(signaled), I can't rely on the LocalInv completion to imply that the
Send is also complete?


> Only the local state changed by FRWR related op codes happens
> sequentially with other SQ work.

--
Chuck Lever
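The rule stated in the quoted text above (a Send buffer may be reclaimed
only after its completion has been observed by polling the CQ, and never
on an 'implied completion' such as an incoming reply) can be illustrated
with a toy model. This is only a sketch under loud assumptions: it is not
the verbs API, every name in it (ToySendTracker, post_send, hca_finishes,
poll_cq, may_reuse) is hypothetical, every Send is treated as signaled,
and it deliberately says nothing about whether a later signaled WR's
completion covers earlier unsignaled WRs.

```python
from collections import deque


class ToySendTracker:
    """Toy bookkeeping for Send buffers. Hypothetical; not the verbs API."""

    def __init__(self):
        self._in_flight = {}   # wr_id -> buf_id the "HCA" may still read
        self._cq = deque()     # completions generated but not yet polled

    def post_send(self, wr_id, buf_id):
        # From this point the HCA may (re)transmit from buf_id at any time.
        self._in_flight[wr_id] = buf_id

    def hca_finishes(self, wr_id):
        # Simulates the HCA placing a send completion on the CQ.
        self._cq.append(wr_id)

    def poll_cq(self):
        # Only polling the CQ releases buffers back to the ULP.
        released = []
        while self._cq:
            wr_id = self._cq.popleft()
            released.append(self._in_flight.pop(wr_id))
        return released

    def may_reuse(self, buf_id):
        return buf_id not in self._in_flight.values()


def demo():
    t = ToySendTracker()
    t.post_send(wr_id=1, buf_id="inline-A")

    # A reply arriving from the far side is an 'implied completion'.
    # The model ignores it on purpose: the buffer stays owned by the HCA.
    reply_received = True
    states = [t.may_reuse("inline-A")]

    t.hca_finishes(1)                      # completion exists, not yet polled
    states.append(t.may_reuse("inline-A"))

    t.poll_cq()                            # completion observed
    states.append(t.may_reuse("inline-A"))
    return reply_received, states


if __name__ == "__main__":
    print(demo())
```

In this model the buffer becomes reusable only at the third step, after
the poll; the reply seen in the first step changes nothing.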