On Apr 27, 2014, at 6:12 AM, Sagi Grimberg <sagig@xxxxxxxxxxxxxxxxxx> wrote:

> On 4/24/2014 6:01 PM, Chuck Lever wrote:
>> On Apr 24, 2014, at 3:12 AM, Sagi Grimberg <sagig@xxxxxxxxxxxxxxxxxx> wrote:
>>
>>> On 4/24/2014 2:30 AM, Devesh Sharma wrote:
>>>> Hi Chuck,
>>>>
>>>> Following is the complete call trace of a typical NFS-RDMA transaction while mounting a share.
>>>> We have to stop calling post-send when the QP has not been created, so checking the
>>>> connection state is a must while registering/deregistering FRMRs on the fly. The
>>>> unconnected state of a QP implies: don't call post_send/post_recv from any context.
>>>>
>>> Long thread... didn't follow it all.
>>
>> I think you got the gist of it.
>>
>>> If I understand correctly, this race comes up only for *cleanup* (LINV) of an FRMR
>>> registration after the teardown flow has destroyed the QP.
>>> I think this might disappear if for each registration you post LINV+FRMR.
>>> This is assuming that a situation where trying to post fastreg on a "bad" QP can
>>> never happen (usually since the teardown flow typically suspends outgoing commands).
>>
>> That's typically true for "hard" NFS mounts. But "soft" NFS mounts
>> wake RPCs after a timeout while the transport is disconnected, in
>> order to kill them. At that point, deregistration still needs to
>> succeed somehow.
>
> Not sure I understand. Can you please explain why deregistration will not succeed?
> AFAIK you are allowed to register an FRMR and then deregister it without having to invalidate it.
>
> Can you please explain why you logically connected LINV with deregistration?

Confusion. Sorry.

>> IMO there are three related problems.
>>
>> 1. rpcrdma_ep_connect() is allowing RPC tasks to be awoken while
>>    there is no QP at all (->qp is NULL). The woken RPC tasks are
>>    trying to deregister buffers that may include page cache pages,
>>    and it's oopsing because ->qp is NULL.
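To make problem 1 concrete, here is a minimal standalone sketch of the guard being discussed. All names here (toy_ep, toy_qp, toy_post_linv) are hypothetical stand-ins, not the actual xprtrdma structures or the verbs API: the point is only that an invalidation path must check for a missing or disconnected QP before posting, and report -ENOTCONN instead of dereferencing a NULL pointer.

```c
/* Toy model of the NULL ->qp guard; not kernel code. */
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_qp {
	int id;
};

struct toy_ep {
	struct toy_qp *qp;	/* NULL while the transport is reconnecting */
	bool connected;
};

/* Stand-in for posting a LOCAL_INV work request. */
int toy_post_linv(struct toy_ep *ep)
{
	if (ep->qp == NULL)
		return -ENOTCONN;	/* nothing to post to: defer, don't deref */
	if (!ep->connected)
		return -ENOTCONN;	/* QP exists but cannot make progress */
	/* ... the real code would build a WR and call ib_post_send() here ... */
	return 0;
}
```

The real fix also has to decide what the caller does with -ENOTCONN (defer the LINV, or fall back to dereg_mr), but the guard itself is just this NULL/state check.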
>> That's a logic bug in rpcrdma_ep_connect(), and I have an idea
>> how to address it.
>
> Why not first create a new id+qp and assign them - and then destroy the old id+qp?
> See the SRP-related code: ib_srp.c:srp_create_target_ib().
>
> Anyway, it is indeed important to guarantee that no xmit flows happen concurrently
> with that, and that cleanups are processed synchronously and in order.

I posted a patch on Friday that should provide that serialization.

>> 2. If a QP is present but disconnected, posting LOCAL_INV won't work.
>>    That leaves buffers (and page cache pages, potentially) registered.
>>    That could be addressed with LINV+FRMR. But...
>>
>> 3. The client should not leave page cache pages registered indefinitely.
>>    Both LINV+FRMR and our current approach depend on having a working
>>    QP _at_ _some_ _point_ ... but the client simply can't depend on that.
>>    What happens if an NFS server is, say, destroyed by fire while there
>>    are active client mount points? What if the HCA's firmware is
>>    permanently not allowing QP creation?
>
> Again, I don't understand why you can't dereg_mr().
>
> How about allocating the FRMR pool *after* the QP is successfully created/connected
> (makes sense, as the FRMRs are not usable until then), and destroying/cleaning up
> the pool before the QP is disconnected/destroyed? It also makes sense as they must
> match PDs.

It's not about deregistration, but rather about invalidation; I was confused.

xprt_rdma_free() invalidates and then frees the chunks on RPC chunk lists. We just
need to see that those invalidations can succeed while the transport is disconnected.
I understand that even in the error state, a QP should allow consumers to post send
WRs to invalidate FRMRs...?

The other case is whether the consumer can _replace_ a QP with a fresh one, and
still have invalidations succeed, even if the transport remains disconnected, once
waiting RPCs are awoken.
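The create-new-then-destroy-old ordering suggested above can be sketched in isolation. This is a toy model (malloc stands in for QP creation; none of these names are the ib_srp or xprtrdma code) showing only the invariant that matters: the new QP is created and swapped in before the old one is destroyed, so the endpoint's qp pointer is never NULL during a reconnect, and a failed creation leaves the old QP in place.

```c
/* Toy model of swap-before-destroy QP replacement; not kernel code. */
#include <errno.h>
#include <stdlib.h>

struct toy_qp {
	int generation;
};

struct toy_ep {
	struct toy_qp *qp;
};

struct toy_qp *toy_create_qp(int generation)
{
	struct toy_qp *qp = malloc(sizeof(*qp));

	if (qp)
		qp->generation = generation;
	return qp;
}

/* Replace ep->qp with a fresh QP. The old QP is destroyed only after
 * the swap, so ep->qp is never NULL in between, and on allocation
 * failure the old QP is untouched and the caller can retry. */
int toy_reconnect(struct toy_ep *ep, int generation)
{
	struct toy_qp *new_qp = toy_create_qp(generation);
	struct toy_qp *old_qp;

	if (!new_qp)
		return -ENOMEM;
	old_qp = ep->qp;
	ep->qp = new_qp;	/* swap first ... */
	free(old_qp);		/* ... destroy the old QP second */
	return 0;
}
```

In the real driver the swap would also have to be serialized against xmit flows, as noted above; the sketch only captures the ordering.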
An alternative would be to invalidate all waiting RPC chunk lists on a transport as
soon as the QP goes to the error state, but before it is destroyed, and fastreg the
chunks again when the waiting RPCs are remarshalled.

I think the goals are:

1. Avoid fastreg on an FRMR that is already valid

2. Avoid leaving FRMRs valid indefinitely (preferably just long enough to execute
   the RPC request, and no longer)

>> Here's a relevant comment in rpcrdma_ep_connect():
>>
>> 815 /* TEMP TEMP TEMP - fail if new device:
>> 816  *   Deregister/remarshal *all* requests!
>> 817  *   Close and recreate adapter, pd, etc!
>> 818  *   Re-determine all attributes still sane!
>> 819  *   More stuff I haven't thought of!
>> 820  *   Rrrgh!
>> 821  */
>>
>> xprtrdma does not do this today.
>>
>> When a new device is created, all existing RPC requests could be
>> deregistered and re-marshalled. As far as I can tell,
>> rpcrdma_ep_connect() is executing in a synchronous context (the connect
>> worker), and we can simply use dereg_mr, as long as later, when the RPCs
>> are re-driven, they know they need to re-marshal.
>
> Agree.
>
> Sagi.

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html