On Apr 24, 2014, at 11:48 AM, Devesh Sharma <Devesh.Sharma@xxxxxxxxxx> wrote:

> Thanks Chuck for summarizing.
> One more issue is being added to the list below.
>
>> -----Original Message-----
>> From: linux-rdma-owner@xxxxxxxxxxxxxxx
>> [mailto:linux-rdma-owner@xxxxxxxxxxxxxxx] On Behalf Of Chuck Lever
>> Sent: Thursday, April 24, 2014 8:31 PM
>> To: Sagi Grimberg
>> Cc: Devesh Sharma; Linux NFS Mailing List; linux-rdma@xxxxxxxxxxxxxxx;
>> Trond Myklebust
>> Subject: Re: [PATCH V1] NFS-RDMA: fix qp pointer validation checks
>>
>>
>> On Apr 24, 2014, at 3:12 AM, Sagi Grimberg <sagig@xxxxxxxxxxxxxxxxxx>
>> wrote:
>>
>>> On 4/24/2014 2:30 AM, Devesh Sharma wrote:
>>>> Hi Chuck
>>>>
>>>> Following is the complete call trace of a typical NFS-RDMA
>>>> transaction while mounting a share.
>>>> post-send must not be called when the QP has not been created.
>>>> Therefore, applying checks to the connection state is a must
>>>> while registering/deregistering frmrs on-the-fly. The unconnected
>>>> state of the QP implies: don't call post_send/post_recv from any
>>>> context.
>>>>
>>>
>>> Long thread... didn't follow it all.
>>
>> I think you got the gist of it.
>>
>>> If I understand correctly this race comes only for *cleanup* (LINV)
>>> of FRMR registration while teardown flow destroyed the QP.
>>> I think this might disappear if for each registration you post
>>> LINV+FRMR.
>>> This is assuming that a situation where trying to post Fastreg on a
>>> "bad" QP can never happen (usually since teardown flow typically
>>> suspends outgoing commands).
>>
>> That's typically true for "hard" NFS mounts. But "soft" NFS mounts wake
>> RPCs after a timeout while the transport is disconnected, in order to
>> kill them. At that point, deregistration still needs to succeed somehow.
>>
>> IMO there are three related problems.
>>
>> 1. rpcrdma_ep_connect() is allowing RPC tasks to be awoken while
>>    there is no QP at all (->qp is NULL). The woken RPC tasks are
>>    trying to deregister buffers that may include page cache pages,
>>    and it's oopsing because ->qp is NULL.
>>
>>    That's a logic bug in rpcrdma_ep_connect(), and I have an idea
>>    how to address it.
>>
>> 2. If a QP is present but disconnected, posting LOCAL_INV won't work.
>>    That leaves buffers (and page cache pages, potentially) registered.
>>    That could be addressed with LINV+FRMR. But...
>>
>> 3. The client should not leave page cache pages registered indefinitely.
>>    Both LINV+FRMR and our current approach depend on having a working
>>    QP _at_ _some_ _point_ ... but the client simply can't depend on that.
>>    What happens if an NFS server is, say, destroyed by fire while there
>>    are active client mount points? What if the HCA's firmware is
>>    permanently not allowing QP creation?
>
> Addition to the list:
>
> 4. If rdma traffic is in progress and the network link goes down and
>    comes back up after some time (t > 10 secs), rpcrdma_ep_connect()
>    does not destroy the existing QP because rpcrdma_create_id fails
>    (rdma_resolve_addr fails). Now, every time the connect worker thread
>    gets rescheduled, CM fails with an establishment error. Finally,
>    after multiple tries, CM fails with rdma_cm_event = 15, the entire
>    recovery thread sits silently forever, and the kernel reports that
>    the user app is blocked for more than 120 secs.

I think I see that now. I should be able to address it with the fixes
for 1.

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com
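
For readers following problems 1 and 2 above, here is a minimal user-space sketch of the kind of guard being discussed: check both that a QP exists and that it is connected before posting a LOCAL_INV, and otherwise mark the FRMR for later recovery instead of dereferencing a NULL ->qp. Every name here (mock_ep, mock_frmr, mock_deregister, and so on) is hypothetical and invented for illustration; this is not the xprtrdma code, and not necessarily the fix Chuck has in mind.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel objects named in the thread. */
struct mock_qp {
    int posted_local_inv;          /* counts LOCAL_INV work requests "posted" */
};

enum mock_conn_state { MOCK_DISCONNECTED = 0, MOCK_CONNECTED = 1 };

struct mock_ep {
    struct mock_qp *qp;            /* NULL while no QP exists (problem 1) */
    enum mock_conn_state state;    /* QP present but disconnected (problem 2) */
};

struct mock_frmr {
    bool registered;               /* still registered with the (mock) HCA */
    bool stale;                    /* invalidation skipped; needs recovery later */
};

enum dereg_result { DEREG_POSTED, DEREG_DEFERRED };

/*
 * Deregister one FRMR. Posting is attempted only when a QP exists and
 * is connected; in every other case the FRMR is flagged stale so that
 * some later recovery step (an assumption here) can deal with it.
 */
static enum dereg_result mock_deregister(struct mock_ep *ep,
                                         struct mock_frmr *mr)
{
    assert(ep != NULL && mr != NULL);

    if (ep->qp == NULL || ep->state != MOCK_CONNECTED) {
        mr->stale = true;          /* never touch ep->qp on this path */
        return DEREG_DEFERRED;
    }

    ep->qp->posted_local_inv++;    /* stands in for posting LOCAL_INV */
    mr->registered = false;
    mr->stale = false;
    return DEREG_POSTED;
}
```

Note that this only avoids the oops and the failed post. It does nothing for problem 3: a deferred FRMR stays registered until a working QP shows up, which is exactly the dependency the thread says the client cannot rely on.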