Thanks, Chuck, for summarizing. I am adding one more issue to the list below.

> -----Original Message-----
> From: linux-rdma-owner@xxxxxxxxxxxxxxx [mailto:linux-rdma-owner@xxxxxxxxxxxxxxx] On Behalf Of Chuck Lever
> Sent: Thursday, April 24, 2014 8:31 PM
> To: Sagi Grimberg
> Cc: Devesh Sharma; Linux NFS Mailing List; linux-rdma@xxxxxxxxxxxxxxx; Trond Myklebust
> Subject: Re: [PATCH V1] NFS-RDMA: fix qp pointer validation checks
>
>
> On Apr 24, 2014, at 3:12 AM, Sagi Grimberg <sagig@xxxxxxxxxxxxxxxxxx> wrote:
>
> > On 4/24/2014 2:30 AM, Devesh Sharma wrote:
> >> Hi Chuck
> >>
> >> Following is the complete call trace of a typical NFS-RDMA transaction
> >> while mounting a share.
> >> It is necessary to stop calling post-send when the QP has not been
> >> created. Therefore, applying checks to the connection state is a must
> >> while registering/deregistering FRMRs on the fly. The unconnected state of
> >> the QP implies: don't call post_send/post_recv from any context.
> >>
> >
> > Long thread... didn't follow it all.
>
> I think you got the gist of it.
>
> > If I understand correctly this race comes only for *cleanup* (LINV) of FRMR
> > registration while teardown flow destroyed the QP.
> > I think this might disappear if for each registration you post LINV+FRMR.
> > This is assuming that a situation where trying to post Fastreg on a
> > "bad" QP can never happen (usually since teardown flow typically suspends
> > outgoing commands).
>
> That's typically true for "hard" NFS mounts. But "soft" NFS mounts wake
> RPCs after a timeout while the transport is disconnected, in order to kill
> them. At that point, deregistration still needs to succeed somehow.
>
> IMO there are three related problems.
>
> 1. rpcrdma_ep_connect() is allowing RPC tasks to be awoken while
>    there is no QP at all (->qp is NULL). The woken RPC tasks are
>    trying to deregister buffers that may include page cache pages,
>    and it's oopsing because ->qp is NULL.
>
>    That's a logic bug in rpcrdma_ep_connect(), and I have an idea
>    how to address it.
>
> 2. If a QP is present but disconnected, posting LOCAL_INV won't work.
>    That leaves buffers (and page cache pages, potentially) registered.
>    That could be addressed with LINV+FRMR. But...
>
> 3. The client should not leave page cache pages registered indefinitely.
>    Both LINV+FRMR and our current approach depend on having a working
>    QP _at_ _some_ _point_ ... but the client simply can't depend on that.
>    What happens if an NFS server is, say, destroyed by fire while there
>    are active client mount points? What if the HCA's firmware is
>    permanently not allowing QP creation?

Addition to the list:

4. If RDMA traffic is in progress and the network link goes down and comes
   back up after some time (t > 10 secs), rpcrdma_ep_connect() does not
   destroy the existing QP, because rpcrdma_create_id() fails
   (rdma_resolve_addr() fails). After that, every time the connect worker
   thread is rescheduled, the CM fails with an establishment error.
   Finally, after multiple tries, the CM fails with rdma_cm_event = 15, the
   entire recovery thread sits silently forever, and the kernel reports
   that the user application has been blocked for more than 120 secs.

>
> Here's a relevant comment in rpcrdma_ep_connect():
>
>         /* TEMP TEMP TEMP - fail if new device:
>          * Deregister/remarshal *all* requests!
>          * Close and recreate adapter, pd, etc!
>          * Re-determine all attributes still sane!
>          * More stuff I haven't thought of!
>          * Rrrgh!
>          */
>
> xprtrdma does not do this today.
>
> When a new device is created, all existing RPC requests could be
> deregistered and re-marshalled. As far as I can tell,
> rpcrdma_ep_connect() is executing in a synchronous context (the connect
> worker) and we can simply use dereg_mr, as long as later, when the RPCs are
> re-driven, they know they need to re-marshal.
>
> I'll try some things today.
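
To make problems 1 and 2 and the dereg/re-marshal idea above a bit more concrete, here is a rough user-space sketch in C. It is not xprtrdma code and it is not a proposed patch: every type and function name in it (sketch_qp, sketch_ia, sketch_mr, sketch_can_post, sketch_deregister) is hypothetical, and the "posting" and "dereg" steps are simulated with plain counters and flags. It only illustrates the control flow I have in mind: never post on a missing or disconnected QP, and when no usable QP exists, release the registration synchronously and remember that the RPC has to re-marshal.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins; NOT the kernel's struct ib_qp or xprtrdma types. */
struct sketch_qp {
	int posted;             /* work requests accepted so far (simulated) */
};

struct sketch_ia {
	struct sketch_qp *qp;   /* NULL while no QP exists (problem 1) */
	int connected;          /* nonzero only while the QP is usable (problem 2) */
};

struct sketch_mr {
	int registered;         /* buffer is still registered (simulated) */
	int needs_remarshal;    /* RPC must re-marshal before it is re-driven */
};

/* Return 0 if posting is safe, -ENOTCONN if the QP is missing or disconnected. */
static int sketch_can_post(const struct sketch_ia *ia)
{
	if (ia == NULL || ia->qp == NULL)
		return -ENOTCONN;       /* no QP at all: posting would oops */
	if (!ia->connected)
		return -ENOTCONN;       /* QP exists but LOCAL_INV would not complete */
	return 0;
}

/*
 * Release one registration. With a usable QP this simulates posting
 * LOCAL_INV. Without one it simulates a synchronous, dereg_mr-style
 * release and flags the request for re-marshalling.
 */
static int sketch_deregister(struct sketch_ia *ia, struct sketch_mr *mr)
{
	assert(mr != NULL);

	if (sketch_can_post(ia) == 0) {
		ia->qp->posted++;       /* stands in for posting LOCAL_INV */
		mr->registered = 0;
		return 0;
	}

	/* No usable QP: do not post; release locally instead. */
	mr->registered = 0;
	mr->needs_remarshal = 1;
	return 0;
}
```

The point of the fallback path is that deregistration never depends on the QP becoming usable again, which is what problem 3 asks for. Whether a synchronous release is actually safe from every context where RPCs are woken is exactly the open question in this thread, so please read the sketch as an illustration only.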
>
> --
> Chuck Lever
> chuck[dot]lever[at]oracle[dot]com