Re: [PATCH V1] NFS-RDMA: fix qp pointer validation checks

Chuck Lever <chuck.lever@xxxxxxxxxx> · Thu, 24 Apr 2014 13:44:31 -0400

On Apr 24, 2014, at 11:48 AM, Devesh Sharma <Devesh.Sharma@xxxxxxxxxx> wrote:

> Thanks Chuck for summarizing.
> One more issue is being added to the list below.
> 
>> -----Original Message-----
>> From: linux-rdma-owner@xxxxxxxxxxxxxxx [mailto:linux-rdma-
>> owner@xxxxxxxxxxxxxxx] On Behalf Of Chuck Lever
>> Sent: Thursday, April 24, 2014 8:31 PM
>> To: Sagi Grimberg
>> Cc: Devesh Sharma; Linux NFS Mailing List; linux-rdma@xxxxxxxxxxxxxxx;
>> Trond Myklebust
>> Subject: Re: [PATCH V1] NFS-RDMA: fix qp pointer validation checks
>> 
>> 
>> On Apr 24, 2014, at 3:12 AM, Sagi Grimberg <sagig@xxxxxxxxxxxxxxxxxx>
>> wrote:
>> 
>>> On 4/24/2014 2:30 AM, Devesh Sharma wrote:
>>>> Hi Chuck
>>>> 
>>>> Following is the complete call trace of a typical NFS-RDMA transaction
>>>> while mounting a share.
>>>> We must avoid calling post-send when the QP has not been created.
>>>> Therefore, checking the connection state is a must while
>>>> registering/deregistering FRMRs on the fly. An unconnected QP means
>>>> post_send/post_recv must not be called from any context.
>>>> 
>>> 
>>> Long thread... didn't follow it all.
>> 
>> I think you got the gist of it.
>> 
>>> If I understand correctly, this race occurs only during *cleanup* (LINV)
>>> of an FRMR registration after the teardown flow has destroyed the QP.
>>> I think this might disappear if for each registration you post LINV+FRMR.
>>> This assumes that posting a Fastreg on a "bad" QP can never happen
>>> (since the teardown flow typically suspends outgoing commands).
>> 
>> That's typically true for "hard" NFS mounts. But "soft" NFS mounts wake
>> RPCs after a timeout while the transport is disconnected, in order to kill
>> them.  At that point, deregistration still needs to succeed somehow.
>> 
>> IMO there are three related problems.
>> 
>> 1.  rpcrdma_ep_connect() is allowing RPC tasks to be awoken while
>>    there is no QP at all (->qp is NULL). The woken RPC tasks are
>>    trying to deregister buffers that may include page cache pages,
>>    and it's oopsing because ->qp is NULL.
>> 
>>    That's a logic bug in rpcrdma_ep_connect(), and I have an idea
>>    how to address it.
>> 
>> 2.  If a QP is present but disconnected, posting LOCAL_INV won't work.
>>    That leaves buffers (and page cache pages, potentially) registered.
>>    That could be addressed with LINV+FRMR. But...
>> 
>> 3.  The client should not leave page cache pages registered indefinitely.
>>    Both LINV+FRMR and our current approach depend on having a working
>>    QP _at_ _some_ _point_ ... but the client simply can't depend on that.
>>    What happens if an NFS server is, say, destroyed by fire while there
>>    are active client mount points? What if the HCA's firmware is
>>    permanently not allowing QP creation?
> Addition to the list:
> 4. If RDMA traffic is in progress and the network link goes down and comes
>    back up after some time (t > 10 secs), rpcrdma_ep_connect() does not
>    destroy the existing QP because rpcrdma_create_id() fails
>    (rdma_resolve_addr() fails). Each time the connect worker thread is
>    rescheduled after that, the CM fails with an establishment error.
>    Finally, after multiple tries, the CM fails with rdma_cm_event = 15, the
>    entire recovery thread sits silently forever, and the kernel reports that
>    the user app has been blocked for more than 120 secs.

I think I see that now. I should be able to address it with the fixes for 1.
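
To make the difference between 1 and 2 concrete, here is a rough userspace
model of the check being discussed. It is only a sketch, not kernel code and
not the actual fix: every model_* name is made up for illustration, and
bumping posted_wrs merely stands in for posting a LOCAL_INV work request.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT the real xprtrdma structures. */
enum model_conn_state { MODEL_DISCONNECTED = 0, MODEL_CONNECTED = 1 };

struct model_qp {
	int posted_wrs;			/* counts simulated LOCAL_INV posts */
};

struct model_ep {
	struct model_qp *qp;		/* NULL while no QP exists (problem 1) */
	enum model_conn_state state;	/* QP present but disconnected (problem 2) */
};

struct model_frmr {
	bool registered;		/* buffer is still registered */
	bool needs_recovery;		/* invalidation was deferred */
};

enum model_result { MODEL_INVALIDATED, MODEL_DEFERRED };

/*
 * Post the invalidate only when a QP exists and is connected.
 * Otherwise touch nothing on the QP and mark the FRMR for later recovery.
 */
enum model_result model_deregister(struct model_ep *ep, struct model_frmr *mr)
{
	assert(ep != NULL && mr != NULL);

	if (ep->qp == NULL || ep->state != MODEL_CONNECTED) {
		mr->needs_recovery = true;
		return MODEL_DEFERRED;
	}

	ep->qp->posted_wrs++;		/* assumption: models posting LOCAL_INV */
	mr->registered = false;
	mr->needs_recovery = false;
	return MODEL_INVALIDATED;
}
```

Note that the deferred case still leaves the buffer registered until a working
QP shows up, so a check like this covers 1 and 2 but does nothing for 3.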

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com
