Re: nfs4_lock_delegation_recall() improperly handles errors such as ERROR_GRACE

Olga Kornievskaia <aglo@xxxxxxxxx> · Wed, 24 Sep 2014 19:20:57 -0400

On Wed, Sep 24, 2014 at 6:45 PM, Trond Myklebust
<trond.myklebust@xxxxxxxxxxxxxxx> wrote:
> On Wed, Sep 24, 2014 at 6:31 PM, Olga Kornievskaia <aglo@xxxxxxxxx> wrote:
>> On Wed, Sep 24, 2014 at 3:57 PM, Trond Myklebust
>> <trond.myklebust@xxxxxxxxxxxxxxx> wrote:
>>> Hi Olga,
>>>
>>> On Wed, Sep 24, 2014 at 2:20 PM, Olga Kornievskaia <aglo@xxxxxxxxx> wrote:
>>>> Hi Trond,
>>>>
>>>> nfs_delegation_claim_opens() returns EAGAIN to nfs_end_delegation_return().
>>>> issync is always 0 (as it's called by
>>>> nfs_client_return_marked_delegations()) and it breaks out of the loop...
>>>> as a result the error just doesn't get handled.
>>>
>>> Ah. OK, so this is being called from
>>> nfs_client_return_marked_delegations. That makes sense.
>>>
>>> So for that case, I'd expect the call to return to the loop in
>>> nfs4_state_manager(), and then to retry through that after doing
>>> whatever is needed to recover.
>>> Essentially, we should be setting NFS4CLNT_DELEGRETURN again, and then
>>> bouncing back into nfs_client_return_marked_delegations (after all the
>>> recovery work has been done).
>>
>> Yes, I don't fully understand what it should be. It never does anything
>> about recovering from the lock error and simply returns the
>> delegation. Ok, I don't know if it means anything to you, but the 2nd
>> time around (when it returns the delegation even though it hasn't
>> recovered the lock), it never goes into
>> nfs4_open_delegation_recall() because the stateid condition doesn't hold
>> true.
>>
>> If it's not too much trouble, could you explain why the lock error
>> shouldn't be handled as I suggested, instead of resending the open with
>> claim_cur over again? As I understand it, in your case it'll be a series
>> of successful opens with claim_cur, each paired with a failed lock with
>> err_grace. In my case, it'll be one open with claim_cur and a number
>> of locks with err_grace.
>
> There is only 1 state manager thread allowed per nfs_client (i.e. per
> server) and so we want to avoid having it busy wait in any one state
> handler. Doing so would basically mean that all other state recovery
> on that nfs_client is on hold; i.e. we could not deal with exceptions
> like ADMIN_REVOKED, CB_PATH_DOWN, etc until the busy wait is over.
> This is why that code has been designed to fall all the way back to
> nfs4_state_manager() in the event of any error/exception.

Ok, thanks. It makes sense. And makes things complicated. I'm sure
you'll beat me to figuring out why the error is not handled but I'll
keep trying.
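
For reference, here is a rough user-space sketch of the pattern described
above, as I understand it: one manager loop per client, and a handler that
hits a retryable error re-arms its bit and falls back to the loop instead of
busy waiting. This is not the kernel code. The CLNT_* bits, struct client,
and every function name below are made up for illustration, and the "grace
period" is just a counter.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical pending-work bits, loosely modelled on the NFS4CLNT_* flags. */
enum {
    CLNT_RECOVER     = 1u << 0, /* some recovery work is pending */
    CLNT_DELEGRETURN = 1u << 1, /* delegations are waiting to be returned */
};

/* Hypothetical per-server client state (stand-in for struct nfs_client). */
struct client {
    unsigned int state;   /* pending-work bits */
    int grace_left;       /* simulated server: attempts that still fail */
    int return_attempts;  /* how many times delegation return was tried */
    int recoveries_run;   /* how many times the recovery handler ran */
};

/* Stand-in for the delegation return: -EAGAIN while the server is in grace. */
static int return_marked_delegations(struct client *c)
{
    c->return_attempts++;
    if (c->grace_left > 0) {
        c->grace_left--;
        return -EAGAIN;
    }
    return 0;
}

/* Stand-in for "whatever is needed to recover". */
static void run_recovery(struct client *c)
{
    c->recoveries_run++;
}

/*
 * Single manager loop. No handler waits in place: on -EAGAIN the
 * delegation-return bit is set again together with the recovery bit, and
 * control goes back to the top of the loop, so other pending work is
 * serviced before the retry. Returns the number of passes made.
 */
static int state_manager(struct client *c, int max_passes)
{
    int passes = 0;

    assert(c != 0);
    while (c->state != 0 && passes < max_passes) {
        passes++;
        if (c->state & CLNT_RECOVER) {
            c->state &= ~CLNT_RECOVER;
            run_recovery(c);
            continue;
        }
        if (c->state & CLNT_DELEGRETURN) {
            c->state &= ~CLNT_DELEGRETURN;
            if (return_marked_delegations(c) == -EAGAIN)
                c->state |= CLNT_DELEGRETURN | CLNT_RECOVER;
        }
    }
    return passes;
}
```

With state set to CLNT_DELEGRETURN and grace_left set to 2, the loop
alternates between a failed return and a recovery pass until the return
succeeds, and the recovery branch is never blocked behind a wait.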

>
> --
> Trond Myklebust
>
> Linux NFS client maintainer, PrimaryData
>
> trond.myklebust@xxxxxxxxxxxxxxx