Re: question about open_owner sequencing

Olga Kornievskaia <aglo@xxxxxxxxx> · Fri, 17 Mar 2017 17:19:40 -0400

On Fri, Mar 17, 2017 at 4:55 PM, Frank Filz <ffilzlnx@xxxxxxxxxxxxxx> wrote:
>> On Fri, Mar 17, 2017 at 1:45 PM, Frank Filz <ffilzlnx@xxxxxxxxxxxxxx> wrote:
>> >> Hi folks,
>> >>
>> >> I have a question about recovery from the BAD_SEQID and what should
>> >> happen.
>> >>
>> >> I have the following application that does:
>> >>
>> >> 1. open(file1)
>> >> 2. open(file2)
>> >> 3. close(file1)
>> >> 4. open(file3)
>> >> 5. lock(file2)
>> >>
>> >> If CLOSE gets BAD_SEQID (for whatever reason), I see that LOCK later
>> >> fails with BAD_SEQID as well.
>> >>
>> >> step1 OPEN creates open_owner1 seq 0
>> >> step2 OPEN uses open_owner1 seq1
>> >> step3 CLOSE uses open_owner1 seq2 gets BAD_SEQID
>> >> step4 OPEN sends new open_owner2 seq2 and it triggers OPEN_CONFIRM
>> >> with seq3
>> >> step5 sends LOCK with seq4 and open stateid from the reply in step 2.
>> >>
>> >> LOCK gets BAD_SEQID.
>> >>
>> >> Question: is client sending something incorrect? is server not
>> >> correct? I tested against two different servers (Linux and NetApp)
>> >> and both reply the same way so I'm leaning towards "no". But I don't
>> >> see why "seq4" is not a valid sequence given that the
>> >> open_owner/sequence was just confirmed.
>> >
>> > Wait, step4 is using a new open owner? Each open owner has its own seqid
>> > (assuming this is V4.0, owner seqid doesn't apply to 4.1 since the sequencing
>> > is done for the session with the SEQUENCE op).
>>
>> Yes this is v4.0. Yes step4 uses new open owner but seq# doesn't go to 0.
>> This is the new behavior to not drop the open owner as per the following
>> commit (below).
>>
>> Since LOCK just has the seq# (and not a value of the open_owner) I thought
>> it'd be the "valid" (current) open owner which would be open_owner2.
>
> Hmm, so in step5, there is not yet a lock stateid?
>
> So it's using this form of the lock?
>
> struct open_to_lock_owner4 {
>         seqid4          open_seqid;
>         stateid4        open_stateid;
>         seqid4          lock_seqid;
>         lock_owner4     lock_owner;
> };
>
> If so, open_seqid should be 3, lock_seqid can be anything.

Why is it 3? As far as I can tell, 3 is not a valid seq_id for either
open_owner1 or open_owner2. open_owner1 is left at seq_id=2 (because
after "using" seq2 on the CLOSE it got BAD_SEQID so seq_id isn't
incremented) and open_owner2 would have seq_id=4 (OPEN_CONFIRM used up
3)?
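
A rough sketch of that counting, written as a toy Python model rather than anything taken from the kernel client. The class name `OpenOwner`, the `send` helper and `replay_trace` are made up for illustration. The only rule modelled is the one described above: a request that gets BAD_SEQID does not advance the owner's seqid, and the new open owner starts from the old counter instead of 0.

```python
class OpenOwner:
    """Toy client-side view of one NFSv4.0 open-owner (hypothetical model)."""

    def __init__(self, name, next_seqid=0):
        self.name = name
        self.next_seqid = next_seqid

    def send(self, op, bad_seqid=False):
        """Use the current seqid for `op`; advance unless BAD_SEQID came back."""
        used = self.next_seqid
        if not bad_seqid:
            self.next_seqid += 1
        return (op, self.name, used)


def replay_trace():
    """Replay steps 1-4 of the application as described in this thread."""
    trace = []
    oo1 = OpenOwner("open_owner1")
    trace.append(oo1.send("OPEN file1"))                   # seq 0
    trace.append(oo1.send("OPEN file2"))                   # seq 1
    trace.append(oo1.send("CLOSE file1", bad_seqid=True))  # seq 2, rejected
    # Assumption: the new owner inherits the old counter (seq# doesn't go to 0).
    oo2 = OpenOwner("open_owner2", next_seqid=oo1.next_seqid)
    trace.append(oo2.send("OPEN file3"))                   # seq 2
    trace.append(oo2.send("OPEN_CONFIRM"))                 # seq 3
    return trace, oo1, oo2


if __name__ == "__main__":
    trace, oo1, oo2 = replay_trace()
    for op, owner, seqid in trace:
        print(f"{op:13s} {owner} seq={seqid}")
    print("next seqid:", oo1.name, oo1.next_seqid, "/", oo2.name, oo2.next_seqid)
```

In this model the next seqid is 2 for open_owner1 and 4 for open_owner2, so 3 is not the next value for either of them.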

From RFC 7530, section 16.10.5:

      Note that although the open-owner is not given explicitly, the
      open_seqid associated with it is used to check for open-owner
      sequencing issues. This case provides a method to use the
      established state of the open_stateid to transition to the use of
      a lock stateid.
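
One possible reading of that paragraph, as a toy Python sketch: the server finds the open-owner through the open_stateid and judges open_seqid against that owner's counter. `ToyServer`, `check_lock`, `build_example` and the stateid labels are hypothetical names, not any real server's code, and replay detection is left out. The expected-seqid values are assumptions, since the thread does not establish why the CLOSE was rejected.

```python
NFS4_OK = "NFS4_OK"
NFS4ERR_BAD_SEQID = "NFS4ERR_BAD_SEQID"


class ToyServer:
    """Hypothetical server-side bookkeeping; not modelled on a real server."""

    def __init__(self, expected, stateid_owner):
        self.expected = dict(expected)            # owner -> next seqid expected
        self.stateid_owner = dict(stateid_owner)  # open stateid -> owner

    def check_lock(self, open_seqid, open_stateid):
        # LOCK carries no open-owner, so resolve it from the open stateid.
        owner = self.stateid_owner[open_stateid]
        if open_seqid != self.expected[owner]:
            return NFS4ERR_BAD_SEQID
        self.expected[owner] += 1
        return NFS4_OK


def build_example():
    # Assumed server state after step 4 (illustrative values only).
    return ToyServer(
        expected={"open_owner1": 2, "open_owner2": 4},
        stateid_owner={
            "stateid_file2": "open_owner1",
            "stateid_file3": "open_owner2",
        },
    )
```

Under these assumptions, `check_lock(4, "stateid_file2")` returns NFS4ERR_BAD_SEQID because the stateid from step 2 resolves to open_owner1, while the same seqid sent with a stateid that belongs to open_owner2 is accepted.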

>
> At least that's my reading. But I'm not sure how client is supposed to recover from BAD_SEQID...
>
> Frank
>
>> So after step4, are there then 2 open owners: one with value open_owner1
>> (seq2) and one with value open_owner2 (seq3)? And since LOCK is
>> associated with the OPEN from step2, and hence with open_owner1, should
>> it send seq2?
>>
>> Neil, when would the client remove this open_owner1 that would have been
>> removed prior to this patch?
>>
>> commit 86cfb0418537460baf0de0b5e9253784be27a6f9
>> Author: NeilBrown <neilb@xxxxxxxx>
>> Date:   Mon Dec 19 11:48:23 2016 +1100
>>
>>     NFS: Don't disconnect open-owner on NFS4ERR_BAD_SEQID
>>
>>     When an NFS4ERR_BAD_SEQID is received the open-owner is removed from
>>     the ->state_owners rbtree so that it will no longer be used.
>>
>>     If any stateids attached to this open-owner are still in use, and if a
>>     request using one gets an NFS4ERR_BAD_STATEID reply, this can go bad.
>>
>>     The state is marked as needing recovery and the nfs4_state_manager()
>>     is scheduled to clean up.  nfs4_state_manager() finds states to be
>>     recovered by walking the state_owners rbtree.  As the open-owner is
>>     not in the rbtree, the bad state is not found so nfs4_state_manager()
>>     completes having done nothing.  The request is then retried, with a
>>     predictable result (indefinite retries).
>>
>>     If the stateid is for a delegation, this open_owner will be used
>>     to open files when the delegation is returned.  For that to work,
>>     a new open-owner needs to be presented to the server.
>>
>>     This patch changes NFS4ERR_BAD_SEQID handling to leave the open-owner
>>     in the rbtree but updates the 'create_time' so it looks like a new
>>     open-owner.  With this the indefinite retries no longer happen.
>>
>>     Signed-off-by: NeilBrown <neilb@xxxxxxxx>
>>     Signed-off-by: Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx>
>>
>>
>> >
>> > Frank
>> >
>> >
>