Re: READ during state recovery uses zero stateid

Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> · Wed, 24 Aug 2016 19:05:20 +0000

> On Aug 24, 2016, at 14:47, Chuck Lever <chuck.lever@xxxxxxxxxx> wrote:
> 
> 
>> On Aug 24, 2016, at 2:23 PM, Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> wrote:
>> 
>>> 
>>> On Aug 24, 2016, at 14:10, Chuck Lever <chuck.lever@xxxxxxxxxx> wrote:
>>> 
>>> Hi-
>>> 
>>> I have a wire capture that shows this race while a simple I/O workload is
>>> running:
>>> 
>>> 0. The client reconnects after a network partition
>>> 1. The client sends a couple of READ requests
>>> 2. The client independently discovers its lease has expired
>>> 3. The client establishes a fresh lease
>>> 4. The client destroys open, lock, and delegation stateids for the file
>>> that was open under the previous lease
>>> 5. The client issues a new OPEN to recover state for that file
>>> 6. The server replies to the READs in step 1. with NFS4ERR_EXPIRED
>>> 7. The client turns the READs around immediately using the current open
>>> stateid for that file, which is the zero stateid
>>> 8. The server replies NFS4_OK to the OPEN from step 5
>>> 
>>> If I understand the code correctly, if the server happened to send those
>>> READ replies after its OPEN reply (rather than before), the client would
>>> have used the recovered open stateid instead of the zero stateid when
>>> resending the READ requests.
>>> 
>>> Would it be better if the client recognized there is state recovery in
>>> progress, and then waited for recovery to complete, before retrying the
>>> READs?
>>> 
>> 
>> Why isn’t the session draining taking care of ensuring the READs don’t happen until after recovery is done?
> 
> This is NFSv4.0. (Apologies, I recalled NFS4ERR_EXPIRED had been removed
> from NFSv4.1, but I see that I was mistaken).
> 
> Here's step 1 and 2, exactly. After the partition heals, the client sends:
> 
> C READ
> C GETATTR
> C READ
> C RENEW
> 
> The server responds to the RENEW first with GSS_CTXPROBLEM. The client's
> gssd connects and establishes a fresh GSS context. The client sends the
> RENEW again with the fresh context, and the server responds NFS4ERR_EXPIRED.
> This triggers step 3.
> 
> The replies for those READ calls are in step 6., after state recovery
> has started.

This is what I’m confused about: normally, I’d expect the NFSv4.0 code to drain first, because of the checks in nfs40_setup_sequence().

IOW: there should be two additional steps

   2.5) Call nfs4_drain_slot_tbl() and wait for operations to complete
   2.6) Process the NFS4ERR_EXPIRED errors returned by the READ requests sent in (1).

before we get to recovering the lease in (3)...
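
To make the expected ordering concrete, here is a toy model in Python. It is only a sketch of the sequence described above, not the kernel implementation: SlotTable, simulate(), and the "old"/"recovered" stateid labels are hypothetical names made up for illustration, and the only behaviour assumed is that a draining table parks new requests until recovery finishes.

```python
from collections import deque


class SlotTable:
    """Toy stand-in for an NFSv4.0 slot table (hypothetical, not kernel code)."""

    def __init__(self):
        self.draining = False
        self.in_flight = set()
        self.waiting = deque()

    def setup_sequence(self, req):
        """Admit req, or park it if a drain is in progress."""
        if self.draining:
            self.waiting.append(req)
            return False
        self.in_flight.add(req)
        return True

    def complete(self, req):
        """Record that a reply for req has been processed."""
        self.in_flight.discard(req)

    def begin_drain(self):
        self.draining = True

    def drained(self):
        """True once a drain was requested and nothing is outstanding."""
        return self.draining and not self.in_flight

    def end_drain(self):
        """Reopen the table and admit everything that was parked."""
        self.draining = False
        released = list(self.waiting)
        self.waiting.clear()
        self.in_flight.update(released)
        return released


def simulate():
    """Replay steps 1, 2, 2.5, 2.6 and 3 and return an event log."""
    log = []
    table = SlotTable()
    stateid = "old"

    # Step 1: two READs go out under the old lease.
    for req in ("READ-1", "READ-2"):
        table.setup_sequence(req)
        log.append(f"send {req} stateid={stateid}")

    # Step 2 -> 2.5: the lease is found expired, so start draining.
    table.begin_drain()
    log.append("drain begin")

    # Step 2.6: the EXPIRED replies arrive; retries are parked, not resent.
    for req in ("READ-1", "READ-2"):
        table.complete(req)
        log.append(f"reply {req} NFS4ERR_EXPIRED")
        table.setup_sequence(req + "-retry")

    # Step 3 onward: recover only once nothing is in flight.
    if table.drained():
        stateid = "recovered"
        log.append("lease and open state recovered")

    # Parked retries are released only after recovery, with the new stateid.
    for req in table.end_drain():
        log.append(f"send {req} stateid={stateid}")
    return log
```

In this model the retried READs cannot go out before recovery completes, so they never pick up a zero stateid. The capture above shows the opposite, which suggests the drain is either not happening or not covering those READs.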
