> On Aug 24, 2016, at 14:10, Chuck Lever <chuck.lever@xxxxxxxxxx> wrote:
>
> Hi-
>
> I have a wire capture that shows this race while a simple I/O workload is
> running:
>
> 0. The client reconnects after a network partition
> 1. The client sends a couple of READ requests
> 2. The client independently discovers its lease has expired
> 3. The client establishes a fresh lease
> 4. The client destroys open, lock, and delegation stateids for the file
>    that was open under the previous lease
> 5. The client issues a new OPEN to recover state for that file
> 6. The server replies to the READs in step 1 with NFS4ERR_EXPIRED
> 7. The client turns the READs around immediately using the current open
>    stateid for that file, which is the zero stateid
> 8. The server replies NFS4_OK to the OPEN from step 5
>
> If I understand the code correctly, if the server happened to send those
> READ replies after its OPEN reply (rather than before), the client would
> have used the recovered open stateid instead of the zero stateid when
> resending the READ requests.
>
> Would it be better if the client recognized there is state recovery in
> progress, and then waited for recovery to complete, before retrying the
> READs?

Why isn’t the session draining taking care of ensuring the READs don’t
happen until after recovery is done?
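
For illustration only, here is a rough userspace sketch of the check being
suggested in the quoted mail: defer the READ retry while recovery is still
running, or while the file's current open stateid is the zero stateid, and
only resend once a recovered stateid has been installed. All struct and
function names below are invented for the sketch; they are not the Linux
client's actual data structures or APIs.

/*
 * Illustrative sketch only -- these structs and helpers are invented for
 * this example and are not the Linux NFS client's data structures.
 */
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define NFS4_STATEID_SIZE 16

struct stateid {
	unsigned char data[NFS4_STATEID_SIZE];
};

struct open_state {
	struct stateid open_stateid;      /* zero while OPEN recovery is pending */
	bool           recovery_running;  /* lease/state recovery in progress   */
};

static bool stateid_is_zero(const struct stateid *sid)
{
	static const struct stateid zero;

	return memcmp(sid, &zero, sizeof(zero)) == 0;
}

/*
 * Decide whether a READ that came back NFS4ERR_EXPIRED may be resent
 * right away.  Returns true and fills in *retry_sid if a valid open
 * stateid is available; returns false if the caller should wait for
 * recovery to finish (steps 4-8 of the trace) before retrying.
 */
static bool read_retry_ready(const struct open_state *os,
			     struct stateid *retry_sid)
{
	if (os->recovery_running || stateid_is_zero(&os->open_stateid))
		return false;

	*retry_sid = os->open_stateid;
	return true;
}

int main(void)
{
	struct open_state os = { .recovery_running = true };
	struct stateid sid;

	/* Mid-recovery: the retry must not go out with the zero stateid. */
	printf("retry now? %s\n", read_retry_ready(&os, &sid) ? "yes" : "no");

	/* OPEN recovery completed: a fresh open stateid is installed. */
	os.recovery_running = false;
	memset(os.open_stateid.data, 0xab, sizeof(os.open_stateid.data));
	printf("retry now? %s\n", read_retry_ready(&os, &sid) ? "yes" : "no");

	return 0;
}

In a real client the "wait" would of course be a sleep on the state
manager's completion rather than a simple boolean check, but the decision
point is the same: don't put a stateid on the wire for the retry until
recovery has installed one.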