Re: Recovery after BAD_SEQID

Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx> · Sun, 22 Mar 2015 15:20:05 -0400

On Thu, Mar 19, 2015 at 6:48 AM, Benjamin Coddington
<bcodding@xxxxxxxxxx> wrote:
> I wrote yesterday about a RHEL6 bug, but I'd gotten some details wrong about
> the problem, so I'm starting a new thread.
>
> It looks like getting BAD_SEQID back from an OPEN operation drops the
> state_owner, which means that the state machine can't find or recover any
> other objects for that state_owner.  That can get the client into
> unrecoverable loops.  I can reproduce one of them with:
>
> 1) OPEN file1, OPEN file2
> 2) break the network for longer than the lease period
> 3) during recovery, have the server return BAD_SEQID for one of the OPENs
> 4) break the network again for longer than the lease period
> 5) WRITE to the file that recovered properly in #3
>
> The client then gets stuck in a loop of WRITE returning NFS4ERR_EXPIRED.
>
> It looks like some cleanup is needed if we have to drop the whole
> state_owner.  Alternatively, does it make sense to just drop the objects in
> that sequence?
>
>

Ummm... Why are you seeing BAD_SEQID in the first place? That specific
error means that the client and server disagree on the sequencing of
the OPENs, which means there is a bug either on the client or on the
server.
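
To make the sequencing rule concrete, here is a rough sketch of the
per-open-owner seqid check as I understand it from the NFSv4.0 spec. It is
a toy model, not the Linux server or client code: the class name, the field
names, the plain 32-bit wraparound, and the fact that only the last reply
status is cached are all assumptions made for illustration. A real server
also leaves the seqid unchanged for certain errors, which this sketch ignores.

```python
from dataclasses import dataclass

NFS4_OK = "NFS4_OK"
NFS4ERR_BAD_SEQID = "NFS4ERR_BAD_SEQID"


def next_seqid(seqid: int) -> int:
    """Advance a 32-bit seqid (plain wraparound is an assumption here)."""
    return (seqid + 1) & 0xFFFFFFFF


@dataclass
class OpenOwnerState:
    """Hypothetical server-side record for a single open-owner."""
    last_seqid: int              # seqid of the last request that was processed
    cached_reply: str = NFS4_OK  # status returned for that last request

    def check(self, seqid: int) -> str:
        """Classify an incoming seqid-mutating request such as OPEN."""
        if seqid == next_seqid(self.last_seqid):
            # The next request in sequence: process it and remember the result.
            self.last_seqid = seqid
            self.cached_reply = NFS4_OK
            return NFS4_OK
        if seqid == self.last_seqid:
            # Retransmission of the last request: replay the cached reply.
            return self.cached_reply
        # Anything else means client and server disagree on the sequence.
        return NFS4ERR_BAD_SEQID


owner = OpenOwnerState(last_seqid=5)
first = owner.check(6)    # next in sequence
replay = owner.check(6)   # retransmission of the same request
skewed = owner.check(9)   # client skipped ahead
```

In this model the only way to reach the BAD_SEQID branch is for the two
sides to hold different ideas of last_seqid, which is why the error points
at a bug on one side or the other.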

-- 
Trond Myklebust
Linux NFS client maintainer, PrimaryData
trond.myklebust@xxxxxxxxxxxxxxx