On Thu, Mar 19, 2015 at 6:48 AM, Benjamin Coddington <bcodding@xxxxxxxxxx> wrote:
> I wrote yesterday about a RHEL6 bug, but I'd gotten some details wrong about
> the problem, so I'm starting a new thread.
>
> It looks like getting BAD_SEQID back from an OPEN operation drops the state_owner,
> which means that the state machine can't find or recover any other objects
> for that state_owner. That can get the client into unrecoverable loops. I
> can produce one of them with:
>
> 1) OPEN file1, OPEN file2
> 2) break the network for longer than the lease period
> 3) during recovery, have the server return BAD_SEQID for one of the OPENs
> 4) break the network again for longer than the lease period
> 5) WRITE to the file that recovered properly in #3
>
> This gets stuck in WRITE,NFS4ERR_EXPIRED.
>
> It looks like some cleanup is needed if we have to drop the whole
> state_owner. Alternatively, does it make sense to just drop the objects in
> that sequence?
>

Ummm... Why are you seeing BAD_SEQID in the first place? That specific error
means that the client and server disagree on the sequencing of the OPENs,
which means there is a bug either on the client or on the server.

-- 
Trond Myklebust
Linux NFS client maintainer, PrimaryData
trond.myklebust@xxxxxxxxxxxxxxx
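
For reference, the per-owner seqid check behind BAD_SEQID can be sketched roughly as follows. This is a simplified model of NFSv4.0 open-owner sequencing, not the Linux client or server code: the names OpenOwnerState, SeqidResult and check are made up for illustration, and the sketch ignores 32-bit wraparound and the error cases in which the seqid is not advanced.

```python
from enum import Enum


class SeqidResult(Enum):
    OK = "process the request"
    REPLAY = "resend the cached reply"
    BAD_SEQID = "NFS4ERR_BAD_SEQID"


class OpenOwnerState:
    """Hypothetical server-side record for a single open-owner."""

    def __init__(self, last_seqid: int) -> None:
        # seqid of the last request the server processed for this owner
        self.last_seqid = last_seqid

    def check(self, seqid: int) -> SeqidResult:
        # The next request in sequence is accepted and advances the counter.
        if seqid == self.last_seqid + 1:
            self.last_seqid = seqid
            return SeqidResult.OK
        # A repeat of the last seqid is treated as a retransmission.
        if seqid == self.last_seqid:
            return SeqidResult.REPLAY
        # Anything else means client and server disagree on the sequence.
        return SeqidResult.BAD_SEQID


owner = OpenOwnerState(last_seqid=4)
results = [owner.check(5), owner.check(5), owner.check(9)]
```

In this model the third call is the case in question: the client sends a seqid that is neither the next one nor a retransmission, so the only way to reach it is for one side to have miscounted.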