Re: pynfs replay cache test SEQ9f

On Fri, Oct 13, 2017 at 02:50:15PM -0400, bfields@xxxxxxxxxxxx wrote:
> On Fri, Oct 13, 2017 at 03:26:51PM +0000, Trond Myklebust wrote:
> > On Fri, 2017-10-13 at 11:00 -0400, bfields@xxxxxxxxxxxx wrote:
> > > OK, OK, I'll look into fixing the server (I'm pretty sure we get this
> > > wrong).
> > > 
> > > You've explained the ctrl-C case before and I don't think I
> > > understood it.  I guess otherwise the only way for the client to
> > > sort out the situation would be to retry the original request.  And
> > > that requires keeping the arguments and credentials around to handle
> > > potential retries.  And that's impractical if the process is going
> > > away?  OK.
> > > 
> > 
> > Right, we're not going to do that just for data that is just going to
> > be tossed away anyway. We already guarantee that non-idempotent
> > operations (the ones that we actually do ask the server to cache) are
> > guaranteed to complete whether or not the user presses ^C, so this is
> > mainly about what happens when somebody interrupts an operation that we
> > did not want the server to cache.
> > 
> > I have a patch out there that just replays a SEQUENCE op if we detect
> > that an RPC call was interrupted. That should be sufficient to deal
> > with servers that cache everything (whether or not the client sets
> > sa_cachethis), but don't want to do NFS4ERR_SEQ_FALSE_RETRY. That
> > particular combination has been seen to be extremely toxic to the
> > current client, because it can get replayed LOOKUP or GETATTR requests
> > after someone presses ^C.
> 
> Those all involve uncached compounds with more than one op.  My reading
> of knfsd code is that it will return RETRY_UNCACHED_REP in this case,
> and I think (I might be misunderstanding) that the client will bump the
> slot seqid and retry in that case.  So I *think* you shouldn't be seeing
> that problem with knfsd?

Argh, no, you're sending a bare SEQUENCE so of course there's just one
op.

And looking at Olga's COPY example and the code....  The server gets
confused in this case: it returns a reply containing only the SEQUENCE
result, but for some reason sets the reply's opcnt to the count taken
from the original call.

So, the server's returning a corrupt reply.  It needs to return a reply
that's actually legal XDR, with an opcnt that matches the ops it encodes
and SEQUENCE results that match the call.  Beyond that it probably
doesn't matter exactly what it returns: either it handles the call as a
replay and doesn't bump the seqid, or as a new call and does, but either
way the slot's seqid ends up in the same place, which is the goal here.
OK.
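
To make the "same place either way" argument concrete, here is a toy
model of one session slot.  It is only a sketch, not knfsd or client
code: Slot, Reply, execute_original and handle_bare_sequence are names
made up for illustration, and error cases (misordered seqids, false
retries) are left out.  It assumes the client re-sends a bare SEQUENCE
with the same seqid as the interrupted call.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Reply:
    seqid: int
    ops: List[str]  # results actually encoded in the reply

    @property
    def opcnt(self) -> int:
        # Derived from the encoded results, so it cannot disagree with them.
        return len(self.ops)


@dataclass
class Slot:
    seqid: int = 0  # seqid of the last call executed on this slot


def execute_original(slot: Slot, call_seqid: int, ops: List[str]) -> Reply:
    """The original (later interrupted) compound reaches the server."""
    assert call_seqid == slot.seqid + 1
    slot.seqid = call_seqid
    return Reply(call_seqid, list(ops))


def handle_bare_sequence(slot: Slot, call_seqid: int) -> Reply:
    """The client's follow-up: a compound holding only SEQUENCE."""
    if call_seqid == slot.seqid + 1:
        # Server never saw the original: treat as a new call, bump seqid.
        slot.seqid = call_seqid
    elif call_seqid != slot.seqid:
        raise ValueError("misordered seqid (not modelled here)")
    # Replay or new call, the reply carries one SEQUENCE result, so opcnt
    # is 1 and not the op count of the original call.
    return Reply(call_seqid, ["SEQUENCE"])


# Case A: the original COPY compound reached the server before the ^C.
slot_a = Slot()
execute_original(slot_a, 1, ["SEQUENCE", "PUTFH", "COPY"])
reply_a = handle_bare_sequence(slot_a, 1)  # handled as a replay

# Case B: the original never reached the server.
slot_b = Slot()
reply_b = handle_bare_sequence(slot_b, 1)  # handled as a new call
```

In both cases the slot ends at seqid 1 and the reply's opcnt matches the
single result it encodes.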

--b.