On Fri, 2009-12-18 at 10:37 -0500, Jeff Layton wrote:
> On Fri, 18 Dec 2009 10:12:22 -0500
> Trond Myklebust <Trond.Myklebust@xxxxxxxxxx> wrote:
>
> > On Fri, 2009-12-18 at 09:47 -0500, Trond Myklebust wrote:
> > > On Fri, 2009-12-18 at 09:39 -0500, Jeff Layton wrote:
> > > > Without a separate downcall error field, we'll need to special case at
> > > > least 2 different errors -- one for a "real" EACCES and one that
> > > > indicates that the ticket expired and the upcall should be retried
> > > > instead.
> > >
> > > We can find another error for the 'ticket expired' case. EKEYEXPIRED
> > > springs to mind...
> >
> > BTW: Here be dragons!
> >
> > I think we need to handle the 'ticket expired' case as if it were an
> > NFS4ERR_DELAY/EJUKEBOX, and actually do the retry in the NFS layer after
> > a suitable exponential back-off period.
> >
> > Otherwise, we end up holding onto resources (in particular NFSv4.1
> > slots, but also RPC slots, ...) which will cause congestion, and prevent
> > other RPC calls from making progress.
> >
>
> Thanks. My original thought was that we should handle this situation as
> we do when gssd is down -- just retry at the RPC layer. I hadn't
> considered the resource issue however. I'll shoot for making the retry
> happen at the NFS layer instead. That should also make it easier to
> handle this situation differently on hard vs. soft mounts too.
>

It will also make it easier to do things like preventing flushd from
hanging forever on a set of writebacks that cannot make progress.

At some point we might also want to allow the administrator to set a
limit on the number of write retries, so that a user who decides to go
on a 1 year sabbatical doesn't end up holding up access to a file
forever...

Cheers
  Trond
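For illustration only, the retry scheme discussed above might look roughly like the user-space sketch below. It is not kernel code and not the patch under discussion. All names (retry_policy, retry_delay_ms, call_with_backoff, fake_call) and the delay bounds are hypothetical, and the hard/soft behaviour shown (hard mounts retry indefinitely, soft mounts stop after a configurable number of retries) is an assumption about how the idea could be applied. The op callback stands for one complete attempt that has already released its slots before returning, so nothing is held during the back-off.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* EKEYEXPIRED is Linux-specific; this fallback (Linux's value) is only
 * here so the sketch builds elsewhere. */
#ifndef EKEYEXPIRED
#define EKEYEXPIRED 127
#endif

/* Hypothetical bounds for the back-off delay. */
#define RETRY_DELAY_MIN_MS 100U
#define RETRY_DELAY_MAX_MS 15000U

/* Hypothetical per-mount policy. */
struct retry_policy {
    bool soft;             /* soft mount: give up after max_retries */
    unsigned max_retries;  /* ignored for hard mounts */
};

typedef int (*op_fn)(void *ctx);                  /* 0 or -errno */
typedef void (*wait_fn)(unsigned ms, void *ctx);  /* sleep for ms */

/* Delay before retry number `attempt` (0-based): doubles, then caps. */
static unsigned retry_delay_ms(unsigned attempt)
{
    unsigned delay = RETRY_DELAY_MIN_MS;

    while (attempt-- > 0 && delay < RETRY_DELAY_MAX_MS)
        delay *= 2;
    return delay < RETRY_DELAY_MAX_MS ? delay : RETRY_DELAY_MAX_MS;
}

/* Run op; on -EKEYEXPIRED wait with exponential back-off and try again.
 * Any other result, including a "real" -EACCES, is returned at once. */
static int call_with_backoff(op_fn op, wait_fn wait, void *ctx,
                             const struct retry_policy *policy)
{
    unsigned attempt = 0;

    assert(op != NULL && wait != NULL && policy != NULL);
    for (;;) {
        int err = op(ctx);

        if (err != -EKEYEXPIRED)
            return err;
        if (policy->soft && attempt >= policy->max_retries)
            return err;  /* soft mount: retry limit reached */
        wait(retry_delay_ms(attempt), ctx);
        attempt++;
    }
}

/* Stand-ins for exercising the loop: the "ticket" stays expired for
 * failures_left attempts, and waits are recorded instead of slept. */
struct fake_call {
    int failures_left;
    unsigned slept_ms;
};

static int fake_op(void *ctx)
{
    struct fake_call *c = ctx;

    if (c->failures_left > 0) {
        c->failures_left--;
        return -EKEYEXPIRED;
    }
    return 0;
}

static void fake_wait(unsigned ms, void *ctx)
{
    struct fake_call *c = ctx;

    c->slept_ms += ms;
}
```

With a hard-mount policy and a fake_call whose ticket stays expired for three attempts, call_with_backoff(fake_op, fake_wait, &call, &policy) waits 100, 200 and 400 ms and then returns 0. With a soft policy and max_retries set to 2, a ticket that never recovers yields -EKEYEXPIRED after two waits.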