Re: [PATCH 5/6] nfsd: Fix up nfsd to ensure that timeout errors don't result in ESTALE

"J. Bruce Fields" <bfields@xxxxxxxxxx> · Mon, 30 Nov 2020 21:30:47 -0500

On Tue, Dec 01, 2020 at 12:39:11AM +0000, Trond Myklebust wrote:
> On Mon, 2020-11-30 at 18:05 -0500, J. Bruce Fields wrote:
> > On Mon, Nov 30, 2020 at 04:24:54PM -0500, trondmy@xxxxxxxxxx wrote:
> > > From: Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx>
> > > 
> > > If the underlying filesystem times out, then we want knfsd to return
> > > NFSERR_JUKEBOX/DELAY rather than NFSERR_STALE.
> > 
> > Out of curiosity, what was causing ETIMEDOUT in practice?
> > 
> 
> If you're only re-exporting NFS from a single server, then it is OK to
> use hard mounts. However if you are exporting from multiple servers, or
> you have local filesystems that are also being exported by the same
> knfsd server, then you usually want to use softerr mounts for NFS so
> that operations that take an inordinate amount of time due to temporary
> server outages get converted into JUKEBOX/DELAY errors. Otherwise, it
> is really simple to cause all the nfsd threads to hang on that one
> delinquent server.

Makes sense, thanks.
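
For anyone following along, the behaviour being asked for is roughly
the errno translation below. This is only a userspace sketch, not the
nfsd code from the patch: the function name and the SKETCH_ enum names
are made up for illustration, and the fallback to an I/O error for
unknown codes is an assumption. The numeric status values are the
NFSv3 ones from RFC 1813.

```c
#include <errno.h>

/* NFSv3 status values (RFC 1813). NFS4ERR_DELAY uses the same 10008. */
enum {
	SKETCH_NFS_OK         = 0,
	SKETCH_NFSERR_IO      = 5,
	SKETCH_NFSERR_STALE   = 70,
	SKETCH_NFSERR_JUKEBOX = 10008,
};

/*
 * Hypothetical helper: translate a kernel-style negative errno into an
 * NFS status. A timeout from the underlying filesystem becomes
 * JUKEBOX/DELAY, so the client retries later instead of treating the
 * filehandle as stale.
 */
static int sketch_errno_to_nfsstat(int err)
{
	switch (err) {
	case 0:
		return SKETCH_NFS_OK;
	case -ETIMEDOUT:
		return SKETCH_NFSERR_JUKEBOX;
	case -ESTALE:
		return SKETCH_NFSERR_STALE;
	default:
		/* Assumption: anything unrecognised is reported as an I/O error. */
		return SKETCH_NFSERR_IO;
	}
}
```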

In theory the same thing could happen with block devices; longer term I
wonder if it'd make sense to limit how many threads are waiting on a
single backend.
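
Very roughly, I'm imagining something like the sketch below: a
per-backend counter that refuses new waiters once a limit is reached,
so the caller can return JUKEBOX/DELAY instead of tying up another
thread. Everything here (struct backend_gate, the function names, the
limit) is hypothetical, written as plain userspace C11 and not taken
from nfsd.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-backend accounting, one instance per export backend. */
struct backend_gate {
	atomic_int in_flight;	/* threads currently inside the gate */
	int limit;		/* maximum threads allowed inside at once */
};

static void backend_gate_init(struct backend_gate *g, int limit)
{
	atomic_init(&g->in_flight, 0);
	g->limit = limit;
}

/* Returns true if the caller may issue the (possibly slow) operation. */
static bool backend_gate_enter(struct backend_gate *g)
{
	if (atomic_fetch_add(&g->in_flight, 1) >= g->limit) {
		/* Over the limit: undo the increment; caller should reply DELAY. */
		atomic_fetch_sub(&g->in_flight, 1);
		return false;
	}
	return true;
}

/* Must be called once for every successful backend_gate_enter(). */
static void backend_gate_exit(struct backend_gate *g)
{
	atomic_fetch_sub(&g->in_flight, 1);
}
```

With a limit well below the total thread count, one delinquent backend
could hold at most that many threads while the rest keep serving other
exports.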

(ACK to the patch, though, that'd be a project for another day.)

--b.