Re: question: re-try of operations in PNFS

Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> · Wed, 23 May 2018 13:42:07 +0000

On Wed, 2018-05-23 at 09:25 -0400, Olga Kornievskaia wrote:
> On Tue, May 22, 2018 at 8:26 PM, Rick Macklem <rmacklem@xxxxxxxxxxx>
> wrote:
> > Olga Kornievskaia wrote:
> > [good stuff snipped]
> > > Upstream kernel. But I'm arguing that there shouldn't be a need
> > > to
> > > specify a dataserver_timeo because it shouldn't timeout at all
> > > just
> > > like MDS operations.
> > 
> > If/when the server is providing mirrored DSs, I've found this
> > timeout useful
> > in the FreeBSD client since it allows the client to detect a DS
> > failure.
> > It can then report the failure to the MDS via LayoutReturn (or
> > another one
> > on NFSv4.2 which I can't remember the name of since I haven't done
> > 4.2;-).
> > 
> > For non-mirrored DSs, the only thing I can think of (I've never
> > seen this) would
> > be some sort of network partitioning such that the client can't
> > reach the DS but
> > can reach the MDS.
> > 
> > I have no idea if this is relevant to Linux, but thought I'd
> > mention it, just in case.
> > [more stuff snipped]
> 
> Doesn't retrying make the implementation non-compliant with the spec?

Replaying a request would not be spec compliant. Playing new requests
is perfectly fine (e.g. after picking up a new layout or redirecting
the I/O to the MDS).
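
To make the distinction concrete, here is a rough sketch of the "new request, not a replay" pattern. It is a toy model, not the Linux client code: Request, DSTimeout, new_request, do_io, and the send callback are all hypothetical names, and the xid field is only a stand-in for whatever identifies a request on the wire.

```python
from dataclasses import dataclass
from itertools import count

# Hypothetical request-identity counter; stands in for the real wire identity.
_xids = count(1)


@dataclass(frozen=True)
class Request:
    xid: int      # identity of this particular request (illustrative)
    target: str   # "DS" or "MDS"
    offset: int
    length: int


class DSTimeout(Exception):
    """Raised by the (hypothetical) transport when the DS does not answer."""


def new_request(target, offset, length):
    # Every call produces a request with a fresh identity.
    return Request(next(_xids), target, offset, length)


def do_io(send, offset, length):
    """Try the DS first; on timeout, issue a *new* request to the MDS.

    `send` is an assumed callable that transmits a Request and either
    returns a result or raises DSTimeout.
    """
    first = new_request("DS", offset, length)
    try:
        return send(first)
    except DSTimeout:
        # Do not resend `first`. Build a fresh request for the same byte
        # range and redirect it to the MDS instead.
        fallback = new_request("MDS", offset, length)
        return send(fallback)
```

The same shape would apply if the client fetched a new layout and sent the fresh request to a different DS rather than falling back to the MDS.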

Historically, I seem to remember that at one point we introduced a 15s
timeout on I/O requests to the DS in order to allow fast failover of
the pNFS client when the DS was down or unresponsive. I'm not sure
whether or not that mechanism still exists and whether it is what you
are seeing here.

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@xxxxxxxxxxxxxxx
