On Wed, 2018-05-23 at 09:25 -0400, Olga Kornievskaia wrote:
> On Tue, May 22, 2018 at 8:26 PM, Rick Macklem <rmacklem@xxxxxxxxxxx> wrote:
> > Olga Kornievskaia wrote:
> > [good stuff snipped]
> > > Upstream kernel. But I'm arguing that there shouldn't be a need to
> > > specify a dataserver_timeo because it shouldn't timeout at all just
> > > like MDS operations.
> >
> > If/when the server is providing mirrored DSs, I've found this timeout useful
> > in the FreeBSD client since it allows the client to detect a DS failure.
> > It can then report the failure to the MDS via LayoutReturn (or another one
> > on NFSv4.2 which I can't remember the name of since I haven't done 4.2;-).
> >
> > For non-mirrored DSs, the only thing I can think of (I've never seen this) would
> > be some sort of network partitioning such that the client can't reach the DS but
> > can reach the MDS.
> >
> > I have no idea if this is relevant to Linux, but thought I'd mention it, just in case.
> > [more stuff snipped]
>
> Isn't retrying makes the implementation not spec compliant?

Replaying a request would not be spec compliant. Playing new requests
is perfectly fine (e.g. after picking up a new layout or redirecting
the I/O to the MDS).

Historically, I seem to remember that at one point we introduced a 15s
timeout on I/O requests to the DS in order to allow fast failover of
the pNFS client when the DS was down or unresponsive. I'm not sure
whether or not that mechanism still exists and whether it is what you
are seeing here.

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@xxxxxxxxxxxxxxx