On Wed, May 23, 2018 at 9:42 AM, Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> wrote:
> On Wed, 2018-05-23 at 09:25 -0400, Olga Kornievskaia wrote:
>> On Tue, May 22, 2018 at 8:26 PM, Rick Macklem <rmacklem@xxxxxxxxxxx>
>> wrote:
>> > Olga Kornievskaia wrote:
>> > [good stuff snipped]
>> > > Upstream kernel. But I'm arguing that there shouldn't be a need
>> > > to specify a dataserver_timeo because it shouldn't timeout at
>> > > all just like MDS operations.
>> >
>> > If/when the server is providing mirrored DSs, I've found this
>> > timeout useful in the FreeBSD client since it allows the client
>> > to detect a DS failure.
>> > It can then report the failure to the MDS via LayoutReturn (or
>> > another one on NFSv4.2 which I can't remember the name of since
>> > I haven't done 4.2;-).
>> >
>> > For non-mirrored DSs, the only thing I can think of (I've never
>> > seen this) would be some sort of network partitioning such that
>> > the client can't reach the DS but can reach the MDS.
>> >
>> > I have no idea if this is relevant to Linux, but thought I'd
>> > mention it, just in case.
>> > [more stuff snipped]
>>
>> Isn't retrying makes the implementation not spec compliant?
>
> Replaying a request would not be spec compliant. Playing new requests
> is perfectly fine (e.g. after picking up a new layout or redirecting
> the I/O to the MDS).

I see you are right. The request to the MDS is a "new request" as it
uses a different filehandle.

> Historically, I seem to remember that at one point we introduced a 15s
> timeout on I/O requests to the DS in order to allow fast failover of
> the pNFS client when the DS was down or unresponsive. I'm not sure
> whether or not that mechanism still exists and whether it is what you
> are seeing here.

Then I'd guess it probably is that and the timeout now is 10s.