Hi Trond,

----- Original Message -----
> From: "Trond Myklebust" <trondmy@xxxxxxxxx>
> To: "Olga Kornievskaia" <aglo@xxxxxxxxx>
> Cc: "linux-nfs" <linux-nfs@xxxxxxxxxxxxxxx>
> Sent: Tuesday, April 2, 2019 8:28:38 PM
> Subject: Re: [PATCH v2 00/28] Fix up soft mounts for NFSv4.x

> On Mon, 2019-04-01 at 12:54 -0400, Olga Kornievskaia wrote:
>> On Fri, Mar 29, 2019 at 6:02 PM Trond Myklebust <trondmy@xxxxxxxxx>
>> wrote:
>> > This patchset aims to make soft mounts a viable option for NFSv4
>> > clients by minimising the risk of false positive timeouts, while
>> > allowing for faster failover of reads and writes once a timeout is
>> > actually observed.
>> >
>> > The patches rely on the NFS server correctly implementing the
>> > contract specified in RFC7530 section 3.1.1 with respect to not
>> > dropping requests while the transport connection is up. When this
>> > is the case, the client can safely assume that if the request has
>> > not received a reply after transmitting a RPC request, it is not
>> > because the request was dropped, but rather is due to congestion,
>> > or slow processing on the server. IOW: as long as the connection
>> > remains up, there is no need for requests to time out.
>> >
>> > The patches break down roughly as follows:
>> > - A set of patches to clean up the RPC engine timeouts, and ensure
>> >   they are accurate.
>> > - A set of patches to change the 'soft' mount semantics for
>> >   NFSv4.x.
>> > - A set of patches to add a new 'softerr' mount option that works
>> >   like soft, but explicitly signals timeouts using the ETIMEDOUT
>> >   error code rather than using EIO. This allows applications to
>> >   tune their behaviour (e.g. by failing over to a different server)
>> >   if a timeout occurs.
>>
>> I'm just curious why would an application be aware of a different
>> server to connect to and an NFS layer would not be? I'm also curious
>> wouldn't it break application that typically expect to get an EIO
>> errors? Do all system calls allow to return ETIMEDOUT error?
>
> This is why it is a separate mount option. ...and actually most
> applications blow up when they get EIO as well. However you can imagine
> an application that might decide to retry if it hits an ETIMEDOUT,
> while failing if it hits an EIO.

What is the reason for introducing a new error code for IO operations
that is not in the list of POSIX-specified values for read(2) and
write(2)? Is there an expected change in application behavior compared
to EAGAIN?

I would like to use this opportunity to bring up the topic of the
O_NONBLOCK open(2) flag for offline files.

Tigran.

>
> Cheers
> Trond
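Trond's example of an application that retries on ETIMEDOUT but fails on EIO can be sketched in C. This is a minimal, hypothetical illustration that assumes a 'softerr' mount where a timed-out read(2) surfaces as ETIMEDOUT. The names `next_action`, `read_softerr`, `io_action` and the retry bound are invented for this sketch and are not part of the patchset or any NFS API.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical per-attempt decision for an application on a 'softerr' mount. */
enum io_action {
    IO_DONE,  /* read succeeded (or hit EOF): stop */
    IO_RETRY, /* transient failure: try again */
    IO_FAIL   /* hard failure: give up and report it */
};

/* Decide the next step from read(2)'s return value and the saved errno.
 * Assumption: ETIMEDOUT means the server was slow or unreachable, so a
 * bounded retry is reasonable; EIO (and anything else) is treated as fatal.
 * 'attempt' counts previous tries, starting at 0. */
enum io_action next_action(ssize_t ret, int err, int attempt, int max_retries)
{
    if (ret >= 0)
        return IO_DONE;
    if (err == EINTR)
        return IO_RETRY;   /* interrupted by a signal: always safe to retry */
    if (err == ETIMEDOUT && attempt < max_retries)
        return IO_RETRY;   /* timeout: retry a bounded number of times */
    return IO_FAIL;        /* EIO, exhausted retries, or any other error */
}

/* Read up to len bytes, retrying on timeouts.
 * Returns the byte count, or -1 with errno set on failure. */
ssize_t read_softerr(int fd, void *buf, size_t len, int max_retries)
{
    for (int attempt = 0;; attempt++) {
        ssize_t ret = read(fd, buf, len);
        int err = errno;   /* save errno before anything can clobber it */
        switch (next_action(ret, err, attempt, max_retries)) {
        case IO_DONE:
            return ret;
        case IO_RETRY:
            continue;
        case IO_FAIL:
            errno = err;
            return -1;
        }
    }
}
```

After the retries are exhausted, a real application would more likely fail over to another server or path instead of just reporting the error. The retry bound here is an arbitrary choice for illustration.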