On Wed, 2019-04-03 at 22:51 +0200, Mkrtchyan, Tigran wrote:
> Hi Trond,
>
> ----- Original Message -----
> > From: "Trond Myklebust" <trondmy@xxxxxxxxx>
> > To: "Olga Kornievskaia" <aglo@xxxxxxxxx>
> > Cc: "linux-nfs" <linux-nfs@xxxxxxxxxxxxxxx>
> > Sent: Tuesday, April 2, 2019 8:28:38 PM
> > Subject: Re: [PATCH v2 00/28] Fix up soft mounts for NFSv4.x
>
> > On Mon, 2019-04-01 at 12:54 -0400, Olga Kornievskaia wrote:
> > > On Fri, Mar 29, 2019 at 6:02 PM Trond Myklebust <trondmy@xxxxxxxxx>
> > > wrote:
> > > > This patchset aims to make soft mounts a viable option for NFSv4
> > > > clients by minimising the risk of false positive timeouts, while
> > > > allowing for faster failover of reads and writes once a timeout
> > > > is actually observed.
> > > >
> > > > The patches rely on the NFS server correctly implementing the
> > > > contract specified in RFC7530 section 3.1.1 with respect to not
> > > > dropping requests while the transport connection is up. When this
> > > > is the case, the client can safely assume that if the request has
> > > > not received a reply after transmitting a RPC request, it is not
> > > > because the request was dropped, but rather is due to congestion,
> > > > or slow processing on the server.
> > > > IOW: as long as the connection remains up, there is no need for
> > > > requests to time out.
> > > >
> > > > The patches break down roughly as follows:
> > > > - A set of patches to clean up the RPC engine timeouts, and ensure
> > > >   they are accurate.
> > > > - A set of patches to change the 'soft' mount semantics for
> > > >   NFSv4.x.
> > > > - A set of patches to add a new 'softerr' mount option that works
> > > >   like soft, but explicitly signals timeouts using the ETIMEDOUT
> > > >   error code rather than using EIO. This allows applications to
> > > >   tune their behaviour (e.g. by failing over to a different
> > > >   server) if a timeout occurs.
> > >
> > > I'm just curious why would an application be aware of a different
> > > server to connect to and an NFS layer would not be? I'm also
> > > curious wouldn't it break application that typically expect to get
> > > an EIO errors? Do all system calls allow to return ETIMEDOUT error?
> >
> > This is why it is a separate mount option. ...and actually most
> > applications blow up when they get EIO as well. However you can
> > imagine an application that might decide to retry if it hits an
> > ETIMEDOUT, while failing if it hits an EIO.
>
> What is the reason of introducing new error code for IO operations,
> which is not in the list of POSIX specified values for read(2) and
> write(2). Is there expected application behavior change compared to
> EAGAIN?

The point is to allow aware applications to better handle a situation
which is not covered by POSIX because POSIX has no concept of storage
that is temporarily unavailable.

...and it is being proposed as an opt-in feature, precisely so that
existing applications don't need to change.

> I would like to use the opportunity to bring the topic of O_NONBLOCK
> open(2) flag for offline files.

-- 
Trond Myklebust
CTO, Hammerspace Inc
4300 El Camino Real, Suite 105
Los Altos, CA 94022
www.hammer.space
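
For illustration only, here is a rough sketch of how an "aware" application might tell the two error codes apart on a 'softerr' mount: retry on ETIMEDOUT, give up on EIO or anything else. It is not code from the patchset. The names io_action, classify_io_error() and read_retry() are hypothetical, and the bounded-retry policy is just an assumption about what such an application could choose to do.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical policy values; not part of any kernel or libc API. */
enum io_action { IO_OK, IO_RETRY, IO_FAIL };

/* Map an errno value from a failed read(2)/write(2) to a policy. */
static enum io_action classify_io_error(int err)
{
    switch (err) {
    case 0:
        return IO_OK;
    case ETIMEDOUT: /* 'softerr' timeout: storage may be back later */
        return IO_RETRY;
    default:        /* EIO and everything else: treat as a hard failure */
        return IO_FAIL;
    }
}

/*
 * Read up to len bytes from fd, retrying at most max_retries times
 * when the error is classified as IO_RETRY.
 * Returns the number of bytes read, or -1 with errno set.
 */
static ssize_t read_retry(int fd, void *buf, size_t len, int max_retries)
{
    assert(buf != NULL);
    for (;;) {
        ssize_t n = read(fd, buf, len);
        if (n >= 0)
            return n;
        if (classify_io_error(errno) != IO_RETRY || max_retries-- <= 0)
            return -1;
        /* An application could back off or fail over elsewhere here. */
    }
}
```

An application that never checks for ETIMEDOUT simply sees a failed read either way, which is why the behaviour is tied to a separate mount option.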