On Wed, 2024-05-22 at 07:41 +0300, Dan Aloni wrote: > On 2024-05-21 15:24:19, Chuck Lever III wrote: > > > > > > > On May 21, 2024, at 11:13 AM, Trond Myklebust > > > <trondmy@xxxxxxxxxxxxxxx> wrote: > > > > > > On Tue, 2024-05-21 at 18:05 +0300, Sagi Grimberg wrote: > > > > > > > > > > > > On 21/05/2024 16:22, Jeff Layton wrote: > > > > > On Tue, 2024-05-21 at 15:58 +0300, Sagi Grimberg wrote: > > > > > > There is an inherent race where a symlink file may have > > > > > > been > > > > > > overriden > > > > > > (by a different client) between lookup and readlink, > > > > > > resulting in > > > > > > a > > > > > > spurious EIO error returned to userspace. Fix this by > > > > > > propagating > > > > > > back > > > > > > ESTALE errors such that the vfs will retry the > > > > > > lookup/get_link > > > > > > (similar > > > > > > to nfs4_file_open) at least once. > > > > > > > > > > > > Cc: Dan Aloni <dan.aloni@xxxxxxxxxxxx> > > > > > > Signed-off-by: Sagi Grimberg <sagi@xxxxxxxxxxx> > > > > > > --- > > > > > > Note that with this change the vfs should retry once for > > > > > > ESTALE errors. However with an artificial reproducer of > > > > > > high > > > > > > frequency symlink overrides, nothing prevents the retry to > > > > > > also encounter ESTALE, propagating the error back to > > > > > > userspace. > > > > > > The man pages for openat/readlinkat do not list an ESTALE > > > > > > errno. > > > > > > > > > > > > An alternative attempt (implemented by Dan) was a local > > > > > > retry > > > > > > loop > > > > > > in nfs_get_link(), if this is an applicable approach, Dan > > > > > > can > > > > > > share his patch instead. > > > > > > > > > > > > fs/nfs/symlink.c | 2 +- > > > > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > > > > > > > diff --git a/fs/nfs/symlink.c b/fs/nfs/symlink.c > > > > > > index 0e27a2e4e68b..13818129d268 100644 > > > > > > --- a/fs/nfs/symlink.c > > > > > > +++ b/fs/nfs/symlink.c > > > > > > @@ -41,7 +41,7 @@ static int nfs_symlink_filler(struct file > > > > > > *file, struct folio *folio) > > > > > > error: > > > > > > folio_set_error(folio); > > > > > > folio_unlock(folio); > > > > > > - return -EIO; > > > > > > + return error; > > > > > > } > > > > > > > > > > > > static const char *nfs_get_link(struct dentry *dentry, > > > > > git blame seems to indicate that we've returned -EIO here > > > > > since the > > > > > beginning of the git era (and likely long before that). I see > > > > > no > > > > > reason > > > > > for us to cloak the real error there though, especially with > > > > > something > > > > > like an ESTALE error. > > > > > > > > > > Reviewed-by: Jeff Layton <jlayton@xxxxxxxxxx> > > > > > > > > > > FWIW, I think we shouldn't try to do any retry looping on > > > > > ESTALE > > > > > beyond > > > > > what we already do. > > > > > > > > > > Yes, we can sometimes trigger ESTALE errors to bubble up to > > > > > userland if > > > > > we really thrash the underlying filesystem when testing, but > > > > > I > > > > > think > > > > > that's actually desirable: > > > > > > > > Returning ESTALE would be an improvement over returning EIO > > > > IMO, > > > > but it may be surprising for userspace to see an undocumented > > > > errno. > > > > Maybe the man pages can be amended? > > > > > > > > > > > > > > If you have real workloads across multiple machines that are > > > > > racing > > > > > with other that tightly, then you should probably be using > > > > > some > > > > > sort of > > > > > locking or other synchronization. If it's clever enough that > > > > > it > > > > > doesn''t need that, then it should be able to deal with the > > > > > occasional > > > > > ESTALE error by retrying on its own. > > > > > > > > I tend to agree. FWIW Solaris has a config knob for number of > > > > stale > > > > retries > > > > it does, maybe there is an appetite to have something like that > > > > as > > > > well? > > > > > > > > > > Any reason why we couldn't just return ENOENT in the case where > > > the > > > filehandle is stale? There will have been an unlink() on the > > > symlink at > > > some point in the recent past. > > > > To me ENOENT is preferable to both EIO and ESTALE. > > Another view on that, where in the scenario of `rename` causing the > unlinking, there was no situation of 'no entry' as the directory > entry > was only updated and not removed. So ENOENT in this regard by the > meaning of 'no entry' would not reflect what has really happened. > > (unless you go with the 'no entity' interpretation of ENOENT, but > that > would be against most of the POSIX-spec cases where ENOENT is > returned > which deal primarily with missing path components i.e. names to > objects and not the objects themselves) > The Linux NFS client doesn't support volatile filehandles. -- Trond Myklebust Linux NFS client maintainer, Hammerspace trond.myklebust@xxxxxxxxxxxxxxx