Re: [PATCH rfc] nfs: propagate readlink errors in nfs_symlink_filler

Jeff Layton <jlayton@xxxxxxxxxx> · Tue, 21 May 2024 09:22:46 -0400

On Tue, 2024-05-21 at 15:58 +0300, Sagi Grimberg wrote:
> There is an inherent race where a symlink file may have been overriden
> (by a different client) between lookup and readlink, resulting in a
> spurious EIO error returned to userspace. Fix this by propagating back
> ESTALE errors such that the vfs will retry the lookup/get_link (similar
> to nfs4_file_open) at least once.
> 
> Cc: Dan Aloni <dan.aloni@xxxxxxxxxxxx>
> Signed-off-by: Sagi Grimberg <sagi@xxxxxxxxxxx>
> ---
> Note that with this change the vfs should retry once for
> ESTALE errors. However with an artificial reproducer of high
> frequency symlink overrides, nothing prevents the retry to
> also encounter ESTALE, propagating the error back to userspace.
> The man pages for openat/readlinkat do not list an ESTALE errno.
> 
> An alternative attempt (implemented by Dan) was a local retry loop
> in nfs_get_link(), if this is an applicable approach, Dan can
> share his patch instead.
> 
>  fs/nfs/symlink.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/nfs/symlink.c b/fs/nfs/symlink.c
> index 0e27a2e4e68b..13818129d268 100644
> --- a/fs/nfs/symlink.c
> +++ b/fs/nfs/symlink.c
> @@ -41,7 +41,7 @@ static int nfs_symlink_filler(struct file *file, struct folio *folio)
>  error:
>  	folio_set_error(folio);
>  	folio_unlock(folio);
> -	return -EIO;
> +	return error;
>  }
>  
>  static const char *nfs_get_link(struct dentry *dentry,

git blame seems to indicate that we've returned -EIO here since the
beginning of the git era (and likely long before that). I see no reason
for us to cloak the real error there though, especially with something
like an ESTALE error.

    Reviewed-by: Jeff Layton <jlayton@xxxxxxxxxx>

FWIW, I think we shouldn't try to do any retry looping on ESTALE beyond
what we already do.

Yes, we can sometimes trigger ESTALE errors to bubble up to userland if
we really thrash the underlying filesystem when testing, but I think
that's actually desirable:

If you have real workloads across multiple machines that are racing
with other that tightly, then you should probably be using some sort of
locking or other synchronization. If it's clever enough that it
doesn''t need that, then it should be able to deal with the occasional
ESTALE error by retrying on its own.

Cheers,
Jeff