Re: [PATCH rfc] nfs: propagate readlink errors in nfs_symlink_filler

Chuck Lever III <chuck.lever@xxxxxxxxxx> · Tue, 21 May 2024 15:24:19 +0000

> On May 21, 2024, at 11:13 AM, Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> wrote:
> 
> On Tue, 2024-05-21 at 18:05 +0300, Sagi Grimberg wrote:
>> 
>> 
>> On 21/05/2024 16:22, Jeff Layton wrote:
>>> On Tue, 2024-05-21 at 15:58 +0300, Sagi Grimberg wrote:
>>>> There is an inherent race where a symlink file may have been
>>>> overriden
>>>> (by a different client) between lookup and readlink, resulting in
>>>> a
>>>> spurious EIO error returned to userspace. Fix this by propagating
>>>> back
>>>> ESTALE errors such that the vfs will retry the lookup/get_link
>>>> (similar
>>>> to nfs4_file_open) at least once.
>>>> 
>>>> Cc: Dan Aloni <dan.aloni@xxxxxxxxxxxx>
>>>> Signed-off-by: Sagi Grimberg <sagi@xxxxxxxxxxx>
>>>> ---
>>>> Note that with this change the vfs should retry once for
>>>> ESTALE errors. However with an artificial reproducer of high
>>>> frequency symlink overrides, nothing prevents the retry to
>>>> also encounter ESTALE, propagating the error back to userspace.
>>>> The man pages for openat/readlinkat do not list an ESTALE errno.
>>>> 
>>>> An alternative attempt (implemented by Dan) was a local retry
>>>> loop
>>>> in nfs_get_link(), if this is an applicable approach, Dan can
>>>> share his patch instead.
>>>> 
>>>>   fs/nfs/symlink.c | 2 +-
>>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>> 
>>>> diff --git a/fs/nfs/symlink.c b/fs/nfs/symlink.c
>>>> index 0e27a2e4e68b..13818129d268 100644
>>>> --- a/fs/nfs/symlink.c
>>>> +++ b/fs/nfs/symlink.c
>>>> @@ -41,7 +41,7 @@ static int nfs_symlink_filler(struct file
>>>> *file, struct folio *folio)
>>>>   error:
>>>>    folio_set_error(folio);
>>>>    folio_unlock(folio);
>>>> - return -EIO;
>>>> + return error;
>>>>   }
>>>>   
>>>>   static const char *nfs_get_link(struct dentry *dentry,
>>> git blame seems to indicate that we've returned -EIO here since the
>>> beginning of the git era (and likely long before that). I see no
>>> reason
>>> for us to cloak the real error there though, especially with
>>> something
>>> like an ESTALE error.
>>> 
>>>      Reviewed-by: Jeff Layton <jlayton@xxxxxxxxxx>
>>> 
>>> FWIW, I think we shouldn't try to do any retry looping on ESTALE
>>> beyond
>>> what we already do.
>>> 
>>> Yes, we can sometimes trigger ESTALE errors to bubble up to
>>> userland if
>>> we really thrash the underlying filesystem when testing, but I
>>> think
>>> that's actually desirable:
>> 
>> Returning ESTALE would be an improvement over returning EIO IMO,
>> but it may be surprising for userspace to see an undocumented errno.
>> Maybe the man pages can be amended?
>> 
>>> 
>>> If you have real workloads across multiple machines that are racing
>>> with other that tightly, then you should probably be using some
>>> sort of
>>> locking or other synchronization. If it's clever enough that it
>>> doesn''t need that, then it should be able to deal with the
>>> occasional
>>> ESTALE error by retrying on its own.
>> 
>> I tend to agree. FWIW Solaris has a config knob for number of stale
>> retries
>> it does, maybe there is an appetite to have something like that as
>> well?
>> 
> 
> Any reason why we couldn't just return ENOENT in the case where the
> filehandle is stale? There will have been an unlink() on the symlink at
> some point in the recent past.

To me ENOENT is preferable to both EIO and ESTALE.

--
Chuck Lever