> On May 21, 2024, at 11:13 AM, Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> wrote: > > On Tue, 2024-05-21 at 18:05 +0300, Sagi Grimberg wrote: >> >> >> On 21/05/2024 16:22, Jeff Layton wrote: >>> On Tue, 2024-05-21 at 15:58 +0300, Sagi Grimberg wrote: >>>> There is an inherent race where a symlink file may have been >>>> overriden >>>> (by a different client) between lookup and readlink, resulting in >>>> a >>>> spurious EIO error returned to userspace. Fix this by propagating >>>> back >>>> ESTALE errors such that the vfs will retry the lookup/get_link >>>> (similar >>>> to nfs4_file_open) at least once. >>>> >>>> Cc: Dan Aloni <dan.aloni@xxxxxxxxxxxx> >>>> Signed-off-by: Sagi Grimberg <sagi@xxxxxxxxxxx> >>>> --- >>>> Note that with this change the vfs should retry once for >>>> ESTALE errors. However with an artificial reproducer of high >>>> frequency symlink overrides, nothing prevents the retry to >>>> also encounter ESTALE, propagating the error back to userspace. >>>> The man pages for openat/readlinkat do not list an ESTALE errno. >>>> >>>> An alternative attempt (implemented by Dan) was a local retry >>>> loop >>>> in nfs_get_link(), if this is an applicable approach, Dan can >>>> share his patch instead. >>>> >>>> fs/nfs/symlink.c | 2 +- >>>> 1 file changed, 1 insertion(+), 1 deletion(-) >>>> >>>> diff --git a/fs/nfs/symlink.c b/fs/nfs/symlink.c >>>> index 0e27a2e4e68b..13818129d268 100644 >>>> --- a/fs/nfs/symlink.c >>>> +++ b/fs/nfs/symlink.c >>>> @@ -41,7 +41,7 @@ static int nfs_symlink_filler(struct file >>>> *file, struct folio *folio) >>>> error: >>>> folio_set_error(folio); >>>> folio_unlock(folio); >>>> - return -EIO; >>>> + return error; >>>> } >>>> >>>> static const char *nfs_get_link(struct dentry *dentry, >>> git blame seems to indicate that we've returned -EIO here since the >>> beginning of the git era (and likely long before that). I see no >>> reason >>> for us to cloak the real error there though, especially with >>> something >>> like an ESTALE error. >>> >>> Reviewed-by: Jeff Layton <jlayton@xxxxxxxxxx> >>> >>> FWIW, I think we shouldn't try to do any retry looping on ESTALE >>> beyond >>> what we already do. >>> >>> Yes, we can sometimes trigger ESTALE errors to bubble up to >>> userland if >>> we really thrash the underlying filesystem when testing, but I >>> think >>> that's actually desirable: >> >> Returning ESTALE would be an improvement over returning EIO IMO, >> but it may be surprising for userspace to see an undocumented errno. >> Maybe the man pages can be amended? >> >>> >>> If you have real workloads across multiple machines that are racing >>> with other that tightly, then you should probably be using some >>> sort of >>> locking or other synchronization. If it's clever enough that it >>> doesn''t need that, then it should be able to deal with the >>> occasional >>> ESTALE error by retrying on its own. >> >> I tend to agree. FWIW Solaris has a config knob for number of stale >> retries >> it does, maybe there is an appetite to have something like that as >> well? >> > > Any reason why we couldn't just return ENOENT in the case where the > filehandle is stale? There will have been an unlink() on the symlink at > some point in the recent past. To me ENOENT is preferable to both EIO and ESTALE. -- Chuck Lever