On 21/05/2024 17:02, Chuck Lever wrote:
On Tue, May 21, 2024 at 03:58:40PM +0300, Sagi Grimberg wrote:
There is an inherent race where a symlink file may have been overriden
Nit: Do you mean "overwritten" ?
Yes.
(by a different client) between lookup and readlink, resulting in a
spurious EIO error returned to userspace. Fix this by propagating back
ESTALE errors such that the vfs will retry the lookup/get_link (similar
to nfs4_file_open) at least once.
Cc: Dan Aloni <dan.aloni@xxxxxxxxxxxx>
Signed-off-by: Sagi Grimberg <sagi@xxxxxxxxxxx>
---
Note that with this change the vfs should retry once for
ESTALE errors. However with an artificial reproducer of high
frequency symlink overrides, nothing prevents the retry to
Nit: "overwrites" ?
Yes.
also encounter ESTALE, propagating the error back to userspace.
The man pages for openat/readlinkat do not list an ESTALE errno.
Speaking only as a community member, I consider that an undesirable
behavior regression. IMO it's a bug for a system call to return an
errno that isn't documented. That's likely why this logic has worked
this way for forever.
Well, if this is an issue, it would be paired with a vfs change that
checks for the error-code of the retry and convert it back.
Something like:
--
diff --git a/fs/namei.c b/fs/namei.c
index ceb9ddf8dfdd..9a06408bb52f 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3812,6 +3812,8 @@ static struct file *path_openat(struct nameidata *nd,
else
error = -ESTALE;
}
+ if (error == -ESTALE && (flags & LOOKUP_REVAL))
+ error = -EIO;
return ERR_PTR(error);
}
--
But we'd need to check with Al for this type of a change...
An alternative attempt (implemented by Dan) was a local retry loop
in nfs_get_link(), if this is an applicable approach, Dan can
share his patch instead.
I'm not entirely convinced by your patch description that returning
an EIO on occasion is a problem. Is it reasonable for the app to
expect that readlinkat() will /never/ fail?
Maybe not never, but its fairly easy to encounter, and it was definitely
observed in the wild.
Making symlink semantics more atomic on NFS mounts is probably a
good goal. But IMO the proposed change by itself isn't going to get
you that with high reliability and few or no undesirable side
effects.
What undesirable effects?
Note that NFS client-side patches should be sent To: Trond, Anna,
and Cc: linux-nfs@ . Trond and Anna need to weigh in on this.
Yes, it was sent to linux-nfs so I expect Trond and Anna to see it, you
and Jeff were CC'd because we briefly discussed about this last week at
LSFMM.