On 2024-05-21 09:22:46, Jeff Layton wrote:
> On Tue, 2024-05-21 at 15:58 +0300, Sagi Grimberg wrote:
> > There is an inherent race where a symlink file may have been
> > overridden (by a different client) between lookup and readlink,
> > resulting in a spurious EIO error returned to userspace. Fix this by
> > propagating back ESTALE errors such that the vfs will retry the
> > lookup/get_link (similar to nfs4_file_open) at least once.
[..]
> FWIW, I think we shouldn't try to do any retry looping on ESTALE beyond
> what we already do.
>
> Yes, we can sometimes trigger ESTALE errors to bubble up to userland if
> we really thrash the underlying filesystem when testing, but I think
> that's actually desirable:
>
> If you have real workloads across multiple machines that are racing
> with each other that tightly, then you should probably be using some
> sort of locking or other synchronization. If it's clever enough that it
> doesn't need that, then it should be able to deal with the occasional
> ESTALE error by retrying on its own.

We saw this issue in a real workload employing the method I describe in
the following test case, where the user was surprised to get an error.

It's a symlink atomicity semantics case, where there's a symlink that is
frequently being overridden using a rename. This use case works well with
local file systems and with some other network file system implementations
(this was also noticeable when other options were tested).

There is a fixed set of regular files `test_file_{1,2,3}`, and a 'shunt'
symlink that keeps getting recreated to point at one of them:

    while true; do
        i=1
        while (( i <= 3 )); do
            ln -s -f test_file_$i shunt
            i=$((i + 1))
        done
    done

Behind the scenes `ln` creates a symlink with a random name, then performs
the `rename` system call to override `shunt`. When another client tries to
call `open` via the symlink, it necessarily resolves it to one of the
regular files client-side (a rough sketch of that reader side is appended
below). The previous FH of `shunt` becomes stale and sometimes fails this
test.

This is why we saw a retry loop as worthwhile to implement, whether inside
the NFS client or outside in the VFS layer; the justification for it was
to prevent the workload from breaking when moving between file systems.

--
Dan Aloni
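
P.S. For clarity, the reader side on the other client looks roughly like
the following. This is a reconstruction rather than the exact reproducer;
the `cat` loop merely stands in for any `open` that goes through the
symlink, and the mount path is assumed to be the same export:

    # Run from a second client mounting the same export.
    # Each iteration opens the file via the `shunt` symlink; when the
    # race with the rename hits and nothing retries the lookup, the
    # spurious error surfaces here, e.g. "cat: shunt: Input/output error".
    while true; do
        cat shunt > /dev/null
    done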