On 2024-05-21 09:22:46, Jeff Layton wrote:
> On Tue, 2024-05-21 at 15:58 +0300, Sagi Grimberg wrote:
> > There is an inherent race where a symlink file may have been
> > overridden (by a different client) between lookup and readlink,
> > resulting in a spurious EIO error returned to userspace. Fix this by
> > propagating back ESTALE errors such that the vfs will retry the
> > lookup/get_link (similar to nfs4_file_open) at least once.
[..]
> FWIW, I think we shouldn't try to do any retry looping on ESTALE beyond
> what we already do.
>
> Yes, we can sometimes trigger ESTALE errors to bubble up to userland if
> we really thrash the underlying filesystem when testing, but I think
> that's actually desirable:
>
> If you have real workloads across multiple machines that are racing
> with each other that tightly, then you should probably be using some
> sort of locking or other synchronization. If it's clever enough that it
> doesn't need that, then it should be able to deal with the occasional
> ESTALE error by retrying on its own.

We saw this issue in a real workload employing the method I describe in
the following test case, where the user was surprised to get an error.

It's a symlink atomicity semantics case, where there's a symlink that is
frequently being overridden using a rename. This use case works well with
local file systems and with some other network file system implementations
(this was also noticeable when other options were tested).

There is a fixed set of regular files `test_file_{1,2,3}`, and a 'shunt'
symlink that keeps getting recreated to point at one of them:

    while true; do
        i=1
        while (( i <= 3 )); do
            ln -s -f test_file_$i shunt
            i=$((i + 1))
        done
    done

Behind the scenes `ln` creates a symlink with a random name, then performs
the `rename` system call to override `shunt`. When another client tries to
call `open` via the symlink, it necessarily resolves it to one of the
regular files client-side (a rough sketch of that reader side is appended
below). The previous FH of `shunt` becomes stale and sometimes fails this
test.

This is why we saw a retry loop as worthwhile to implement, whether inside
the NFS client or outside in the VFS layer; the justification for it was
to prevent the workload from breaking when moving between file systems.

--
Dan Aloni
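
P.S. For clarity, the reader side on the other client looks roughly like
the following. This is a reconstruction rather than the exact reproducer;
the `cat` loop merely stands in for any `open` that goes through the
symlink, and the mount path is assumed to be the same export:

    # Run from a second client mounting the same export.
    # Each iteration opens the file via the `shunt` symlink; when the
    # race with the rename hits and nothing retries the lookup, the
    # spurious error surfaces here, e.g. "cat: shunt: Input/output error".
    while true; do
        cat shunt > /dev/null
    done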