On Mon, 27 May 2024, Richard Kojedzinszky wrote: > Dear Neil, > > I was running it on arm64, may that be the reason? That could certainly explain it. I know x86_64 almost never needs barriers, though I have seen cases where they matter. ppc64 is very sensitive. A quick search suggests that arm64 does need barriers some times. I don't have arm64 hardware to test on but I'm happy with your test results. Thanks, NeilBrown > > Regards, > Richard > > > On May 27, 2024 4:02:32 AM GMT+02:00, NeilBrown <neilb@xxxxxxx> wrote: > > On Sun, 26 May 2024, Richard Kojedzinszky wrote: > Dear Neil, > > According to my quick tests, your patch seems to fix this bug. Could you > also manage to try my attached code, could you also reproduce the bug? > > Thanks for testing. > > I can run your test code but it isn't triggering the bug (90 minutes so > far). Possibly a different compiler used for the kernel, possibly > hardware differences (I'm running under qemu). Bugs related to barriers > (which this one seems to be) need just the right circumstances to > trigger so they can be hard to reproduce on a different system. > > I've made some cosmetic improvements to the patch and will post it to > the NFS maintainers. > > Thanks again, > NeilBrown > > > > Thanks, > Richard > > 2024-05-24 07:29 időpontban Richard Kojedzinszky ezt írta: > Dear Neil, > > I've applied your patch, and since then there are no lockups. Before > that my application reported a lockup in a minute or two, now it has > been running for half an hour, and still running. > > Thanks, > Richard > > 2024-05-24 01:31 időpontban NeilBrown ezt írta: > On Fri, 24 May 2024, Richard Kojedzinszky wrote: > Dear devs, > > I am attaching a stripped down version of the little program which > triggers the bug very quickly, in a few minutes in my test lab. It > turned out that a single NFS mountpoint is enough. Just start the > program giving it the NFS mount as first argument. It will chdir > there, > and do file operations, which will trigger a lockup in a few minutes. > > I couldn't get the go code to run. But then it is a long time since I > played with go and I didn't try very hard. > If you could provide simple instructions and a list of package > dependencies that I need to install (on Debian), I can give it a try. > > Or you could try this patch. It might help, but I don't have high > hopes. It adds some memory barriers and fixes a bug which would cause > a > problem if memory allocation failed (but memory allocation never > fails). > > NeilBrown > > diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c > index ac505671efbd..5bcc0d14d519 100644 > --- a/fs/nfs/dir.c > +++ b/fs/nfs/dir.c > @@ -1804,7 +1804,7 @@ __nfs_lookup_revalidate(struct dentry *dentry, > unsigned int flags, > } else { > /* Wait for unlink to complete */ > wait_var_event(&dentry->d_fsdata, > - dentry->d_fsdata != NFS_FSDATA_BLOCKED); > + smp_load_acquire(&dentry->d_fsdata) != NFS_FSDATA_BLOCKED); > parent = dget_parent(dentry); > ret = reval(d_inode(parent), dentry, flags); > dput(parent); > @@ -2508,7 +2508,7 @@ int nfs_unlink(struct inode *dir, struct dentry > *dentry) > spin_unlock(&dentry->d_lock); > error = nfs_safe_remove(dentry); > nfs_dentry_remove_handle_error(dir, dentry, error); > - dentry->d_fsdata = NULL; > + smp_store_release(&dentry->d_fsdata, NULL); > wake_up_var(&dentry->d_fsdata); > out: > trace_nfs_unlink_exit(dir, dentry, error); > @@ -2616,7 +2616,7 @@ nfs_unblock_rename(struct rpc_task *task, struct > nfs_renamedata *data) > { > struct dentry *new_dentry = data->new_dentry; > > - new_dentry->d_fsdata = NULL; > + smp_store_release(&new_dentry->d_fsdata, NULL); > wake_up_var(&new_dentry->d_fsdata); > } > > @@ -2717,6 +2717,10 @@ int nfs_rename(struct mnt_idmap *idmap, struct > inode *old_dir, > task = nfs_async_rename(old_dir, new_dir, old_dentry, new_dentry, > must_unblock ? nfs_unblock_rename : NULL); > if (IS_ERR(task)) { > + if (must_unlock) { > + smp_store_release(&new_dentry->d_fsdata, NULL); > + wake_up_var(&new_dentry->d_fsdata); > + } > error = PTR_ERR(task); > goto out; > } > > > > > >