On Sun, 26 May 2024, Richard Kojedzinszky wrote: > Dear Neil, > > According to my quick tests, your patch seems to fix this bug. Could you > also manage to try my attached code, could you also reproduce the bug? Thanks for testing. I can run your test code but it isn't triggering the bug (90 minutes so far). Possibly a different compiler used for the kernel, possibly hardware differences (I'm running under qemu). Bugs related to barriers (which this one seems to be) need just the right circumstances to trigger so they can be hard to reproduce on a different system. I've made some cosmetic improvements to the patch and will post it to the NFS maintainers. Thanks again, NeilBrown > > Thanks, > Richard > > 2024-05-24 07:29 időpontban Richard Kojedzinszky ezt írta: > > Dear Neil, > > > > I've applied your patch, and since then there are no lockups. Before > > that my application reported a lockup in a minute or two, now it has > > been running for half an hour, and still running. > > > > Thanks, > > Richard > > > > 2024-05-24 01:31 időpontban NeilBrown ezt írta: > >> On Fri, 24 May 2024, Richard Kojedzinszky wrote: > >>> Dear devs, > >>> > >>> I am attaching a stripped down version of the little program which > >>> triggers the bug very quickly, in a few minutes in my test lab. It > >>> turned out that a single NFS mountpoint is enough. Just start the > >>> program giving it the NFS mount as first argument. It will chdir > >>> there, > >>> and do file operations, which will trigger a lockup in a few minutes. > >> > >> I couldn't get the go code to run. But then it is a long time since I > >> played with go and I didn't try very hard. > >> If you could provide simple instructions and a list of package > >> dependencies that I need to install (on Debian), I can give it a try. > >> > >> Or you could try this patch. It might help, but I don't have high > >> hopes. It adds some memory barriers and fixes a bug which would cause > >> a > >> problem if memory allocation failed (but memory allocation never > >> fails). > >> > >> NeilBrown > >> > >> diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c > >> index ac505671efbd..5bcc0d14d519 100644 > >> --- a/fs/nfs/dir.c > >> +++ b/fs/nfs/dir.c > >> @@ -1804,7 +1804,7 @@ __nfs_lookup_revalidate(struct dentry *dentry, > >> unsigned int flags, > >> } else { > >> /* Wait for unlink to complete */ > >> wait_var_event(&dentry->d_fsdata, > >> - dentry->d_fsdata != NFS_FSDATA_BLOCKED); > >> + smp_load_acquire(&dentry->d_fsdata) != NFS_FSDATA_BLOCKED); > >> parent = dget_parent(dentry); > >> ret = reval(d_inode(parent), dentry, flags); > >> dput(parent); > >> @@ -2508,7 +2508,7 @@ int nfs_unlink(struct inode *dir, struct dentry > >> *dentry) > >> spin_unlock(&dentry->d_lock); > >> error = nfs_safe_remove(dentry); > >> nfs_dentry_remove_handle_error(dir, dentry, error); > >> - dentry->d_fsdata = NULL; > >> + smp_store_release(&dentry->d_fsdata, NULL); > >> wake_up_var(&dentry->d_fsdata); > >> out: > >> trace_nfs_unlink_exit(dir, dentry, error); > >> @@ -2616,7 +2616,7 @@ nfs_unblock_rename(struct rpc_task *task, struct > >> nfs_renamedata *data) > >> { > >> struct dentry *new_dentry = data->new_dentry; > >> > >> - new_dentry->d_fsdata = NULL; > >> + smp_store_release(&new_dentry->d_fsdata, NULL); > >> wake_up_var(&new_dentry->d_fsdata); > >> } > >> > >> @@ -2717,6 +2717,10 @@ int nfs_rename(struct mnt_idmap *idmap, struct > >> inode *old_dir, > >> task = nfs_async_rename(old_dir, new_dir, old_dentry, new_dentry, > >> must_unblock ? nfs_unblock_rename : NULL); > >> if (IS_ERR(task)) { > >> + if (must_unlock) { > >> + smp_store_release(&new_dentry->d_fsdata, NULL); > >> + wake_up_var(&new_dentry->d_fsdata); > >> + } > >> error = PTR_ERR(task); > >> goto out; > >> } >