Re: Linux NFS client hangs in nfs4_lookup_revalidate

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, 27 May 2024, Richard Kojedzinszky wrote:
> Dear Neil,
> 
> I was running it on arm64, may that be the reason?

That could certainly explain it.  I know x86_64 almost never needs
barriers, though I have seen cases where they matter.  ppc64 is very
sensitive.  A quick search suggests that arm64 does need barriers some
times.
I don't have arm64 hardware to test on but I'm happy with your
test results.

Thanks,
NeilBrown


> 
> Regards,
> Richard
> 
> 
> On May 27, 2024 4:02:32 AM GMT+02:00, NeilBrown <neilb@xxxxxxx> wrote:
> 
>     On Sun, 26 May 2024, Richard Kojedzinszky wrote:
>         Dear Neil,
>         
>         According to my quick tests, your patch seems to fix this bug. Could you 
>         also manage to try my attached code, could you also reproduce the bug?
>     
>     Thanks for testing.
>     
>     I can run your test code but it isn't triggering the bug (90 minutes so
>     far).  Possibly a different compiler used for the kernel, possibly
>     hardware differences (I'm running under qemu).  Bugs related to barriers
>     (which this one seems to be) need just the right circumstances to
>     trigger so they can be hard to reproduce on a different system.
>     
>     I've made some cosmetic improvements to the patch and will post it to
>     the NFS maintainers.
>     
>     Thanks again,
>     NeilBrown
>     
>     
>         
>         Thanks,
>         Richard
>         
>         2024-05-24 07:29 időpontban Richard Kojedzinszky ezt írta:
>             Dear Neil,
>             
>             I've applied your patch, and since then there are no lockups. Before 
>             that my application reported a lockup in a minute or two, now it has 
>             been running for half an hour, and still running.
>             
>             Thanks,
>             Richard
>             
>             2024-05-24 01:31 időpontban NeilBrown ezt írta:
>                 On Fri, 24 May 2024, Richard Kojedzinszky wrote:
>                     Dear devs,
>                     
>                     I am attaching a stripped down version of the little program which
>                     triggers the bug very quickly, in a few minutes in my test lab. It
>                     turned out that a single NFS mountpoint is enough. Just start the
>                     program giving it the NFS mount as first argument. It will chdir 
>                     there,
>                     and do file operations, which will trigger a lockup in a few minutes.
>                 
>                 I couldn't get the go code to run.  But then it is a long time since I
>                 played with go and I didn't try very hard.
>                 If you could provide simple instructions and a list of package
>                 dependencies that I need to install (on Debian), I can give it a try.
>                 
>                 Or you could try this patch.  It might help, but I don't have high
>                 hopes.  It adds some memory barriers and fixes a bug which would cause 
>                 a
>                 problem if memory allocation failed (but memory allocation never 
>                 fails).
>                 
>                 NeilBrown
>                 
>                 diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
>                 index ac505671efbd..5bcc0d14d519 100644
>                 --- a/fs/nfs/dir.c
>                 +++ b/fs/nfs/dir.c
>                 @@ -1804,7 +1804,7 @@ __nfs_lookup_revalidate(struct dentry *dentry, 
>                 unsigned int flags,
>                         } else {
>                                 /* Wait for unlink to complete */
>                                 wait_var_event(&dentry->d_fsdata,
>                 -                              dentry->d_fsdata != NFS_FSDATA_BLOCKED);
>                 +                              smp_load_acquire(&dentry->d_fsdata) != NFS_FSDATA_BLOCKED);
>                                 parent = dget_parent(dentry);
>                                 ret = reval(d_inode(parent), dentry, flags);
>                                 dput(parent);
>                 @@ -2508,7 +2508,7 @@ int nfs_unlink(struct inode *dir, struct dentry 
>                 *dentry)
>                         spin_unlock(&dentry->d_lock);
>                         error = nfs_safe_remove(dentry);
>                         nfs_dentry_remove_handle_error(dir, dentry, error);
>                 -       dentry->d_fsdata = NULL;
>                 +       smp_store_release(&dentry->d_fsdata, NULL);
>                         wake_up_var(&dentry->d_fsdata);
>                  out:
>                         trace_nfs_unlink_exit(dir, dentry, error);
>                 @@ -2616,7 +2616,7 @@ nfs_unblock_rename(struct rpc_task *task, struct 
>                 nfs_renamedata *data)
>                  {
>                         struct dentry *new_dentry = data->new_dentry;
>                 
>                 -       new_dentry->d_fsdata = NULL;
>                 +       smp_store_release(&new_dentry->d_fsdata, NULL);
>                         wake_up_var(&new_dentry->d_fsdata);
>                  }
>                 
>                 @@ -2717,6 +2717,10 @@ int nfs_rename(struct mnt_idmap *idmap, struct 
>                 inode *old_dir,
>                         task = nfs_async_rename(old_dir, new_dir, old_dentry, new_dentry,
>                                                 must_unblock ? nfs_unblock_rename : NULL);
>                         if (IS_ERR(task)) {
>                 +               if (must_unlock) {
>                 +                       smp_store_release(&new_dentry->d_fsdata, NULL);
>                 +                       wake_up_var(&new_dentry->d_fsdata);
>                 +               }
>                                 error = PTR_ERR(task);
>                                 goto out;
>                         }
>         
>     
>     
> 
> 
> 






[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux