On Tue, 2019-03-12 at 20:04 +0000, Schumaker, Anna wrote:
> Hi Trond,
>
> I'm seeing a hang when testing xfstests generic/013 on v4.1 with pNFS
> after this patch:
>
> On Wed, 2018-09-05 at 14:07 -0400, Trond Myklebust wrote:
> > If someone interrupts a wait on one or more outstanding layoutgets in
> > pnfs_update_layout() then return the ERESTARTSYS/EINTR error.
> >
> > Signed-off-by: Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx>
> > ---
> >  fs/nfs/pnfs.c | 26 ++++++++++++++++----------
> >  1 file changed, 16 insertions(+), 10 deletions(-)
> >
> > diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
> > index e8f232de484f..7d9a51e6b847 100644
> > --- a/fs/nfs/pnfs.c
> > +++ b/fs/nfs/pnfs.c
> > @@ -1740,16 +1740,16 @@ static bool pnfs_within_mdsthreshold(struct nfs_open_context *ctx,
> >  	return ret;
> >  }
> >
> > -static bool pnfs_prepare_to_retry_layoutget(struct pnfs_layout_hdr *lo)
> > +static int pnfs_prepare_to_retry_layoutget(struct pnfs_layout_hdr *lo)
> >  {
> >  	/*
> >  	 * send layoutcommit as it can hold up layoutreturn due to lseg
> >  	 * reference
> >  	 */
> >  	pnfs_layoutcommit_inode(lo->plh_inode, false);
> > -	return !wait_on_bit_action(&lo->plh_flags, NFS_LAYOUT_RETURN,
> > +	return wait_on_bit_action(&lo->plh_flags, NFS_LAYOUT_RETURN,
> >  				   nfs_wait_bit_killable,
> > -				   TASK_UNINTERRUPTIBLE);
> > +				   TASK_KILLABLE);
> >  }
> >
> >  static void nfs_layoutget_begin(struct pnfs_layout_hdr *lo)
> > @@ -1830,7 +1830,9 @@ pnfs_update_layout(struct inode *ino,
> >  	}
> >
> >  lookup_again:
> > -	nfs4_client_recover_expired_lease(clp);
> > +	lseg = ERR_PTR(nfs4_client_recover_expired_lease(clp));
> > +	if (IS_ERR(lseg))
> > +		goto out;
> >  	first = false;
> >  	spin_lock(&ino->i_lock);
> >  	lo = pnfs_find_alloc_layout(ino, ctx, gfp_flags);
> > @@ -1863,9 +1865,9 @@ pnfs_update_layout(struct inode *ino,
> >  	if (list_empty(&lo->plh_segs) &&
> >  	    atomic_read(&lo->plh_outstanding) != 0) {
> >  		spin_unlock(&ino->i_lock);
> > -		if (wait_var_event_killable(&lo->plh_outstanding,
> > -					atomic_read(&lo->plh_outstanding) == 0
> > -					|| !list_empty(&lo->plh_segs)))
> > +		lseg = ERR_PTR(wait_var_event_killable(&lo->plh_outstanding,
> > +					atomic_read(&lo->plh_outstanding)));
> > +		if (IS_ERR(lseg) || !list_empty(&lo->plh_segs))
>
> Was dropping the "== 0" condition attached to the atomic_read() here
> a mistake?
> I think what's happening is that my client is waiting for
> plh_outstanding to be anything other than 0 when there isn't any work
> left to do.

Yes. That's a bug. How about the following patch?

8<---------------------------------------------------
From 400417b05f3ec0531544ca5f94e64d838d8b8849 Mon Sep 17 00:00:00 2001
From: Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx>
Date: Tue, 12 Mar 2019 16:04:51 -0400
Subject: [PATCH] pNFS: Fix a typo in pnfs_update_layout

We're supposed to wait for the outstanding layout count to go to zero,
but that got lost somehow.

Fixes: d03360aaf5cca ("pNFS: Ensure we return the error if someone...")
Reported-by: Anna Schumaker <Anna.Schumaker@xxxxxxxxxx>
Signed-off-by: Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx>
---
 fs/nfs/pnfs.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index 8247bd1634cb..7066cd7c7aff 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -1889,7 +1889,7 @@ pnfs_update_layout(struct inode *ino,
 	    atomic_read(&lo->plh_outstanding) != 0) {
 		spin_unlock(&ino->i_lock);
 		lseg = ERR_PTR(wait_var_event_killable(&lo->plh_outstanding,
-					atomic_read(&lo->plh_outstanding)));
+					!atomic_read(&lo->plh_outstanding)));
 		if (IS_ERR(lseg) || !list_empty(&lo->plh_segs))
 			goto out_put_layout_hdr;
 		pnfs_put_layout_hdr(lo);
-- 
2.20.1

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@xxxxxxxxxxxxxxx
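
For anyone following the thread, here is a rough userspace sketch of why the
missing negation hangs. It is not kernel code: struct layout_model,
wait_model(), cond_buggy() and cond_fixed() are hypothetical names made up
for illustration, and the loop only assumes that the second argument of
wait_var_event_killable() is the condition the waiter sleeps until, checked
once before sleeping.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for lo->plh_outstanding; not the kernel's atomic_t. */
struct layout_model {
	int outstanding;
};

typedef bool (*wake_cond_fn)(const struct layout_model *lo);

/* Condition before the fix: true while layoutgets are still outstanding. */
static bool cond_buggy(const struct layout_model *lo)
{
	return lo->outstanding != 0;
}

/* Condition after the fix: true once the outstanding count reaches zero. */
static bool cond_fixed(const struct layout_model *lo)
{
	return lo->outstanding == 0;
}

/*
 * Model of a wait-until-condition loop. The condition is tested first;
 * if it is false, the waiter "sleeps" until one outstanding layoutget
 * completes, which is the only wakeup source in this model.
 *
 * Returns 0 when the condition became true, or -1 when the waiter would
 * sleep forever because nothing is left to wake it.
 */
static int wait_model(struct layout_model *lo, wake_cond_fn cond)
{
	assert(lo->outstanding >= 0);
	for (;;) {
		if (cond(lo))
			return 0;
		if (lo->outstanding == 0)
			return -1;	/* no pending work, no wakeup: hang */
		lo->outstanding--;	/* one layoutget finishes and wakes us */
	}
}
```

In this model, wait_model(&lo, cond_buggy) with lo.outstanding already at
zero returns -1, which corresponds to the hang reported above, while
cond_fixed drains the count and returns 0.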