On Fri, 2011-09-16 at 07:58 -0400, tao.peng@xxxxxxx wrote: > Hi, Trond, > > > -----Original Message----- > > From: Trond Myklebust [mailto:Trond.Myklebust@xxxxxxxxxx] > > Sent: Friday, September 16, 2011 1:48 AM > > To: Peng Tao > > Cc: bhalevy@xxxxxxxxxx; Vitaliy Gusev; linux-nfs@xxxxxxxxxxxxxxx; Peng, Tao > > Subject: Re: [PATCH] nfs4: serialize layoutcommit > > > > On Thu, 2011-09-15 at 13:38 -0400, Trond Myklebust wrote: > > > On Sun, 2011-09-11 at 21:48 -0700, Peng Tao wrote: > > > > Current pnfs_layoutcommit_inode can not handle parallel layoutcommit. > > > > As Trond suggested, there is no need for client to optimize for > > > > parallel layoutcommit. The patch add NFS_INO_LAYOUTCOMMITTING flag to > > > > mark inflight layoutcommit and serialize lalyoutcommit with it. > > > > It also fixes the pls_lc_list corruption that Vitaliy found. > > > > > > > > Reported-by: Vitaliy Gusev <gusev.vitaliy@xxxxxxxxxxx> > > > > Signed-off-by: Peng Tao <peng_tao@xxxxxxx> > > > > --- > > > > fs/nfs/nfs4proc.c | 6 ++++++ > > > > fs/nfs/pnfs.c | 9 +++++++++ > > > > include/linux/nfs_fs.h | 1 + > > > > 3 files changed, 16 insertions(+), 0 deletions(-) > > > > > > > > diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c > > > > index 4700fae..a7ce210 100644 > > > > --- a/fs/nfs/nfs4proc.c > > > > +++ b/fs/nfs/nfs4proc.c > > > > @@ -5970,6 +5970,7 @@ static void nfs4_layoutcommit_release(void *calldata) > > > > { > > > > struct nfs4_layoutcommit_data *data = calldata; > > > > struct pnfs_layout_segment *lseg, *tmp; > > > > + unsigned long *bitlock = &NFS_I(data->args.inode)->flags; > > > > > > > > pnfs_cleanup_layoutcommit(data); > > > > /* Matched by references in pnfs_set_layoutcommit */ > > > > @@ -5979,6 +5980,11 @@ static void nfs4_layoutcommit_release(void > > *calldata) > > > > &lseg->pls_flags)) > > > > put_lseg(lseg); > > > > } > > > > + > > > > + clear_bit_unlock(NFS_INO_LAYOUTCOMMITTING, bitlock); > > > > + smp_mb__after_clear_bit(); > > > > + wake_up_bit(bitlock, NFS_INO_LAYOUTCOMMITTING); > > > > + > > > > put_rpccred(data->cred); > > > > kfree(data); > > > > } > > > > diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c > > > > index b483bbc..fb71def 100644 > > > > --- a/fs/nfs/pnfs.c > > > > +++ b/fs/nfs/pnfs.c > > > > @@ -1451,10 +1451,19 @@ pnfs_layoutcommit_inode(struct inode *inode, > > bool sync) > > > > goto out; > > > > } > > > > > > > > + if (!test_bit(NFS_INO_LAYOUTCOMMIT, &nfsi->flags) || > > > > + (status = wait_on_bit_lock(&nfsi->flags, > > NFS_INO_LAYOUTCOMMITTING, > > > > + nfs_wait_bit_killable, TASK_KILLABLE))) { > > > > > > You want the sleeping behaviour above to be subject to the 'sync' flag > > > (in the same way we do for regular commit). If !sync, then try to grab > > > the bit lock anyway, and exit on failure. > I'm a little confused about the "!sync" here. If we return failure, what do we expect callers to do? > Looking at the calling stack, I don't see any users retrying on -EAGAIN. Does the "!sync" necessarily mean non-blocking? > > > > > By the way. Shouldn't pnfs_layoutcommit_inode() also be marking the > > inode as dirty on failure? > Yeah, right, we should mark inode as dirty on failure. The !sync means that we're being called from a context such as kswapd where we really don't want to sleep or block the thread. In the case where this happens, and the NFS_INO_LAYOUTCOMMITTING bit can't be set, we should exit and mark the inode as dirty so that we can retry at a later time. Please see the 'commit' code, which has the same properties... Cheers Trond -- Trond Myklebust Linux NFS client maintainer NetApp Trond.Myklebust@xxxxxxxxxx www.netapp.com -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html