RE: [PATCH] nfs: fix inifinite loop at nfs4_layoutcommit_release

<tao.peng@xxxxxxx> · Thu, 8 Sep 2011 06:21:11 -0400

Hi, Trond,

Sorry for the late response.

> -----Original Message-----
> From: Trond Myklebust [mailto:Trond.Myklebust@xxxxxxxxxx]
> Sent: Wednesday, September 07, 2011 6:33 AM
> To: Vitaliy Gusev
> Cc: Vitaliy Gusev; Peng, Tao; linux-nfs@xxxxxxxxxxxxxxx
> Subject: Re: [PATCH] nfs: fix inifinite loop at nfs4_layoutcommit_release
> 
> On Wed, 2011-09-07 at 02:13 +0400, Vitaliy Gusev wrote:
> > >> @@ -1376,7 +1376,8 @@ static void pnfs_list_write_lseg(struct inode *inode,
> struct list_head *listp)
> > >>
> > >>   	list_for_each_entry(lseg,&NFS_I(inode)->layout->plh_segs, pls_list) {
> > >>   		if (lseg->pls_range.iomode == IOMODE_RW&&
> > >> -		    test_bit(NFS_LSEG_LAYOUTCOMMIT,&lseg->pls_flags))
> > >> +		    test_bit(NFS_LSEG_LAYOUTCOMMIT,&lseg->pls_flags)&&
> > >> +		    list_empty(&lseg->pls_lc_list))
> > >>   			list_add(&lseg->pls_lc_list, listp);
> > >>   	}
> > >>   }
> > >
> > > If the lseg is already part of one layoutcommit, but we're sending a
> > > second one for the same range (presumably because we wrote more data in
> > > the same region), then the above causes the lseg to be excluded.
> >
> >
> > Yes, lseg is excluded, This patch does exactly only exclusion of lseg.
> > lseg is used here only to get/put reference on this lseg, so skipping is
> > correct.
> 
> Are you sure? As far as I can see, pnfs_list_write_lseg() actually
> assigns the lseg to a particular layoutcommit call.
> 
> My questions are: if I write to an area described by that lseg _after_
> it has been assigned to that layoutcommit RPC call (and the latter has
> already been dispatched to the metadata server), then
>      A. shouldn't it be assigned to a second layoutcommit call too?
>      B. If not, what are we doing to ensure mutual exclusion between
>         layoutcommit RPC calls and pNFS writes to the data server?
I think it depends on the purpose of layoutcommit.
1. for updating atime/mtime. In this case, we don't need mutual exclusion
between layoutcommit RPC and pNFS writes to data server.
2. for updating LD specific state. In this case, LD should decide what to commit
(and actually ignores the range of lseg). Take block layout for example. It maintains
blocksized extent state inside LD and determines what to encode inside layoutcommit
RPC whenever there is a generic layer layoutcommit call. So when an extent is being
layoutcommitted, client can still write to the same range, as long as the lseg is held by client.

I am not familiar enough with file/object layout case though. Does current implementation
break any requirements for file/object layout?

Thanks,
Tao
> 
> > However, checking on list_empty can occur (on other CPU) in the middle:
> >
> > 	list_del_init(&lseg->pls_lc_list);
> > Here >>>>>>
> > 	if (test_and_clear_bit(NFS_LSEG_LAYOUTCOMMIT,
> > 			       &lseg->pls_flags))
> > 		put_lseg(lseg);
> >
> >
> > So list_del_init must be executed under the same lock as
> > pnfs_list_write_lseg, i.e. inode->i_lock.
> 
> Agreed if my questions above make no sense.
> 
> Cheers
>   Trond
> 
> --
> Trond Myklebust
> Linux NFS client maintainer
> 
> NetApp
> Trond.Myklebust@xxxxxxxxxx
> www.netapp.com
> 

��.n��������+%������w��{.n�����{��w���jg��������ݢj����G�������j:+v���w�m������w�������h�����٥