RE: [nfsv4] 4.1 client - LAYOUTCOMMIT & close

<Daniel.Muntz@xxxxxxx> · Wed, 7 Jul 2010 16:39:42 -0400

To bring this discussion full circle, since we agree that a compliant
server can implement a scheme where written data does not become visible
until after a LAYOUTCOMMIT, do we also agree that LAYOUTCOMMIT is a
"MUST" for a compliant client (independent of layout type)?

  -Dan

> -----Original Message-----
> From: nfsv4-bounces@xxxxxxxx [mailto:nfsv4-bounces@xxxxxxxx] On Behalf Of Trond Myklebust
> Sent: Wednesday, July 07, 2010 7:04 AM
> To: Benny Halevy
> Cc: andros@xxxxxxxxxx; linux-nfs@xxxxxxxxxxxxxxx; Garth Gibson; Brent Welch; NFSv4
> Subject: Re: [nfsv4] 4.1 client - LAYOUTCOMMIT & close
> 
> On Wed, 2010-07-07 at 16:51 +0300, Benny Halevy wrote:
> > On Jul. 07, 2010, 16:18 +0300, Trond Myklebust <Trond.Myklebust@xxxxxxxxxx> wrote:
> > > On Wed, 2010-07-07 at 09:06 -0400, Trond Myklebust wrote:
> > >> On Wed, 2010-07-07 at 15:05 +0300, Benny Halevy wrote:
> > >>> On Jul. 06, 2010, 23:40 +0300, Trond Myklebust <trond.myklebust@xxxxxxxxxx> wrote:
> > >>>> On Tue, 2010-07-06 at 15:20 -0400, Daniel.Muntz@xxxxxxx wrote: 
> > >>>>> The COMMIT to the DS, ttbomk, commits data on the DS.  I see it as
> > >>>>> orthogonal to updating the metadata on the MDS (but perhaps I'm wrong).
> > >>>>> As sjoshi@bluearc mentioned, the LAYOUTCOMMIT provides a synchronization
> > >>>>> point, so even if the non-clustered server does not want to update
> > >>>>> metadata on every DS I/O, the LAYOUTCOMMIT could also be a trigger to
> > >>>>> execute whatever synchronization mechanism the implementer wishes to put
> > >>>>> in the control protocol.
> > >>>>
> > >>>> As far as I'm aware, there are no exceptions in RFC5661 that would allow
> > >>>> pNFS servers to break the rule that any visible change to the data must
> > >>>> be atomically accompanied with a change attribute update.
> > >>>>
> > >>>
> > >>> Trond, I'm not sure how this rule you mentioned is specified.
> > >>>
> > >>> See more in sections 12.5.4 and 12.5.4.1 ("LAYOUTCOMMIT and
> > >>> change/time_modify"), the latter in particular:
> > >>>
> > >>>    For some layout protocols, the storage device is able to notify the
> > >>>    metadata server of the occurrence of an I/O; as a result, the change
> > >>>    and time_modify attributes may be updated at the metadata server.
> > >>>    For a metadata server that is capable of monitoring updates to the
> > >>>    change and time_modify attributes, LAYOUTCOMMIT processing is not
> > >>>    required to update the change attribute.  In this case, the metadata
> > >>>    server must ensure that no further update to the data has occurred
> > >>>    since the last update of the attributes; file-based protocols may
> > >>>    have enough information to make this determination or may update the
> > >>>    change attribute upon each file modification.  This also applies for
> > >>>    the time_modify attribute.  If the server implementation is able to
> > >>>    determine that the file has not been modified since the last
> > >>>    time_modify update, the server need not update time_modify at
> > >>>    LAYOUTCOMMIT.  At LAYOUTCOMMIT completion, the updated attributes
> > >>>    should be visible if that file was modified since the latest previous
> > >>>    LAYOUTCOMMIT or LAYOUTGET.
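
For illustration only, the rule in the paragraph quoted above could be
sketched roughly as follows. This is a toy model in Python, not taken from
RFC 5661 or from any server implementation: FileAttrs, note_ds_write,
layoutcommit and the modified_since_attr_update flag are all hypothetical
names invented for this sketch.

```python
from dataclasses import dataclass


@dataclass
class FileAttrs:
    """Attributes the metadata server (MDS) keeps for one file (toy model)."""
    change: int = 0
    time_modify: float = 0.0
    # Hypothetical bookkeeping: has the data changed since the attributes
    # were last updated?
    modified_since_attr_update: bool = False


def note_ds_write(attrs: FileAttrs, now: float, ds_notifies_mds: bool) -> None:
    """Record a write that went to a storage device."""
    if ds_notifies_mds:
        # The storage device told the MDS about the I/O, so the attributes
        # can be brought up to date immediately.
        attrs.change += 1
        attrs.time_modify = now
        attrs.modified_since_attr_update = False
    else:
        # The MDS has not updated the attributes for this write yet.
        attrs.modified_since_attr_update = True


def layoutcommit(attrs: FileAttrs, now: float) -> None:
    """At LAYOUTCOMMIT, update the attributes only if they are stale."""
    if attrs.modified_since_attr_update:
        attrs.change += 1
        attrs.time_modify = now
        attrs.modified_since_attr_update = False
```

In this sketch a server whose storage devices notify the MDS never has
anything left to do at LAYOUTCOMMIT, while one that does not notify defers
the attribute update until LAYOUTCOMMIT arrives.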
> > >>
> > >> I know. However the above paragraph does not state that the server
> > >> should make those changes visible to clients other than the one that is
> > >> writing.
> > >>
> > >> Section 18.32.4 states that writes will cause the time_modified and
> > >> change attributes to be updated (if and only if the file data is
> > >> modified). Several other sections rely on this behaviour, including
> > >> section 10.3.1, section 11.7.2.2, and section 11.7.7.
> > >>
> > >> The only 'special behaviour' that I see allowed for pNFS is in section
> > >> 13.10, which states that clients can't expect to see changes
> > >> immediately, but that they must be able to expect close-to-open
> > >> semantics to work. Again, if this is to be the case, then the server
> > >> _must_ be able to deal with the case where client 1 dies before it can
> > >> issue the LAYOUTCOMMIT.
> > 
> > Agreed.
> > 
> > >>
> > >>
> > >>>> As I see it, if your server allows one client to read data that may have
> > >>>> been modified by another client that holds a WRITE layout for that range
> > >>>> then (since that is a visible data change) it should provide a change
> > >>>> attribute update irrespective of whether or not a LAYOUTCOMMIT has been
> > >>>> sent.
> > >>>
> > >>> The requirement for the server in WRITE's implementation section
> > >>> is quite weak: "It is assumed that the act of writing data to a file will
> > >>> cause the time_modified and change attributes of the file to be updated."
> > >>>
> > >>> The difference here is that for pNFS the written data is not guaranteed
> > >>> to be visible until LAYOUTCOMMIT.  In a broader sense, assuming the clients
> > >>> are caching dirty data and use a write-behind cache, application-written data
> > >>> may be visible to other processes on the same host but not to others until
> > >>> fsync() or close() - close-to-open semantics are the only thing the client
> > >>> guarantees, right?  Issuing LAYOUTCOMMIT on fsync() and close() ensures the
> > >>> data is committed to stable storage and is visible to all other clients in
> > >>> the cluster.
> > >>
> > >> See above. I'm not disputing your statement that 'the written data is
> > >> not guaranteed to be visible until LAYOUTCOMMIT'. I am disputing an
> > >> assumption that 'the written data may be visible without an accompanying
> > >> change attribute update'.
> > > 
> > > 
> > > In other words, I'd expect the following scenario to give the same
> > > results in NFSv4.1 w/pNFS as it does in NFSv4:
> > 
> > That's a strong requirement that may limit the scalability of the server.
> > 
> > The spirit of the pNFS operations, at least from Panasas' perspective, was that
> > the data is transient until LAYOUTCOMMIT, meaning it may or may not be visible
> > to clients other than the one who wrote it. Its associated metadata MUST
> > be updated and describe the new data only on LAYOUTCOMMIT; until then it's
> > undefined, i.e. it's up to the server implementation whether to update it or not.
> > 
> > Without locking, what do the stronger semantics buy you?
> > Even if a client verified the change_attribute, new data may become visible
> > at any time after the GETATTR if the file/byte range isn't locked.
> 
> There is no locking needed in the scenario below: it is ordinary
> close-to-open semantics.
> 
> The point is that if you remove the one and only way that clients have
> to determine whether or not their data caches are valid, then they can
> no longer cache data at all, and server scalability will be shot to
> smithereens anyway.
> 
> Trond
> 
> > Benny
> > 
> > > 
> > > Client 1			Client 2
> > > ========			========
> > > 
> > > OPEN foo
> > > READ
> > > CLOSE
> > > 				OPEN
> > > 				LAYOUTGET ...
> > > 				WRITE via DS
> > > 				<dies>...
> > > OPEN foo
> > > verify change_attr
> > > READ if above WRITE is visible
> > > CLOSE
> > > 
> > > Trond
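
A minimal sketch of the scenario above, assuming a toy in-memory model
rather than any real NFS client or server code. Server, Client,
bump_on_visible_write and scenario are hypothetical names made up for this
illustration; the only thing modelled is whether a DS write that becomes
visible without a LAYOUTCOMMIT also bumps the change attribute.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Server:
    """Toy server holding one file and its change attribute."""
    data: bytes = b"old"
    change: int = 1
    # True: a visible write always bumps the change attribute.
    # False: the write is visible but the change attribute stays put.
    bump_on_visible_write: bool = True

    def ds_write(self, new_data: bytes) -> None:
        # Client 2 writes via the DS and dies before LAYOUTCOMMIT.
        self.data = new_data
        if self.bump_on_visible_write:
            self.change += 1


@dataclass
class Client:
    """Toy client with a data cache validated by close-to-open checks."""
    cached_data: Optional[bytes] = None
    cached_change: Optional[int] = None

    def open_read_close(self, server: Server) -> bytes:
        # On OPEN, compare the cached change attribute with the server's;
        # only re-read the data if it differs.
        if self.cached_change != server.change:
            self.cached_data = server.data
            self.cached_change = server.change
        return self.cached_data


def scenario(bump: bool) -> bytes:
    """Run the two-client sequence and return what client 1 reads last."""
    server = Server(bump_on_visible_write=bump)
    client1 = Client()
    client1.open_read_close(server)         # OPEN foo, READ, CLOSE
    server.ds_write(b"new")                 # client 2: WRITE via DS, <dies>
    return client1.open_read_close(server)  # OPEN foo, verify change_attr, READ
```

With bump=True client 1 notices the new change attribute and re-reads the
file; with bump=False its cache check passes and it keeps returning the old
data even though the server already holds the new data.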
> > > _______________________________________________
> > > nfsv4 mailing list
> > > nfsv4@xxxxxxxx
> > > https://www.ietf.org/mailman/listinfo/nfsv4
> 
> 