RE: [nfsv4] 4.1 client - LAYOUTCOMMIT & close

Trond Myklebust <trond.myklebust@xxxxxxxxxx> · Wed, 07 Jul 2010 19:14:57 -0400



On Wed, 2010-07-07 at 19:09 -0400, Trond Myklebust wrote:
> On Wed, 2010-07-07 at 18:52 -0400, Trond Myklebust wrote:
> > On Wed, 2010-07-07 at 18:44 -0400, david.black@xxxxxxx wrote:
> > > Let me try this ...
> > > 
> > > A correct client will always send LAYOUTCOMMIT.
> > > Assume that the client is correct.
> > > Hence if the LAYOUTCOMMIT doesn't arrive, something's failed.
> > > 
> > > Important implication: No LAYOUTCOMMIT is an error/failure case.  It
> > > just has to work; it doesn't have to be fast.
> > > 
> > > Suggestion: If a client dies while holding writeable layouts that permit
> > > write-in-place, and the client doesn't reappear or doesn't reclaim those
> > > layouts, then the server should assume that the files involved were
> > > written before the client died, and set the file attributes accordingly
> > > as part of internally reclaiming the layout that the client has
> > > abandoned.
> > > 
> > > Caveat: It may take a while for the server to determine that the client
> > > has abandoned a layout.
> > > 
> > > This can result in false positives (file appears to be modified when it
> > > wasn't) but won't yield false negatives (file does not appear to be
> > > modified even though it was modified).
> > 
> > OK... So we're going to have to turn off client side file caching
> > entirely for pNFS? I can do that...
> > 
> > The above won't work. Think readahead...
> 
> So... What can work is if you modify it to work explicitly for
> close-to-open:
> 
> "Upon receiving an OPEN, LOCK or a WANT_DELEGATION, the server must
> check that it has received LAYOUTCOMMITs from any other clients that may
> have the file open for writing. If it hasn't, then it MUST take some
> action to ensure that any file data changes are accompanied by a change
                           ^ potentially visible (i.e. "any potentially visible file data changes")
> attribute update."
> 
> Then you can add the above suggestion without the offending caveat. Note
> however that it does break the "SHOULD NOT" admonition in section
> 18.32.4.
> 
> Trond
> 
> 
> > Trond
> > 
> > > Thanks,
> > > --David
> > > 
> > > > -----Original Message-----
> > > > From: nfsv4-bounces@xxxxxxxx [mailto:nfsv4-bounces@xxxxxxxx] On Behalf Of Noveck_David@xxxxxxx
> > > > Sent: Wednesday, July 07, 2010 6:04 PM
> > > > To: Trond.Myklebust@xxxxxxxxxx; Muntz, Daniel
> > > > Cc: linux-nfs@xxxxxxxxxxxxxxx; garth@xxxxxxxxxxx; welch@xxxxxxxxxxx; nfsv4@xxxxxxxx;
> > > > andros@xxxxxxxxxx; bhalevy@xxxxxxxxxxx
> > > > Subject: Re: [nfsv4] 4.1 client - LAYOUTCOMMIT & close
> > > > 
> > > > > Yes. I would agree that the client cannot rely on the updates being made
> > > > > visible if it fails to send the LAYOUTCOMMIT. My point was simply that a
> > > > > compliant server MUST also have a valid strategy for dealing with the
> > > > > case where the client doesn't send it.
> > > > 
> > > > So you are saying the updates "MUST be made visible" through the
> > > > server's valid strategy.  Is that right?
> > > > 
> > > > And that the client cannot rely on that.  Why not, if the server must
> > > > have a valid strategy?
> > > > 
> > > > Is this just prudent "belt and suspenders" design or what?
> > > > 
> > > > It seems to me that if one side here is MUST (and the spec needs to be
> > > > clearer about what might or might not constitute a valid strategy), then
> > > > the other side should be SHOULD.
> > > > 
> > > > If both sides are "MUST", then if things don't work out then the client
> > > > and server can equally point to one another and say "It's his fault".
> > > > 
> > > > Am I missing something here?
> > > > 
> > > > 
> > > > 
> > > > -----Original Message-----
> > > > From: nfsv4-bounces@xxxxxxxx [mailto:nfsv4-bounces@xxxxxxxx] On Behalf
> > > > Of Trond Myklebust
> > > > Sent: Wednesday, July 07, 2010 5:01 PM
> > > > To: Muntz, Daniel
> > > > Cc: linux-nfs@xxxxxxxxxxxxxxx; garth@xxxxxxxxxxx; welch@xxxxxxxxxxx;
> > > > nfsv4@xxxxxxxx; andros@xxxxxxxxxx; bhalevy@xxxxxxxxxxx
> > > > Subject: Re: [nfsv4] 4.1 client - LAYOUTCOMMIT & close
> > > > 
> > > > On Wed, 2010-07-07 at 16:39 -0400, Daniel.Muntz@xxxxxxx wrote:
> > > > > To bring this discussion full circle, since we agree that a compliant
> > > > > server can implement a scheme where written data does not become visible
> > > > > until after a LAYOUTCOMMIT, do we also agree that LAYOUTCOMMIT is a
> > > > > "MUST" from a compliant client (independent of layout type)?
> > > > 
> > > > Yes. I would agree that the client cannot rely on the updates being made
> > > > visible if it fails to send the LAYOUTCOMMIT. My point was simply that a
> > > > compliant server MUST also have a valid strategy for dealing with the
> > > > case where the client doesn't send it.
> > > > 
> > > > Cheers
> > > >   Trond
> > > > 
> > > > >   -Dan
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: nfsv4-bounces@xxxxxxxx [mailto:nfsv4-bounces@xxxxxxxx]
> > > > > > On Behalf Of Trond Myklebust
> > > > > > Sent: Wednesday, July 07, 2010 7:04 AM
> > > > > > To: Benny Halevy
> > > > > > Cc: andros@xxxxxxxxxx; linux-nfs@xxxxxxxxxxxxxxx; Garth
> > > > > > Gibson; Brent Welch; NFSv4
> > > > > > Subject: Re: [nfsv4] 4.1 client - LAYOUTCOMMIT & close
> > > > > >
> > > > > > On Wed, 2010-07-07 at 16:51 +0300, Benny Halevy wrote:
> > > > > > > > On Jul. 07, 2010, 16:18 +0300, Trond Myklebust <Trond.Myklebust@xxxxxxxxxx> wrote:
> > > > > > > > > On Wed, 2010-07-07 at 09:06 -0400, Trond Myklebust wrote:
> > > > > > > > >> On Wed, 2010-07-07 at 15:05 +0300, Benny Halevy wrote:
> > > > > > > > >>> On Jul. 06, 2010, 23:40 +0300, Trond Myklebust <trond.myklebust@xxxxxxxxxx> wrote:
> > > > > > > > >>>> On Tue, 2010-07-06 at 15:20 -0400, Daniel.Muntz@xxxxxxx wrote:
> > > > > > > > >>>>> The COMMIT to the DS, ttbomk, commits data on the DS. I see it as
> > > > > > > > >>>>> orthogonal to updating the metadata on the MDS (but perhaps I'm wrong).
> > > > > > > > >>>>> As sjoshi@bluearc mentioned, the LAYOUTCOMMIT provides a synchronization
> > > > > > > > >>>>> point, so even if the non-clustered server does not want to update
> > > > > > > > >>>>> metadata on every DS I/O, the LAYOUTCOMMIT could also be a trigger to
> > > > > > > > >>>>> execute whatever synchronization mechanism the implementer wishes to put
> > > > > > > > >>>>> in the control protocol.
> > > > > > > >>>>
> > > > > > > > >>>> As far as I'm aware, there are no exceptions in RFC5661 that would allow
> > > > > > > > >>>> pNFS servers to break the rule that any visible change to the data must
> > > > > > > > >>>> be atomically accompanied with a change attribute update.
> > > > > > > >>>>
> > > > > > > >>>
> > > > > > > > >>> Trond, I'm not sure how this rule you mentioned is specified.
> > > > > > > > >>>
> > > > > > > > >>> See more in section 12.5.4 and 12.5.4.1. LAYOUTCOMMIT and change/time_modify
> > > > > > > > >>> in particular:
> > > > > > > > >>>
> > > > > > > > >>>    For some layout protocols, the storage device is able to notify the
> > > > > > > > >>>    metadata server of the occurrence of an I/O; as a result, the change
> > > > > > > > >>>    and time_modify attributes may be updated at the metadata server.
> > > > > > > > >>>    For a metadata server that is capable of monitoring updates to the
> > > > > > > > >>>    change and time_modify attributes, LAYOUTCOMMIT processing is not
> > > > > > > > >>>    required to update the change attribute.  In this case, the metadata
> > > > > > > > >>>    server must ensure that no further update to the data has occurred
> > > > > > > > >>>    since the last update of the attributes; file-based protocols may
> > > > > > > > >>>    have enough information to make this determination or may update the
> > > > > > > > >>>    change attribute upon each file modification.  This also applies for
> > > > > > > > >>>    the time_modify attribute.  If the server implementation is able to
> > > > > > > > >>>    determine that the file has not been modified since the last
> > > > > > > > >>>    time_modify update, the server need not update time_modify at
> > > > > > > > >>>    LAYOUTCOMMIT.  At LAYOUTCOMMIT completion, the updated attributes
> > > > > > > > >>>    should be visible if that file was modified since the latest previous
> > > > > > > > >>>    LAYOUTCOMMIT or LAYOUTGET
> > > > > > > >>
> > > > > > > > >> I know. However the above paragraph does not state that the server
> > > > > > > > >> should make those changes visible to clients other than the one that is
> > > > > > > > >> writing.
> > > > > > > > >>
> > > > > > > > >> Section 18.32.4 states that writes will cause the time_modified and
> > > > > > > > >> change attributes to be updated (if and only if the file data is
> > > > > > > > >> modified). Several other sections rely on this behaviour, including
> > > > > > > > >> section 10.3.1, section 11.7.2.2, and section 11.7.7.
> > > > > > > > >>
> > > > > > > > >> The only 'special behaviour' that I see allowed for pNFS is in section
> > > > > > > > >> 13.10, which states that clients can't expect to see changes
> > > > > > > > >> immediately, but that they must be able to expect close-to-open
> > > > > > > > >> semantics to work. Again, if this is to be the case, then the server
> > > > > > > > >> _must_ be able to deal with the case where client 1 dies before it can
> > > > > > > > >> issue the LAYOUTCOMMIT.
> > > > > > >
> > > > > > > Agreed.
> > > > > > >
> > > > > > > >>
> > > > > > > >>
> > > > > > > > >>>> As I see it, if your server allows one client to read data that may have
> > > > > > > > >>>> been modified by another client that holds a WRITE layout for that range
> > > > > > > > >>>> then (since that is a visible data change) it should provide a change
> > > > > > > > >>>> attribute update irrespective of whether or not a LAYOUTCOMMIT has been
> > > > > > > > >>>> sent.
> > > > > > > >>>
> > > > > > > > >>> The requirement for the server in WRITE's implementation section
> > > > > > > > >>> is quite weak: "It is assumed that the act of writing data to a file will
> > > > > > > > >>> cause the time_modified and change attributes of the file to be updated."
> > > > > > > > >>>
> > > > > > > > >>> The difference here is that for pNFS the written data is not guaranteed
> > > > > > > > >>> to be visible until LAYOUTCOMMIT.  In a broader sense, assuming the clients
> > > > > > > > >>> are caching dirty data and use a write-behind cache, application-written data
> > > > > > > > >>> may be visible to other processes on the same host but not to others until
> > > > > > > > >>> fsync() or close() - close-to-open semantics are the only thing the client
> > > > > > > > >>> guarantees, right?  Issuing LAYOUTCOMMIT on fsync() and close() ensures the
> > > > > > > > >>> data is committed to stable storage and is visible to all other clients in
> > > > > > > > >>> the cluster.
> > > > > > > >>
> > > > > > > > >> See above. I'm not disputing your statement that 'the written data is
> > > > > > > > >> not guaranteed to be visible until LAYOUTCOMMIT'. I am disputing an
> > > > > > > > >> assumption that 'the written data may be visible without an accompanying
> > > > > > > > >> change attribute update'.
> > > > > > > >
> > > > > > > >
> > > > > > > > > In other words, I'd expect the following scenario to give the same
> > > > > > > > > results in NFSv4.1 w/pNFS as it does in NFSv4:
> > > > > > >
> > > > > > > > That's a strong requirement that may limit the scalability of the server.
> > > > > > > >
> > > > > > > > The spirit of the pNFS operations, at least from Panasas' perspective, was that
> > > > > > > > the data is transient until LAYOUTCOMMIT, meaning it may or may not be visible
> > > > > > > > to clients other than the one who wrote it, and its associated metadata MUST
> > > > > > > > be updated and describe the new data only on LAYOUTCOMMIT and until then it's
> > > > > > > > undefined, i.e. it's up to the server implementation whether to update it or not.
> > > > > > > >
> > > > > > > > Without locking, what do the stronger semantics buy you?
> > > > > > > > Even if a client verified the change_attribute, new data may become visible
> > > > > > > > at any time after the GETATTR if the file/byte range aren't locked.
> > > > > >
> > > > > > There is no locking needed in the scenario below: it is ordinary
> > > > > > close-to-open semantics.
> > > > > >
> > > > > > > The point is that if you remove the one and only way that clients have
> > > > > > > to determine whether or not their data caches are valid, then they can
> > > > > > > no longer cache data at all, and server scalability will be shot to
> > > > > > > smithereens anyway.
> > > > > >
> > > > > > Trond
> > > > > >
> > > > > > > Benny
> > > > > > >
> > > > > > > >
> > > > > > > > Client 1			Client 2
> > > > > > > > ========			========
> > > > > > > >
> > > > > > > > OPEN foo
> > > > > > > > READ
> > > > > > > > CLOSE
> > > > > > > > 				OPEN
> > > > > > > > 				LAYOUTGET ...
> > > > > > > > 				WRITE via DS
> > > > > > > > 				<dies>...
> > > > > > > > OPEN foo
> > > > > > > > verify change_attr
> > > > > > > > READ if above WRITE is visible
> > > > > > > > CLOSE
> > > > > > > >
> > > > > > > > Trond
> > > > > > > > _______________________________________________
> > > > > > > > nfsv4 mailing list
> > > > > > > > nfsv4@xxxxxxxx
> > > > > > > > https://www.ietf.org/mailman/listinfo/nfsv4
> > > 
> > 
> > 
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
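
For illustration only, here is a minimal sketch in Python of the
close-to-open rule proposed above, run against the Client 1 / Client 2
scenario quoted further down. Every name in it (MetadataServer, FileState,
layoutget_rw, and so on) is hypothetical and comes neither from RFC 5661
nor from any implementation. It assumes the MDS cannot observe writes made
through the DS, so it conservatively bumps the change attribute on every
OPEN by another client while a write layout has no LAYOUTCOMMIT. LOCK and
WANT_DELEGATION would need the same check.

```python
from dataclasses import dataclass, field


@dataclass
class FileState:
    """Hypothetical per-file state kept by the metadata server (MDS)."""
    change_attr: int = 0
    # Clients holding a write layout for which no LAYOUTCOMMIT has arrived.
    uncommitted_writers: set = field(default_factory=set)


class MetadataServer:
    """Toy MDS model; an assumption for illustration, not a real API."""

    def __init__(self):
        self.files = {}

    def _state(self, name):
        return self.files.setdefault(name, FileState())

    def layoutget_rw(self, client, name):
        # From now on the client may write via the DS without telling the MDS.
        self._state(name).uncommitted_writers.add(client)

    def layoutcommit(self, client, name):
        # The normal path: the client reports its writes, attributes update.
        st = self._state(name)
        st.uncommitted_writers.discard(client)
        st.change_attr += 1

    def open(self, client, name):
        st = self._state(name)
        # Proposed rule: if another client may have written without a
        # LAYOUTCOMMIT, make sure a change attribute update accompanies
        # any potentially visible data change.
        if st.uncommitted_writers - {client}:
            st.change_attr += 1
        return st.change_attr


# Scenario from the thread: client 2 writes via the DS and dies.
mds = MetadataServer()
cached = mds.open("client1", "foo")      # OPEN, READ, CLOSE; data cached
mds.open("client2", "foo")
mds.layoutget_rw("client2", "foo")       # WRITE via DS, then <dies>
seen = mds.open("client1", "foo")        # re-OPEN, verify change_attr
cache_valid = (seen == cached)           # False: client 1 must re-READ
```

The cost is the false positive already mentioned: the change attribute
moves even if client 2 never wrote anything, which is where this collides
with the "SHOULD NOT" in section 18.32.4.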

