Re: 4.1 client - LAYOUTCOMMIT & close

Trond Myklebust <Trond.Myklebust@xxxxxxxxxx> · Wed, 07 Jul 2010 09:18:16 -0400

On Wed, 2010-07-07 at 09:06 -0400, Trond Myklebust wrote:
> On Wed, 2010-07-07 at 15:05 +0300, Benny Halevy wrote:
> > On Jul. 06, 2010, 23:40 +0300, Trond Myklebust <trond.myklebust@xxxxxxxxxx> wrote:
> > > On Tue, 2010-07-06 at 15:20 -0400, Daniel.Muntz@xxxxxxx wrote: 
> > >> The COMMIT to the DS, ttbomk, commits data on the DS.  I see it as
> > >> orthogonal to updating the metadata on the MDS (but perhaps I'm wrong).
> > >> As sjoshi@bluearc mentioned, the LAYOUTCOMMIT provides a synchronization
> > >> point, so even if the non-clustered server does not want to update
> > >> metadata on every DS I/O, the LAYOUTCOMMIT could also be a trigger to
> > >> execute whatever synchronization mechanism the implementer wishes to put
> > >> in the control protocol.
> > > 
> > > As far as I'm aware, there are no exceptions in RFC5661 that would allow
> > > pNFS servers to break the rule that any visible change to the data must
> > > be atomically accompanied with a change attribute update.
> > > 
> > 
> > Trond, I'm not sure how this rule you mentioned is specified.
> > 
> > See more in sections 12.5.4 and 12.5.4.1, LAYOUTCOMMIT and change/time_modify
> > in particular:
> > 
> >    For some layout protocols, the storage device is able to notify the
> >    metadata server of the occurrence of an I/O; as a result, the change
> >    and time_modify attributes may be updated at the metadata server.
> >    For a metadata server that is capable of monitoring updates to the
> >    change and time_modify attributes, LAYOUTCOMMIT processing is not
> >    required to update the change attribute.  In this case, the metadata
> >    server must ensure that no further update to the data has occurred
> >    since the last update of the attributes; file-based protocols may
> >    have enough information to make this determination or may update the
> >    change attribute upon each file modification.  This also applies for
> >    the time_modify attribute.  If the server implementation is able to
> >    determine that the file has not been modified since the last
> >    time_modify update, the server need not update time_modify at
> >    LAYOUTCOMMIT.  At LAYOUTCOMMIT completion, the updated attributes
> >    should be visible if that file was modified since the latest previous
> >    LAYOUTCOMMIT or LAYOUTGET.
> 
> I know. However the above paragraph does not state that the server
> should make those changes visible to clients other than the one that is
> writing.
> 
> Section 18.32.4 states that writes will cause the time_modified and
> change attributes to be updated (if and only if the file data is
> modified). Several other sections rely on this behaviour, including
> section 10.3.1, section 11.7.2.2, and section 11.7.7.
> 
> The only 'special behaviour' that I see allowed for pNFS is in section
> 13.10, which states that clients can't expect to see changes
> immediately, but that they must be able to expect close-to-open
> semantics to work. Again, if this is to be the case, then the server
> _must_ be able to deal with the case where client 1 dies before it can
> issue the LAYOUTCOMMIT.
> 
> 
> > > As I see it, if your server allows one client to read data that may have
> > > been modified by another client that holds a WRITE layout for that range
> > > then (since that is a visible data change) it should provide a change
> > > attribute update irrespective of whether or not a LAYOUTCOMMIT has been
> > > sent.
> > 
> > The requirement for the server in WRITE's implementation section
> > is quite weak: "It is assumed that the act of writing data to a file will
> > cause the time_modified and change attributes of the file to be updated."
> > 
> > The difference here is that for pNFS the written data is not guaranteed
> > to be visible until LAYOUTCOMMIT.  In a broader sense, assuming the clients
> > are caching dirty data and use a write-behind cache, application-written data
> > may be visible to other processes on the same host but not to others until
> > fsync() or close() - close-to-open semantics are the only thing the client
> > guarantees, right?  Issuing LAYOUTCOMMIT on fsync() and close() ensures the
> > data is committed to stable storage and is visible to all other clients in
> > the cluster.
> 
> See above. I'm not disputing your statement that 'the written data is
> not guaranteed to be visible until LAYOUTCOMMIT'. I am disputing an
> assumption that 'the written data may be visible without an accompanying
> change attribute update'.

In other words, I'd expect the following scenario to give the same
results in NFSv4.1 w/pNFS as it does in NFSv4:

Client 1			Client 2
========			========

OPEN foo
READ
CLOSE
				OPEN
				LAYOUTGET ...
				WRITE via DS
				<dies>...
OPEN foo
verify change_attr
READ if above WRITE is visible
CLOSE
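
A toy model may make the point more concrete. The Python sketch below is
purely illustrative: ToyServer, ToyClient, run_scenario and the
bump_on_ds_write switch are hypothetical names, the MDS and DS are collapsed
into a single object, the change attribute is a plain counter, and client 1's
close-to-open revalidation is reduced to comparing the change attribute cached
at the last READ with the one returned at OPEN. It compares a server that
bumps the change attribute whenever DS-written data becomes readable with one
that defers the bump to LAYOUTCOMMIT.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class ToyServer:
    """Hypothetical MDS+DS for a single file; not real NFS code."""
    bump_on_ds_write: bool  # True: data change and change-attr bump together
    data: bytes = b"old"
    change: int = 1         # change attribute, modelled as a counter
    pending: bool = False   # a DS write not yet reflected in `change`

    def getattr_change(self) -> int:
        return self.change

    def read(self) -> bytes:
        # Anything written through the DS is readable here: the change is visible.
        return self.data

    def ds_write(self, new: bytes) -> None:
        self.data = new
        if self.bump_on_ds_write:
            self.change += 1      # visible data change + attribute update
        else:
            self.pending = True   # deferred to a LAYOUTCOMMIT that may never arrive

    def layoutcommit(self) -> None:
        if self.pending:
            self.change += 1
            self.pending = False


@dataclass
class ToyClient:
    cached_data: Optional[bytes] = None
    cached_change: Optional[int] = None

    def open_read_close(self, srv: ToyServer) -> bytes:
        # Close-to-open check: on OPEN, trust the cache only if the
        # change attribute matches the value cached at the last READ.
        change = srv.getattr_change()
        if self.cached_data is not None and change == self.cached_change:
            return self.cached_data
        self.cached_data = srv.read()
        self.cached_change = change
        return self.cached_data


def run_scenario(bump_on_ds_write: bool,
                 client2_commits: bool = False) -> Tuple[bytes, bytes]:
    srv = ToyServer(bump_on_ds_write)
    client1 = ToyClient()
    client1.open_read_close(srv)         # Client 1: OPEN foo, READ, CLOSE
    srv.ds_write(b"new")                 # Client 2: OPEN, LAYOUTGET, WRITE via DS
    if client2_commits:
        srv.layoutcommit()               # only if client 2 lives long enough
    seen = client1.open_read_close(srv)  # Client 1: OPEN foo, verify change_attr, READ
    return seen, srv.read()              # (what client 1 sees, what the server serves)


for policy in (True, False):
    print(policy, run_scenario(policy))
```

Running it should print "True (b'new', b'new')" and then
"False (b'old', b'new')": when the bump is deferred and client 2 dies before
LAYOUTCOMMIT, the server hands the new data to anyone who actually reads it,
yet client 1's change_attr check passes and it keeps using its stale cached
copy. Calling run_scenario(False, client2_commits=True) restores agreement,
but that is precisely the case the server cannot count on.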

Trond