On Wed, 2010-07-07 at 16:39 -0400, Daniel.Muntz@xxxxxxx wrote:
> To bring this discussion full circle, since we agree that a compliant
> server can implement a scheme where written data does not become visible
> until after a LAYOUTCOMMIT, do we also agree that LAYOUTCOMMIT is a
> "MUST" from a compliant client (independent of layout type)?

Yes. I would agree that the client cannot rely on the updates being made
visible if it fails to send the LAYOUTCOMMIT. My point was simply that a
compliant server MUST also have a valid strategy for dealing with the
case where the client doesn't send it.

Cheers
  Trond

> -Dan
>
> > -----Original Message-----
> > From: nfsv4-bounces@xxxxxxxx [mailto:nfsv4-bounces@xxxxxxxx]
> > On Behalf Of Trond Myklebust
> > Sent: Wednesday, July 07, 2010 7:04 AM
> > To: Benny Halevy
> > Cc: andros@xxxxxxxxxx; linux-nfs@xxxxxxxxxxxxxxx; Garth Gibson; Brent Welch; NFSv4
> > Subject: Re: [nfsv4] 4.1 client - LAYOUTCOMMIT & close
> >
> > On Wed, 2010-07-07 at 16:51 +0300, Benny Halevy wrote:
> > > On Jul. 07, 2010, 16:18 +0300, Trond Myklebust <Trond.Myklebust@xxxxxxxxxx> wrote:
> > > > On Wed, 2010-07-07 at 09:06 -0400, Trond Myklebust wrote:
> > > >> On Wed, 2010-07-07 at 15:05 +0300, Benny Halevy wrote:
> > > >>> On Jul. 06, 2010, 23:40 +0300, Trond Myklebust <trond.myklebust@xxxxxxxxxx> wrote:
> > > >>>> On Tue, 2010-07-06 at 15:20 -0400, Daniel.Muntz@xxxxxxx wrote:
> > > >>>>> The COMMIT to the DS, ttbomk, commits data on the DS. I see it as
> > > >>>>> orthogonal to updating the metadata on the MDS (but perhaps I'm wrong).
> > > >>>>> As sjoshi@bluearc mentioned, the LAYOUTCOMMIT provides a synchronization
> > > >>>>> point, so even if the non-clustered server does not want to update
> > > >>>>> metadata on every DS I/O, the LAYOUTCOMMIT could also be a trigger to
> > > >>>>> execute whatever synchronization mechanism the implementer wishes to put
> > > >>>>> in the control protocol.
> > > >>>>
> > > >>>> As far as I'm aware, there are no exceptions in RFC5661 that would allow
> > > >>>> pNFS servers to break the rule that any visible change to the data must
> > > >>>> be atomically accompanied with a change attribute update.
> > > >>>>
> > > >>>
> > > >>> Trond, I'm not sure how this rule you mentioned is specified.
> > > >>>
> > > >>> See more in section 12.5.4 and 12.5.4.1. LAYOUTCOMMIT and change/time_modify
> > > >>> in particular:
> > > >>>
> > > >>>    For some layout protocols, the storage device is able to notify the
> > > >>>    metadata server of the occurrence of an I/O; as a result, the change
> > > >>>    and time_modify attributes may be updated at the metadata server.
> > > >>>    For a metadata server that is capable of monitoring updates to the
> > > >>>    change and time_modify attributes, LAYOUTCOMMIT processing is not
> > > >>>    required to update the change attribute. In this case, the metadata
> > > >>>    server must ensure that no further update to the data has occurred
> > > >>>    since the last update of the attributes; file-based protocols may
> > > >>>    have enough information to make this determination or may update the
> > > >>>    change attribute upon each file modification. This also applies for
> > > >>>    the time_modify attribute. If the server implementation is able to
> > > >>>    determine that the file has not been modified since the last
> > > >>>    time_modify update, the server need not update time_modify at
> > > >>>    LAYOUTCOMMIT. At LAYOUTCOMMIT completion, the updated attributes
> > > >>>    should be visible if that file was modified since the latest previous
> > > >>>    LAYOUTCOMMIT or LAYOUTGET
> > > >>
> > > >> I know. However the above paragraph does not state that the server
> > > >> should make those changes visible to clients other than the one that is
> > > >> writing.
> > > >>
> > > >> Section 18.32.4 states that writes will cause the time_modified and
> > > >> change attributes to be updated (if and only if the file data is
> > > >> modified). Several other sections rely on this behaviour, including
> > > >> section 10.3.1, section 11.7.2.2, and section 11.7.7.
> > > >>
> > > >> The only 'special behaviour' that I see allowed for pNFS is in section
> > > >> 13.10, which states that clients can't expect to see changes
> > > >> immediately, but that they must be able to expect close-to-open
> > > >> semantics to work. Again, if this is to be the case, then the server
> > > >> _must_ be able to deal with the case where client 1 dies before it can
> > > >> issue the LAYOUTCOMMIT.
> > >
> > > Agreed.
> > >
> > > >>
> > > >>
> > > >>>> As I see it, if your server allows one client to read data that may have
> > > >>>> been modified by another client that holds a WRITE layout for that range
> > > >>>> then (since that is a visible data change) it should provide a change
> > > >>>> attribute update irrespective of whether or not a LAYOUTCOMMIT has been
> > > >>>> sent.
> > > >>>
> > > >>> The requirement for the server in WRITE's implementation section
> > > >>> is quite weak: "It is assumed that the act of writing data to a file will
> > > >>> cause the time_modified and change attributes of the file to be updated."
> > > >>>
> > > >>> The difference here is that for pNFS the written data is not guaranteed
> > > >>> to be visible until LAYOUTCOMMIT. In a broader sense, assuming the clients
> > > >>> are caching dirty data and use a write-behind cache, application-written data
> > > >>> may be visible to other processes on the same host but not to others until
> > > >>> fsync() or close(); close-to-open semantics are the only thing the client
> > > >>> guarantees, right?
> > > >>> Issuing LAYOUTCOMMIT on fsync() and close() ensures the
> > > >>> data is committed to stable storage and is visible to all other clients in
> > > >>> the cluster.
> > > >>
> > > >> See above. I'm not disputing your statement that 'the written data is
> > > >> not guaranteed to be visible until LAYOUTCOMMIT'. I am disputing an
> > > >> assumption that 'the written data may be visible without an accompanying
> > > >> change attribute update'.
> > > >
> > > >
> > > > In other words, I'd expect the following scenario to give the same
> > > > results in NFSv4.1 w/pNFS as it does in NFSv4:
> > >
> > > That's a strong requirement that may limit the scalability of the server.
> > >
> > > The spirit of the pNFS operations, at least from the Panasas perspective, was that
> > > the data is transient until LAYOUTCOMMIT, meaning it may or may not be visible
> > > to clients other than the one who wrote it, and its associated metadata MUST
> > > be updated and describe the new data only on LAYOUTCOMMIT and until then it's
> > > undefined, i.e. it's up to the server implementation whether to update it or not.
> > >
> > > Without locking, what do the stronger semantics buy you?
> > > Even if a client verified the change_attribute, new data may become visible
> > > at any time after the GETATTR if the file/byte range aren't locked.
> >
> > There is no locking needed in the scenario below: it is ordinary
> > close-to-open semantics.
> >
> > The point is that if you remove the one and only way that clients have
> > to determine whether or not their data caches are valid, then they can
> > no longer cache data at all, and server scalability will be shot to
> > smithereens anyway.
> >
> > Trond
> >
> > > Benny
> > >
> > > >
> > > > Client 1                Client 2
> > > > ========                ========
> > > >
> > > >                         OPEN foo
> > > >                         READ
> > > >                         CLOSE
> > > > OPEN
> > > > LAYOUTGET ...
> > > > WRITE via DS
> > > > <dies>...
> > > >                         OPEN foo
> > > >                         verify change_attr
> > > >                         READ if above WRITE is visible
> > > >                         CLOSE
> > > >
> > > > Trond
> > > > _______________________________________________
> > > > nfsv4 mailing list
> > > > nfsv4@xxxxxxxx
> > > > https://www.ietf.org/mailman/listinfo/nfsv4
> > >

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
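A toy model can make the close-to-open scenario in the thread above (client 2 caches foo, client 1 writes via the DS and dies before LAYOUTCOMMIT, client 2 reopens and checks change_attr) a little more concrete. What follows is a minimal Python sketch under stated assumptions, not code from the thread or from RFC 5661. It models a single file, a single in-memory "server" standing in for both MDS and DS, and three hypothetical visibility policies. The names `VISIBLE_NO_BUMP`, `VISIBLE_WITH_BUMP`, `HIDDEN_UNTIL_COMMIT`, `ToyServer`, `ToyClient` and `run_scenario` are all invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical server policies for data written through a DS (names are ours, not RFC 5661's).
VISIBLE_NO_BUMP = "visible_no_bump"          # readable at once; change_attr bumped only at LAYOUTCOMMIT
VISIBLE_WITH_BUMP = "visible_with_bump"      # readable at once; change_attr bumped at the same time
HIDDEN_UNTIL_COMMIT = "hidden_until_commit"  # staged on the DS; published at LAYOUTCOMMIT


@dataclass
class ToyServer:
    """In-memory stand-in for an MDS plus DS serving a single file."""
    policy: str
    data: bytes = b"old"
    change_attr: int = 1
    pending: Optional[bytes] = None  # written via the DS, not yet LAYOUTCOMMITted

    def write_via_ds(self, payload: bytes) -> None:
        self.pending = payload
        if self.policy != HIDDEN_UNTIL_COMMIT:
            self.data = payload        # other clients can read it immediately
        if self.policy == VISIBLE_WITH_BUMP:
            self.change_attr += 1      # visibility and change_attr move together

    def layoutcommit(self) -> None:
        if self.pending is None:
            return
        if self.policy == HIDDEN_UNTIL_COMMIT:
            self.data = self.pending   # publish the staged data
        if self.policy != VISIBLE_WITH_BUMP:
            self.change_attr += 1      # bump now, since it was not bumped at write time
        self.pending = None


@dataclass
class ToyClient:
    """Caches file data and revalidates it with change_attr at OPEN."""
    cached_data: Optional[bytes] = None
    cached_change: Optional[int] = None

    def open_read_close(self, server: ToyServer) -> bytes:
        change = server.change_attr          # GETATTR(change) as part of OPEN
        if change != self.cached_change:
            self.cached_data = server.data   # cache is stale: READ from the server
            self.cached_change = change
        return self.cached_data              # otherwise trust the cache


def run_scenario(policy: str) -> Tuple[bytes, bytes]:
    server = ToyServer(policy)
    client2 = ToyClient()
    client2.open_read_close(server)                     # client 2: OPEN foo, READ, CLOSE
    server.write_via_ds(b"new")                         # client 1: OPEN, LAYOUTGET, WRITE via DS
    # Client 1 dies here, so layoutcommit() is never called.
    seen_by_client2 = client2.open_read_close(server)   # client 2: OPEN, verify change_attr, READ
    seen_by_fresh_client = ToyClient().open_read_close(server)  # a client with an empty cache
    return seen_by_client2, seen_by_fresh_client


if __name__ == "__main__":
    for policy in (VISIBLE_NO_BUMP, VISIBLE_WITH_BUMP, HIDDEN_UNTIL_COMMIT):
        cached, fresh = run_scenario(policy)
        print(f"{policy:<20} client2={cached!r} fresh={fresh!r} consistent={cached == fresh}")
```

Under these assumptions, only `VISIBLE_NO_BUMP` leaves client 2 serving its cached `b"old"` while a client with an empty cache reads `b"new"`. Hiding the data until LAYOUTCOMMIT (the scheme Dan describes) and bumping the change attribute together with visibility (the rule Trond argues for) both keep the two views consistent. How a real server disposes of a dead client's staged data, for example discarding it or publishing it with a change attribute bump, is deliberately left out of the sketch.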