> Note that a LAYOUTRETURN can arrive without LAYOUTCOMMIT if the client hasn't
> written to the file. I'm not sure what about the blocks case though, do you
> implicitly free up any provisionally allocated blocks that the client had not
> explicitly committed using LAYOUTCOMMIT?

In principle, yes, as the blocks are no longer promised to the client,
although lazy evaluation of this is an obvious optimization.

> >> "Upon receiving an OPEN, LOCK or a WANT_DELEGATION, the server must
> >> check that it has received LAYOUTCOMMITs from any other clients that may
> >> have the file open for writing. If it hasn't, then it MUST take some
> >> action to ensure that any file data changes are accompanied by a change
> > ^ potentially visible
> >> attribute update."
>
> That should be OK as long as it's not for every GETATTR for the change, mtime,
> or size attributes.
>
> >>
> >> Then you can add the above suggestion without the offending caveat. Note
> >> however that it does break the "SHOULD NOT" admonition in section
> >> 18.32.4.
>
> Better be safe than sorry in this rare error case.

I concur with Benny on both of the above - in essence, the unrecovered
client failure is a reason to potentially ignore the "SHOULD" (server can't
know whether it actually ignored the "SHOULD", hence better safe than sorry).

We probably ought to find someplace appropriate to add a paragraph or two
explaining this in one of the 4.2 documents.

Thanks,
--David

> -----Original Message-----
> From: Benny Halevy [mailto:bhalevy.lists@xxxxxxxxx] On Behalf Of Benny Halevy
> Sent: Thursday, July 08, 2010 12:00 PM
> To: Trond Myklebust
> Cc: Black, David; Noveck, David; Muntz, Daniel; linux-nfs@xxxxxxxxxxxxxxx; garth@xxxxxxxxxxx;
> welch@xxxxxxxxxxx; nfsv4@xxxxxxxx; andros@xxxxxxxxxx
> Subject: Re: [nfsv4] 4.1 client - LAYOUTCOMMIT & close
>
> On Jul. 08, 2010, 2:14 +0300, Trond Myklebust <trond.myklebust@xxxxxxxxxx> wrote:
> > On Wed, 2010-07-07 at 19:09 -0400, Trond Myklebust wrote:
> >> On Wed, 2010-07-07 at 18:52 -0400, Trond Myklebust wrote:
> >>> On Wed, 2010-07-07 at 18:44 -0400, david.black@xxxxxxx wrote:
> >>>> Let me try this ...
> >>>>
> >>>> A correct client will always send LAYOUTCOMMIT.
> >>>> Assume that the client is correct.
> >>>> Hence if the LAYOUTCOMMIT doesn't arrive, something's failed.
> >>>>
> >>>> Important implication: No LAYOUTCOMMIT is an error/failure case. It
> >>>> just has to work; it doesn't have to be fast.
> >>>>
>
> Note that a LAYOUTRETURN can arrive without LAYOUTCOMMIT if the client hasn't
> written to the file. I'm not sure what about the blocks case though, do you
> implicitly free up any provisionally allocated blocks that the client had not
> explicitly committed using LAYOUTCOMMIT?
>
> >>>> Suggestion: If a client dies while holding writeable layouts that permit
> >>>> write-in-place, and the client doesn't reappear or doesn't reclaim those
> >>>> layouts, then the server should assume that the files involved were
> >>>> written before the client died, and set the file attributes accordingly
> >>>> as part of internally reclaiming the layout that the client has
> >>>> abandoned.
>
> Of course. That's part of the server recovery.
>
> >>>>
> >>>> Caveat: It may take a while for the server to determine that the client
> >>>> has abandoned a layout.
>
> That's two lease times after a respective CB_LAYOUTRECALL.
>
> >>>>
> >>>> This can result in false positives (file appears to be modified when it
> >>>> wasn't) but won't yield false negatives (file does not appear to be
> >>>> modified even though it was modified).
> >>>
> >>> OK... So we're going to have to turn off client side file caching
> >>> entirely for pNFS? I can do that...
> >>>
> >>> The above won't work. Think readahead...
> >>
> >> So... What can work, is if you modify it to work explicitly for
> >> close-to-open
> >>
> >> "Upon receiving an OPEN, LOCK or a WANT_DELEGATION, the server must
> >> check that it has received LAYOUTCOMMITs from any other clients that may
> >> have the file open for writing. If it hasn't, then it MUST take some
> >> action to ensure that any file data changes are accompanied by a change
> > ^ potentially visible
> >> attribute update."
>
> That should be OK as long as it's not for every GETATTR for the change, mtime,
> or size attributes.
>
> >>
> >> Then you can add the above suggestion without the offending caveat. Note
> >> however that it does break the "SHOULD NOT" admonition in section
> >> 18.32.4.
>
> Better be safe than sorry in this rare error case.
>
> Benny
>
> >>
> >> Trond
> >>
> >>
> >>> Trond
> >>>
> >>>> Thanks,
> >>>> --David
> >>>>
> >>>>> -----Original Message-----
> >>>>> From: nfsv4-bounces@xxxxxxxx [mailto:nfsv4-bounces@xxxxxxxx] On Behalf Of Noveck_David@xxxxxxx
> >>>>> Sent: Wednesday, July 07, 2010 6:04 PM
> >>>>> To: Trond.Myklebust@xxxxxxxxxx; Muntz, Daniel
> >>>>> Cc: linux-nfs@xxxxxxxxxxxxxxx; garth@xxxxxxxxxxx; welch@xxxxxxxxxxx; nfsv4@xxxxxxxx;
> >>>>> andros@xxxxxxxxxx; bhalevy@xxxxxxxxxxx
> >>>>> Subject: Re: [nfsv4] 4.1 client - LAYOUTCOMMIT & close
> >>>>>
> >>>>>> Yes. I would agree that the client cannot rely on the updates being made
> >>>>>> visible if it fails to send the LAYOUTCOMMIT. My point was simply that a
> >>>>>> compliant server MUST also have a valid strategy for dealing with the
> >>>>>> case where the client doesn't send it.
> >>>>>
> >>>>> So you are saying the updates "MUST be made visible" through the
> >>>>> server's valid strategy. Is that right.
> >>>>>
> >>>>> And that the client cannot rely on that. Why not, if the server must
> >>>>> have a valid strategy.
> >>>>>
> >>>>> Is this just prudent "belt and suspenders" design or what?
> >>>>>
> >>>>> It seems to me that if one side here is MUST (and the spec needs to be
> >>>>> clearer about what might or might not constitute a valid strategy), then
> >>>>> the other side should be SHOULD.
> >>>>>
> >>>>> If both sides are "MUST", then if things don't work out then the client
> >>>>> and server can equally point to one another and say "It's his fault".
> >>>>>
> >>>>> Am I missing something here?
> >>>>>
> >>>>>
> >>>>>
> >>>>> -----Original Message-----
> >>>>> From: nfsv4-bounces@xxxxxxxx [mailto:nfsv4-bounces@xxxxxxxx] On Behalf
> >>>>> Of Trond Myklebust
> >>>>> Sent: Wednesday, July 07, 2010 5:01 PM
> >>>>> To: Muntz, Daniel
> >>>>> Cc: linux-nfs@xxxxxxxxxxxxxxx; garth@xxxxxxxxxxx; welch@xxxxxxxxxxx;
> >>>>> nfsv4@xxxxxxxx; andros@xxxxxxxxxx; bhalevy@xxxxxxxxxxx
> >>>>> Subject: Re: [nfsv4] 4.1 client - LAYOUTCOMMIT & close
> >>>>>
> >>>>> On Wed, 2010-07-07 at 16:39 -0400, Daniel.Muntz@xxxxxxx wrote:
> >>>>>> To bring this discussion full circle, since we agree that a compliant
> >>>>>> server can implement a scheme where written data does not become visible
> >>>>>> until after a LAYOUTCOMMIT, do we also agree that LAYOUTCOMMIT is a
> >>>>>> "MUST" from a compliant client (independent of layout type)?
> >>>>>
> >>>>> Yes. I would agree that the client cannot rely on the updates being made
> >>>>> visible if it fails to send the LAYOUTCOMMIT. My point was simply that a
> >>>>> compliant server MUST also have a valid strategy for dealing with the
> >>>>> case where the client doesn't send it.
> >>>>>
> >>>>> Cheers
> >>>>> Trond
> >>>>>
> >>>>>> -Dan
> >>>>>>
> >>>>>>> -----Original Message-----
> >>>>>>> From: nfsv4-bounces@xxxxxxxx [mailto:nfsv4-bounces@xxxxxxxx]
> >>>>>>> On Behalf Of Trond Myklebust
> >>>>>>> Sent: Wednesday, July 07, 2010 7:04 AM
> >>>>>>> To: Benny Halevy
> >>>>>>> Cc: andros@xxxxxxxxxx; linux-nfs@xxxxxxxxxxxxxxx; Garth
> >>>>>>> Gibson; Brent Welch; NFSv4
> >>>>>>> Subject: Re: [nfsv4] 4.1 client - LAYOUTCOMMIT & close
> >>>>>>>
> >>>>>>> On Wed, 2010-07-07 at 16:51 +0300, Benny Halevy wrote:
> >>>>>>>> On Jul. 07, 2010, 16:18 +0300, Trond Myklebust <Trond.Myklebust@xxxxxxxxxx> wrote:
> >>>>>>>>> On Wed, 2010-07-07 at 09:06 -0400, Trond Myklebust wrote:
> >>>>>>>>>> On Wed, 2010-07-07 at 15:05 +0300, Benny Halevy wrote:
> >>>>>>>>>>> On Jul. 06, 2010, 23:40 +0300, Trond Myklebust <trond.myklebust@xxxxxxxxxx> wrote:
> >>>>>>>>>>>> On Tue, 2010-07-06 at 15:20 -0400, Daniel.Muntz@xxxxxxx wrote:
> >>>>>>>>>>>>> The COMMIT to the DS, ttbomk, commits data on the DS. I see it as
> >>>>>>>>>>>>> orthogonal to updating the metadata on the MDS (but perhaps I'm wrong).
> >>>>>>>>>>>>> As sjoshi@bluearc mentioned, the LAYOUTCOMMIT provides a synchronization
> >>>>>>>>>>>>> point, so even if the non-clustered server does not want to update
> >>>>>>>>>>>>> metadata on every DS I/O, the LAYOUTCOMMIT could also be a trigger to
> >>>>>>>>>>>>> execute whatever synchronization mechanism the implementer wishes to put
> >>>>>>>>>>>>> in the control protocol.
> >>>>>>>>>>>>
> >>>>>>>>>>>> As far as I'm aware, there are no exceptions in RFC5661 that would allow
> >>>>>>>>>>>> pNFS servers to break the rule that any visible change to the data must
> >>>>>>>>>>>> be atomically accompanied with a change attribute update.
> >>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> Trond, I'm not sure how this rule you mentioned is specified.
> >>>>>>>>>>>
> >>>>>>>>>>> See more in section 12.5.4 and 12.5.4.1. LAYOUTCOMMIT and change/time_modify
> >>>>>>>>>>> in particular:
> >>>>>>>>>>>
> >>>>>>>>>>>    For some layout protocols, the storage device is able to notify the
> >>>>>>>>>>>    metadata server of the occurrence of an I/O; as a result, the change
> >>>>>>>>>>>    and time_modify attributes may be updated at the metadata server.
> >>>>>>>>>>>    For a metadata server that is capable of monitoring updates to the
> >>>>>>>>>>>    change and time_modify attributes, LAYOUTCOMMIT processing is not
> >>>>>>>>>>>    required to update the change attribute. In this case, the metadata
> >>>>>>>>>>>    server must ensure that no further update to the data has occurred
> >>>>>>>>>>>    since the last update of the attributes; file-based protocols may
> >>>>>>>>>>>    have enough information to make this determination or may update the
> >>>>>>>>>>>    change attribute upon each file modification. This also applies for
> >>>>>>>>>>>    the time_modify attribute. If the server implementation is able to
> >>>>>>>>>>>    determine that the file has not been modified since the last
> >>>>>>>>>>>    time_modify update, the server need not update time_modify at
> >>>>>>>>>>>    LAYOUTCOMMIT. At LAYOUTCOMMIT completion, the updated attributes
> >>>>>>>>>>>    should be visible if that file was modified since the latest previous
> >>>>>>>>>>>    LAYOUTCOMMIT or LAYOUTGET
> >>>>>>>>>>
> >>>>>>>>>> I know. However the above paragraph does not state that the server
> >>>>>>>>>> should make those changes visible to clients other than the one that is
> >>>>>>>>>> writing.
> >>>>>>>>>>
> >>>>>>>>>> Section 18.32.4 states that writes will cause the time_modified and
> >>>>>>>>>> change attributes to be updated (if and only if the file data is
> >>>>>>>>>> modified). Several other sections rely on this behaviour, including
> >>>>>>>>>> section 10.3.1, section 11.7.2.2, and section 11.7.7.
> >>>>>>>>>>
> >>>>>>>>>> The only 'special behaviour' that I see allowed for pNFS is in section
> >>>>>>>>>> 13.10, which states that clients can't expect to see changes
> >>>>>>>>>> immediately, but that they must be able to expect close-to-open
> >>>>>>>>>> semantics to work. Again, if this is to be the case, then the server
> >>>>>>>>>> _must_ be able to deal with the case where client 1 dies before it can
> >>>>>>>>>> issue the LAYOUTCOMMIT.
> >>>>>>>>
> >>>>>>>> Agreed.
> >>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>>> As I see it, if your server allows one client to read data that may have
> >>>>>>>>>>>> been modified by another client that holds a WRITE layout for that range
> >>>>>>>>>>>> then (since that is a visible data change) it should provide a change
> >>>>>>>>>>>> attribute update irrespective of whether or not a LAYOUTCOMMIT has been
> >>>>>>>>>>>> sent.
> >>>>>>>>>>>
> >>>>>>>>>>> the requirement for the server in WRITE's implementation section
> >>>>>>>>>>> is quite weak: "It is assumed that the act of writing data to a file will
> >>>>>>>>>>> cause the time_modified and change attributes of the file to be updated."
> >>>>>>>>>>>
> >>>>>>>>>>> The difference here is that for pNFS the written data is not guaranteed
> >>>>>>>>>>> to be visible until LAYOUTCOMMIT. In a broader sense, assuming the clients
> >>>>>>>>>>> are caching dirty data and use a write-behind cache, application-written data
> >>>>>>>>>>> may be visible to other processes on the same host but not to others until
> >>>>>>>>>>> fsync() or close() - open-to-close semantics are the only thing the client
> >>>>>>>>>>> guarantees, right? Issuing LAYOUTCOMMIT on fsync() and close() ensure the
> >>>>>>>>>>> data is committed to stable storage and is visible to all other clients in
> >>>>>>>>>>> the cluster.
> >>>>>>>>>>
> >>>>>>>>>> See above. I'm not disputing your statement that 'the written data is
> >>>>>>>>>> not guaranteed to be visible until LAYOUTCOMMIT'. I am disputing an
> >>>>>>>>>> assumption that 'the written data may be visible without an accompanying
> >>>>>>>>>> change attribute update'.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> In other words, I'd expect the following scenario to give the same
> >>>>>>>>> results in NFSv4.1 w/pNFS as it does in NFSv4:
> >>>>>>>>
> >>>>>>>> That's a strong requirement that may limit the scalability of the server.
> >>>>>>>>
> >>>>>>>> The spirit of the pNFS operations, at least from Panasas perspective was that
> >>>>>>>> the data is transient until LAYOUTCOMMIT, meaning it may or may not be visible
> >>>>>>>> to clients other than the one who wrote it, and its associated metadata MUST
> >>>>>>>> be updated and describe the new data only on LAYOUTCOMMIT and until then it's
> >>>>>>>> undefined, i.e. it's up to the server implementation whether to update it or not.
> >>>>>>>>
> >>>>>>>> Without locking, what do the stronger semantics buy you?
> >>>>>>>> Even if a client verified the change_attribute new data may become visible
> >>>>>>>> at any time after the GETATTR if the file/byte range aren't locked.
> >>>>>>>
> >>>>>>> There is no locking needed in the scenario below: it is ordinary
> >>>>>>> close-to-open semantics.
> >>>>>>>
> >>>>>>> The point is that if you remove the one and only way that clients have
> >>>>>>> to determine whether or not their data caches are valid, then they can
> >>>>>>> no longer cache data at all, and server scalability will be shot to
> >>>>>>> smithereens anyway.
> >>>>>>>
> >>>>>>> Trond
> >>>>>>>
> >>>>>>>> Benny
> >>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Client 1                          Client 2
> >>>>>>>>> ========                          ========
> >>>>>>>>>
> >>>>>>>>> OPEN foo
> >>>>>>>>> READ
> >>>>>>>>> CLOSE
> >>>>>>>>>                                   OPEN
> >>>>>>>>>                                   LAYOUTGET ...
> >>>>>>>>>                                   WRITE via DS
> >>>>>>>>>                                   <dies>...
> >>>>>>>>> OPEN foo
> >>>>>>>>> verify change_attr
> >>>>>>>>> READ if above WRITE is visible
> >>>>>>>>> CLOSE
> >>>>>>>>>
> >>>>>>>>> Trond
> >>>>>>>>> _______________________________________________
> >>>>>>>>> nfsv4 mailing list
> >>>>>>>>> nfsv4@xxxxxxxx
> >>>>>>>>> https://www.ietf.org/mailman/listinfo/nfsv4
> >>
> >> --
> >> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> >> the body of a message to majordomo@xxxxxxxxxxxxxxx
> >> More majordomo info at http://vger.kernel.org/majordomo-info.html
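
For readers following the thread, the close-to-open rule proposed above
(on OPEN, LOCK or WANT_DELEGATION, check for missing LAYOUTCOMMITs and
update the change attribute if any are outstanding) can be sketched as a
toy model. This is only an illustration under stated assumptions: the
names ToyMDS, FileState, handle_open and run_scenario are hypothetical and
come from no real server or from RFC 5661, and the scenario is read as
Client 1 caching the file while Client 2 writes through a layout and dies.

```python
from dataclasses import dataclass, field


@dataclass
class FileState:
    """Hypothetical per-file state kept by a toy metadata server (MDS)."""
    change_attr: int = 0
    # Clients holding a writeable layout with no LAYOUTCOMMIT seen since LAYOUTGET.
    uncommitted_writers: set = field(default_factory=set)


class ToyMDS:
    """Toy metadata server; an assumption-laden sketch, not a real implementation."""

    def __init__(self):
        self.files = {}

    def _file(self, name):
        return self.files.setdefault(name, FileState())

    def layoutget_rw(self, client, name):
        # From here on the client may write via the data servers unseen by the MDS.
        self._file(name).uncommitted_writers.add(client)

    def layoutcommit(self, client, name):
        # Normal path: the writer reports its changes, so the change attribute moves.
        f = self._file(name)
        f.change_attr += 1
        f.uncommitted_writers.discard(client)

    def handle_open(self, client, name):
        # Proposed rule (OPEN shown; LOCK and WANT_DELEGATION would do the same):
        # if another client may have written without a LAYOUTCOMMIT, bump the
        # change attribute. This allows false positives but no false negatives.
        f = self._file(name)
        if f.uncommitted_writers - {client}:
            f.change_attr += 1
        return f.change_attr


def run_scenario():
    """Replay the Client 1 / Client 2 scenario from the thread."""
    mds = ToyMDS()
    cached = mds.handle_open("client1", "foo")  # Client 1: OPEN foo, READ, CLOSE
    mds.handle_open("client2", "foo")           # Client 2: OPEN
    mds.layoutget_rw("client2", "foo")          # LAYOUTGET, WRITE via DS, <dies>
    seen = mds.handle_open("client1", "foo")    # Client 1: OPEN foo, verify change_attr
    return cached, seen


if __name__ == "__main__":
    cached, seen = run_scenario()
    print("client 1 must revalidate its cache:", cached != seen)
```

In this sketch the bump happens on every OPEN while a writer's layout is
uncommitted, because the MDS cannot tell whether further writes landed on
the data servers. That is the "better safe than sorry" trade-off discussed
above: extra cache invalidations in the rare failure case, none on the
normal LAYOUTCOMMIT path.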