It seems we agree that LAYOUTCOMMIT will be sent for file layouts, correct?
Or should I file a defect on this? For reference, my original email is below.

// START
In certain cases, I don't see a LAYOUTCOMMIT on a file at all, even after
doing many writes.

Client-side operations:
  open
  write(s)
  close

On the server side (observed operations):
  open
  LAYOUTGETs
  close

But I do not see a LAYOUTCOMMIT at all. The client wrote about 4-5 MB of data.

When does the client issue LAYOUTCOMMIT?
// END

Regards,
Sandeep

-----Original Message-----
From: nfsv4-bounces@xxxxxxxx [mailto:nfsv4-bounces@xxxxxxxx] On Behalf Of Trond Myklebust
Sent: Thursday, July 08, 2010 2:16 PM
To: david.black@xxxxxxx
Cc: linux-nfs@xxxxxxxxxxxxxxx; garth@xxxxxxxxxxx; welch@xxxxxxxxxxx; nfsv4@xxxxxxxx; andros@xxxxxxxxxx; bhalevy@xxxxxxxxxxx
Subject: Re: [nfsv4] 4.1 client - LAYOUTCOMMIT & close

On Thu, 2010-07-08 at 16:30 -0400, david.black@xxxxxxx wrote:
> > Note that a LAYOUTRETURN can arrive without LAYOUTCOMMIT if the
> > client hasn't written to the file. I'm not sure what about the
> > blocks case though, do you implicitly free up any provisionally
> > allocated blocks that the client had not explicitly committed using LAYOUTCOMMIT?
>
> In principle, yes as the blocks are no longer promised to the client,
> although lazy evaluation of this is an obvious optimization.
>
> > >> "Upon receiving an OPEN, LOCK or a WANT_DELEGATION, the server
> > >> must check that it has received LAYOUTCOMMITs from any other
> > >> clients that may have the file open for writing. If it hasn't,
> > >> then it MUST take some action to ensure that any file data
> > >> changes are accompanied by a change
> > > ^ potentially visible
> > >> attribute update."
> >
> > That should be OK as long as it's not for every GETATTR for the
> > change, mtime, or size attributes.
> >
> > >>
> > >> Then you can add the above suggestion without the offending
> > >> caveat. Note however that it does break the "SHOULD NOT"
> > >> admonition in section 18.32.4.
> >
> > Better be safe than sorry in this rare error case.
>
> I concur with Benny on both of the above - in essence, the unrecovered
> client failure is a reason to potentially ignore the "SHOULD" (server
> can't know whether it actually ignored the "SHOULD", hence better safe
> than sorry). We probably ought to find someplace appropriate to add a
> paragraph or two explaining this in one of the 4.2 documents.

Right. I'm only interested in fixing the close-to-open case. The case of
general GETATTR calls might be nice to fix too, but it should not be
essential in order to ensure that well-behaved applications continue to
work as expected.

Note, however, that legacy support for stateless protocols like NFSv2
and NFSv3 may be problematic: there is no equivalent of OPEN, and so the
server may have to do the above check on all NFSPROC2_GETATTR,
NFSPROC3_GETATTR, NFSPROC2_LOOKUP and NFSPROC3_LOOKUP requests.

Trond

> Thanks,
> --David
>
>
> > -----Original Message-----
> > From: Benny Halevy [mailto:bhalevy.lists@xxxxxxxxx] On Behalf Of Benny Halevy
> > Sent: Thursday, July 08, 2010 12:00 PM
> > To: Trond Myklebust
> > Cc: Black, David; Noveck, David; Muntz, Daniel;
> > linux-nfs@xxxxxxxxxxxxxxx; garth@xxxxxxxxxxx; welch@xxxxxxxxxxx;
> > nfsv4@xxxxxxxx; andros@xxxxxxxxxx
> > Subject: Re: [nfsv4] 4.1 client - LAYOUTCOMMIT & close
> >
> > On Jul. 08, 2010, 2:14 +0300, Trond Myklebust <trond.myklebust@xxxxxxxxxx> wrote:
> > > On Wed, 2010-07-07 at 19:09 -0400, Trond Myklebust wrote:
> > >> On Wed, 2010-07-07 at 18:52 -0400, Trond Myklebust wrote:
> > >>> On Wed, 2010-07-07 at 18:44 -0400, david.black@xxxxxxx wrote:
> > >>>> Let me try this ...
> > >>>>
> > >>>> A correct client will always send LAYOUTCOMMIT.
> > >>>> Assume that the client is correct.
> > >>>> Hence if the LAYOUTCOMMIT doesn't arrive, something's failed.
> > >>>>
> > >>>> Important implication: No LAYOUTCOMMIT is an error/failure
> > >>>> case. It just has to work; it doesn't have to be fast.
> > >>>>
> >
> > Note that a LAYOUTRETURN can arrive without LAYOUTCOMMIT if the
> > client hasn't written to the file. I'm not sure what about the
> > blocks case though, do you implicitly free up any provisionally
> > allocated blocks that the client had not explicitly committed using LAYOUTCOMMIT?
> >
> > >>>> Suggestion: If a client dies while holding writeable layouts
> > >>>> that permit write-in-place, and the client doesn't reappear or
> > >>>> doesn't reclaim those layouts, then the server should assume
> > >>>> that the files involved were written before the client died,
> > >>>> and set the file attributes accordingly as part of internally
> > >>>> reclaiming the layout that the client has abandoned.
> >
> > Of course. That's part of the server recovery.
> >
> > >>>>
> > >>>> Caveat: It may take a while for the server to determine that
> > >>>> the client has abandoned a layout.
> >
> > That's two lease times after a respective CB_LAYOUTRECALL.
> >
> > >>>>
> > >>>> This can result in false positives (file appears to be modified
> > >>>> when it wasn't) but won't yield false negatives (file does not
> > >>>> appear to be modified even though it was modified).
> > >>>
> > >>> OK... So we're going to have to turn off client side file
> > >>> caching entirely for pNFS? I can do that...
> > >>>
> > >>> The above won't work. Think readahead...
> > >>
> > >> So... What can work, is if you modify it to work explicitly for
> > >> close-to-open
> > >>
> > >> "Upon receiving an OPEN, LOCK or a WANT_DELEGATION, the server
> > >> must check that it has received LAYOUTCOMMITs from any other
> > >> clients that may have the file open for writing. If it hasn't,
> > >> then it MUST take some action to ensure that any file data
> > >> changes are accompanied by a change
> > > ^ potentially visible
> > >> attribute update."
> >
> > That should be OK as long as it's not for every GETATTR for the
> > change, mtime, or size attributes.
> >
> > >>
> > >> Then you can add the above suggestion without the offending
> > >> caveat. Note however that it does break the "SHOULD NOT"
> > >> admonition in section 18.32.4.
> >
> > Better be safe than sorry in this rare error case.
> >
> > Benny
> >
> > >>
> > >> Trond
> > >>
> > >>
> > >>> Trond
> > >>>
> > >>>> Thanks,
> > >>>> --David
> > >>>>
> > >>>>> -----Original Message-----
> > >>>>> From: nfsv4-bounces@xxxxxxxx [mailto:nfsv4-bounces@xxxxxxxx] On Behalf Of Noveck_David@xxxxxxx
> > >>>>> Sent: Wednesday, July 07, 2010 6:04 PM
> > >>>>> To: Trond.Myklebust@xxxxxxxxxx; Muntz, Daniel
> > >>>>> Cc: linux-nfs@xxxxxxxxxxxxxxx; garth@xxxxxxxxxxx; welch@xxxxxxxxxxx; nfsv4@xxxxxxxx;
> > >>>>> andros@xxxxxxxxxx; bhalevy@xxxxxxxxxxx
> > >>>>> Subject: Re: [nfsv4] 4.1 client - LAYOUTCOMMIT & close
> > >>>>>
> > >>>>>> Yes. I would agree that the client cannot rely on the updates being made
> > >>>>>> visible if it fails to send the LAYOUTCOMMIT. My point was simply that a
> > >>>>>> compliant server MUST also have a valid strategy for dealing with the
> > >>>>>> case where the client doesn't send it.
> > >>>>>
> > >>>>> So you are saying the updates "MUST be made visible" through
> > >>>>> the server's valid strategy. Is that right.
> > >>>>>
> > >>>>> And that the client cannot rely on that. Why not, if the
> > >>>>> server must have a valid strategy.
> > >>>>>
> > >>>>> Is this just prudent "belt and suspenders" design or what?
> > >>>>>
> > >>>>> It seems to me that if one side here is MUST (and the spec
> > >>>>> needs to be clearer about what might or might not constitute a
> > >>>>> valid strategy), then the other side should be SHOULD.
> > >>>>>
> > >>>>> If both sides are "MUST", then if things don't work out then the client
> > >>>>> and server can equally point to one another and say "It's his fault".
> > >>>>>
> > >>>>> Am I missing something here?
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>> -----Original Message-----
> > >>>>> From: nfsv4-bounces@xxxxxxxx [mailto:nfsv4-bounces@xxxxxxxx] On Behalf Of Trond Myklebust
> > >>>>> Sent: Wednesday, July 07, 2010 5:01 PM
> > >>>>> To: Muntz, Daniel
> > >>>>> Cc: linux-nfs@xxxxxxxxxxxxxxx; garth@xxxxxxxxxxx;
> > >>>>> welch@xxxxxxxxxxx; nfsv4@xxxxxxxx; andros@xxxxxxxxxx;
> > >>>>> bhalevy@xxxxxxxxxxx
> > >>>>> Subject: Re: [nfsv4] 4.1 client - LAYOUTCOMMIT & close
> > >>>>>
> > >>>>> On Wed, 2010-07-07 at 16:39 -0400, Daniel.Muntz@xxxxxxx wrote:
> > >>>>>> To bring this discussion full circle, since we agree that a compliant
> > >>>>>> server can implement a scheme where written data does not become visible
> > >>>>>> until after a LAYOUTCOMMIT, do we also agree that
> > >>>>>> LAYOUTCOMMIT is a "MUST" from a compliant client (independent of layout type)?
> > >>>>>
> > >>>>> Yes. I would agree that the client cannot rely on the updates being made
> > >>>>> visible if it fails to send the LAYOUTCOMMIT. My point was simply that a
> > >>>>> compliant server MUST also have a valid strategy for dealing
> > >>>>> with the case where the client doesn't send it.
> > >>>>>
> > >>>>> Cheers
> > >>>>> Trond
> > >>>>>
> > >>>>>> -Dan
> > >>>>>>
> > >>>>>>> -----Original Message-----
> > >>>>>>> From: nfsv4-bounces@xxxxxxxx [mailto:nfsv4-bounces@xxxxxxxx] On Behalf Of Trond Myklebust
> > >>>>>>> Sent: Wednesday, July 07, 2010 7:04 AM
> > >>>>>>> To: Benny Halevy
> > >>>>>>> Cc: andros@xxxxxxxxxx; linux-nfs@xxxxxxxxxxxxxxx; Garth Gibson; Brent Welch; NFSv4
> > >>>>>>> Subject: Re: [nfsv4] 4.1 client - LAYOUTCOMMIT & close
> > >>>>>>>
> > >>>>>>> On Wed, 2010-07-07 at 16:51 +0300, Benny Halevy wrote:
> > >>>>>>>> On Jul. 07, 2010, 16:18 +0300, Trond Myklebust <Trond.Myklebust@xxxxxxxxxx> wrote:
> > >>>>>>>>> On Wed, 2010-07-07 at 09:06 -0400, Trond Myklebust wrote:
> > >>>>>>>>>> On Wed, 2010-07-07 at 15:05 +0300, Benny Halevy wrote:
> > >>>>>>>>>>> On Jul. 06, 2010, 23:40 +0300, Trond Myklebust <trond.myklebust@xxxxxxxxxx> wrote:
> > >>>>>>>>>>>> On Tue, 2010-07-06 at 15:20 -0400, Daniel.Muntz@xxxxxxx wrote:
> > >>>>>>>>>>>>> The COMMIT to the DS, ttbomk, commits data on the DS. I see it as
> > >>>>>>>>>>>>> orthogonal to updating the metadata on the MDS (but perhaps I'm wrong).
> > >>>>>>>>>>>>> As sjoshi@bluearc mentioned, the LAYOUTCOMMIT provides a synchronization
> > >>>>>>>>>>>>> point, so even if the non-clustered server does not want to update
> > >>>>>>>>>>>>> metadata on every DS I/O, the LAYOUTCOMMIT could also be a trigger to
> > >>>>>>>>>>>>> execute whatever synchronization mechanism the implementer wishes to put
> > >>>>>>>>>>>>> in the control protocol.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> As far as I'm aware, there are no exceptions in RFC5661 that would allow
> > >>>>>>>>>>>> pNFS servers to break the rule that any visible change to the data must
> > >>>>>>>>>>>> be atomically accompanied with a change attribute update.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>> Trond, I'm not sure how this rule you mentioned is specified.
> > >>>>>>>>>>>
> > >>>>>>>>>>> See more in section 12.5.4 and 12.5.4.1. LAYOUTCOMMIT and change/time_modify
> > >>>>>>>>>>> in particular:
> > >>>>>>>>>>>
> > >>>>>>>>>>>    For some layout protocols, the storage device is able to notify the
> > >>>>>>>>>>>    metadata server of the occurrence of an I/O; as a result, the change
> > >>>>>>>>>>>    and time_modify attributes may be updated at the metadata server.
> > >>>>>>>>>>>    For a metadata server that is capable of monitoring updates to the
> > >>>>>>>>>>>    change and time_modify attributes, LAYOUTCOMMIT processing is not
> > >>>>>>>>>>>    required to update the change attribute. In this case, the metadata
> > >>>>>>>>>>>    server must ensure that no further update to the data has occurred
> > >>>>>>>>>>>    since the last update of the attributes; file-based protocols may
> > >>>>>>>>>>>    have enough information to make this determination or may update the
> > >>>>>>>>>>>    change attribute upon each file modification. This also applies for
> > >>>>>>>>>>>    the time_modify attribute. If the server implementation is able to
> > >>>>>>>>>>>    determine that the file has not been modified since the last
> > >>>>>>>>>>>    time_modify update, the server need not update time_modify at
> > >>>>>>>>>>>    LAYOUTCOMMIT. At LAYOUTCOMMIT completion, the updated attributes
> > >>>>>>>>>>>    should be visible if that file was modified since the latest previous
> > >>>>>>>>>>>    LAYOUTCOMMIT or LAYOUTGET
> > >>>>>>>>>>
> > >>>>>>>>>> I know. However the above paragraph does not state that the server
> > >>>>>>>>>> should make those changes visible to clients other than the one that is
> > >>>>>>>>>> writing.
> > >>>>>>>>>>
> > >>>>>>>>>> Section 18.32.4 states that writes will cause the time_modified and
> > >>>>>>>>>> change attributes to be updated (if and only if the file data is
> > >>>>>>>>>> modified). Several other sections rely on this behaviour, including
> > >>>>>>>>>> section 10.3.1, section 11.7.2.2, and section 11.7.7.
> > >>>>>>>>>>
> > >>>>>>>>>> The only 'special behaviour' that I see allowed for pNFS is in section
> > >>>>>>>>>> 13.10, which states that clients can't expect to see
> > >>>>>>>>>> changes immediately, but that they must be able to expect close-to-open
> > >>>>>>>>>> semantics to work. Again, if this is to be the case, then the server
> > >>>>>>>>>> _must_ be able to deal with the case where client 1 dies before it can
> > >>>>>>>>>> issue the LAYOUTCOMMIT.
> > >>>>>>>>
> > >>>>>>>> Agreed.
> > >>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>>>> As I see it, if your server allows one client to read data that may have
> > >>>>>>>>>>>> been modified by another client that holds a WRITE layout for that range
> > >>>>>>>>>>>> then (since that is a visible data change) it should provide a change
> > >>>>>>>>>>>> attribute update irrespective of whether or not a LAYOUTCOMMIT has been
> > >>>>>>>>>>>> sent.
> > >>>>>>>>>>>
> > >>>>>>>>>>> the requirement for the server in WRITE's implementation section
> > >>>>>>>>>>> is quite weak: "It is assumed that the act of writing data to a file will
> > >>>>>>>>>>> cause the time_modified and change attributes of the file to be updated."
> > >>>>>>>>>>>
> > >>>>>>>>>>> The difference here is that for pNFS the written data is not guaranteed
> > >>>>>>>>>>> to be visible until LAYOUTCOMMIT. In a broader sense, assuming the clients
> > >>>>>>>>>>> are caching dirty data and use a write-behind cache, application-written data
> > >>>>>>>>>>> may be visible to other processes on the same host but not to others until
> > >>>>>>>>>>> fsync() or close() - open-to-close semantics are the only thing the client
> > >>>>>>>>>>> guarantees, right? Issuing LAYOUTCOMMIT on fsync() and close() ensure the
> > >>>>>>>>>>> data is committed to stable storage and is visible to all other clients in
> > >>>>>>>>>>> the cluster.
> > >>>>>>>>>>
> > >>>>>>>>>> See above. I'm not disputing your statement that 'the written data is
> > >>>>>>>>>> not guaranteed to be visible until LAYOUTCOMMIT'. I am disputing an
> > >>>>>>>>>> assumption that 'the written data may be visible without an accompanying
> > >>>>>>>>>> change attribute update'.
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> In other words, I'd expect the following scenario to give the same
> > >>>>>>>>> results in NFSv4.1 w/pNFS as it does in NFSv4:
> > >>>>>>>>
> > >>>>>>>> That's a strong requirement that may limit the scalability of the server.
> > >>>>>>>>
> > >>>>>>>> The spirit of the pNFS operations, at least from Panasas perspective was that
> > >>>>>>>> the data is transient until LAYOUTCOMMIT, meaning it may or may not be visible
> > >>>>>>>> to clients other than the one who wrote it, and its associated metadata MUST
> > >>>>>>>> be updated and describe the new data only on LAYOUTCOMMIT and until then it's
> > >>>>>>>> undefined, i.e. it's up to the server implementation whether to update it or not.
> > >>>>>>>>
> > >>>>>>>> Without locking, what do the stronger semantics buy you?
> > >>>>>>>> Even if a client verified the change_attribute new data may become visible
> > >>>>>>>> at any time after the GETATTR if the file/byte range aren't locked.
> > >>>>>>>
> > >>>>>>> There is no locking needed in the scenario below: it is
> > >>>>>>> ordinary close-to-open semantics.
> > >>>>>>>
> > >>>>>>> The point is that if you remove the one and only way that clients have
> > >>>>>>> to determine whether or not their data caches are valid, then they can
> > >>>>>>> no longer cache data at all, and server scalability will be shot to
> > >>>>>>> smithereens anyway.
> > >>>>>>>
> > >>>>>>> Trond
> > >>>>>>>
> > >>>>>>>> Benny
> > >>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> Client 1                          Client 2
> > >>>>>>>>> ========                          ========
> > >>>>>>>>>
> > >>>>>>>>> OPEN foo
> > >>>>>>>>> READ
> > >>>>>>>>> CLOSE
> > >>>>>>>>>                                   OPEN
> > >>>>>>>>>                                   LAYOUTGET ...
> > >>>>>>>>>                                   WRITE via DS
> > >>>>>>>>>                                   <dies>...
> > >>>>>>>>> OPEN foo
> > >>>>>>>>> verify change_attr
> > >>>>>>>>> READ if above WRITE is visible
> > >>>>>>>>> CLOSE
> > >>>>>>>>>
> > >>>>>>>>> Trond

_______________________________________________
nfsv4 mailing list
nfsv4@xxxxxxxx
https://www.ietf.org/mailman/listinfo/nfsv4
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
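
The close-to-open check Trond proposes in this thread ("Upon receiving
an OPEN, LOCK or a WANT_DELEGATION, the server must check that it has
received LAYOUTCOMMITs from any other clients ...") can be illustrated
with a toy model. This is only a hypothetical Python sketch, not taken
from any server implementation: the class ToyMetadataServer, its method
names, and the integer change attribute are all assumptions made for
illustration. It simplifies by treating one LAYOUTCOMMIT as covering
every write made under the layout so far, and it handles only OPEN.

```python
from dataclasses import dataclass, field


@dataclass
class ToyMetadataServer:
    """Hypothetical in-memory model of one file's state on a metadata server."""

    change_attr: int = 0
    # Clients that obtained a write layout and have not sent LAYOUTCOMMIT since.
    uncommitted_writers: set = field(default_factory=set)

    def layoutget_write(self, client: str) -> None:
        """Record that `client` may now write directly to the data servers."""
        self.uncommitted_writers.add(client)

    def layoutcommit(self, client: str) -> None:
        """Normal path: the writer reports its changes, so update the attribute."""
        if client in self.uncommitted_writers:
            self.uncommitted_writers.remove(client)
            self.change_attr += 1

    def open_file(self, client: str) -> int:
        """Handle OPEN and return the change attribute the opener will see."""
        if self.uncommitted_writers - {client}:
            # Another client may have written without a LAYOUTCOMMIT (for
            # example, it died). Assume the data changed and say so.
            self.change_attr += 1
        return self.change_attr


if __name__ == "__main__":
    mds = ToyMetadataServer()
    cached = mds.open_file("client1")   # client 1 opens, reads, caches, closes
    mds.layoutget_write("client2")      # client 2 writes via the DS, then dies
    seen = mds.open_file("client1")     # client 1 reopens and revalidates
    print("cache still valid:", seen == cached)
```

In this model every OPEN that finds an uncommitted writer bumps the
change attribute, so a reader may invalidate its cache when nothing
actually changed (a false positive), but it never keeps stale data after
a write it could observe (no false negative), which is the trade-off
discussed in the thread.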