On Tue, 2022-08-30 at 11:17 -0400, J. Bruce Fields wrote:
> On Tue, Aug 30, 2022 at 02:58:27PM +0000, Trond Myklebust wrote:
> > On Tue, 2022-08-30 at 10:44 -0400, J. Bruce Fields wrote:
> > > On Tue, Aug 30, 2022 at 09:50:02AM -0400, Jeff Layton wrote:
> > > > On Tue, 2022-08-30 at 09:24 -0400, J. Bruce Fields wrote:
> > > > > On Tue, Aug 30, 2022 at 07:40:02AM -0400, Jeff Layton wrote:
> > > > > > Yes, saying only that it must be different is intentional.
> > > > > > What we really want is for consumers to treat this as an
> > > > > > opaque value for the most part [1]. Therefore an
> > > > > > implementation based on hashing would conform to the spec,
> > > > > > I'd think, as long as all of the relevant info is part of
> > > > > > the hash.
> > > > >
> > > > > It'd conform, but it might not be as useful as an increasing
> > > > > value.
> > > > >
> > > > > E.g. a client can use that to work out which of a series of
> > > > > reordered write replies is the most recent, and I seem to
> > > > > recall that can prevent unnecessary invalidations in some
> > > > > cases.
> > > > >
> > > >
> > > > That's a good point; the linux client does this. That said,
> > > > NFSv4 has a way for the server to advertise its change
> > > > attribute behavior [1] (though nfsd hasn't implemented this
> > > > yet).
> > >
> > > It was implemented and reverted. The issue was that I thought
> > > nfsd should mix in the ctime to prevent the change attribute
> > > going backwards on reboot (see
> > > fs/nfsd/nfsfh.h:nfsd4_change_attribute()), but Trond was
> > > concerned about the possibility of time going backwards. See
> > > 1631087ba872 "Revert "nfsd4: support change_attr_type
> > > attribute"". There's some mailing list discussion to that I'm
> > > not turning up right now.
>
> https://lore.kernel.org/linux-nfs/a6294c25cb5eb98193f609a52aa8f4b5d4e81279.camel@xxxxxxxxxxxxxxx/
> is what I was thinking of but it isn't actually that interesting.
>
> > My main concern was that some filesystems (e.g. ext3) were failing
> > to provide sufficient timestamp resolution to actually label the
> > resulting 'change attribute' as being updated monotonically. If
> > the time stamp doesn't change when the file data or metadata are
> > changed, then the client has to perform extra checks to try to
> > figure out whether or not its caches are up to date.
>
> That's a different issue from the one you were raising in that
> discussion.
>
> > > Did NFSv4 add change_attr_type because some implementations
> > > needed the unordered case, or because they realized ordering was
> > > useful but wanted to keep backwards compatibility? I don't know
> > > which it was.
> >
> > We implemented it because, as implied above, knowledge of whether
> > or not the change attribute behaves monotonically, or strictly
> > monotonically, enables a number of optimisations.
>
> Of course, but my question was about the value of the old behavior,
> not about the value of the monotonic behavior.
>
> Put differently, if we could redesign the protocol from scratch would
> we actually have included the option of non-monotonic behavior?
>

If we could design the filesystems from scratch, we probably would not.
The protocol ended up being as it is because people were trying to make
it as easy to implement as possible.

So if we could design the filesystem from scratch, we would have
probably designed it along the lines of what AFS does, i.e. each
explicit change is accompanied by a single bump of the change
attribute, so that the clients can not only decide the order of the
resulting changes, but also tell whether they have missed a change
(that might have been made by a different client).
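
To make the "one bump per explicit change" idea concrete, here is a minimal sketch of how a client might use such a counter. It is not AFS or NFS client code; the class and method names (ClientView, on_write_reply, unaccounted) are hypothetical, and it assumes the server returns the post-change counter value in each write reply.

```python
class ClientView:
    """Hypothetical client-side bookkeeping for a counter that the
    server bumps exactly once per explicit change."""

    def __init__(self, version=0):
        # Every change up to and including this version is accounted for.
        self.contiguous = version
        # Versions above that point reported by replies to our own writes.
        self.seen = set()

    def on_write_reply(self, version):
        """Record the counter value carried by a reply to our own write."""
        if version <= self.contiguous:
            return  # duplicate or late reply, nothing new to learn
        self.seen.add(version)
        # Advance past any run of versions we can now fully account for.
        while self.contiguous + 1 in self.seen:
            self.contiguous += 1
            self.seen.remove(self.contiguous)

    def latest(self):
        """Newest version known so far, regardless of reply arrival order."""
        return max(self.seen, default=self.contiguous)

    def unaccounted(self):
        """Versions not explained by our own replies: either a reply is
        still in flight, or another client made the change."""
        return [v for v in range(self.contiguous + 1, self.latest())
                if v not in self.seen]
```

With a counter that is merely "different" or merely increasing, the client could still order the replies, but the gap check in unaccounted() only works because each change bumps the value by exactly one.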

However that would be a requirement that is likely to be very specific
to distributed caches (and hence distributed filesystems). I doubt
there are many user space applications that would need that high
precision. Maybe MPI, but that's the only candidate I can think of for
now?

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@xxxxxxxxxxxxxxx