On Mon, 2022-09-12 at 15:32 +0000, Trond Myklebust wrote: > On Mon, 2022-09-12 at 14:56 +0000, Trond Myklebust wrote: > > On Mon, 2022-09-12 at 10:50 -0400, J. Bruce Fields wrote: > > > On Mon, Sep 12, 2022 at 02:15:16PM +0000, Trond Myklebust wrote: > > > > On Mon, 2022-09-12 at 09:51 -0400, J. Bruce Fields wrote: > > > > > On Mon, Sep 12, 2022 at 08:55:04AM -0400, Jeff Layton wrote: > > > > > > Because of the "seen" flag, we have a 63 bit counter to play > > > > > > with. > > > > > > Could > > > > > > we use a similar scheme to the one we use to handle when > > > > > > "jiffies" > > > > > > wraps? Assume that we'd never compare two values that were > > > > > > more > > > > > > than > > > > > > 2^62 apart? We could add i_version_before/i_version_after > > > > > > macros to > > > > > > make > > > > > > it simple to handle this. > > > > > > > > > > As far as I recall the protocol just assumes it can never > > > > > wrap. > > > > > I > > > > > guess > > > > > you could add a new change_attr_type that works the way you > > > > > describe. > > > > > But without some new protocol clients aren't going to know what > > > > > to do > > > > > with a change attribute that wraps. > > > > > > > > > > I think this just needs to be designed so that wrapping is > > > > > impossible > > > > > in > > > > > any realistic scenario. I feel like that's doable? > > > > > > > > > > If we feel we have to catch that case, the only 100% correct > > > > > behavior > > > > > would probably be to make the filesystem readonly. > > > > > > > > > > > > > Which protocol? If you're talking about basic NFSv4, it doesn't > > > > assume > > > > anything about the change attribute and wrapping. > > > > > > > > The NFSv4.2 protocol did introduce the optional attribute > > > > 'change_attr_type' that tries to describe the change attribute > > > > behaviour to the client. It tells you if the behaviour is > > > > monotonically > > > > increasing, but doesn't say anything about the behaviour when the > > > > attribute value overflows. > > > > > > > > That said, the Linux NFSv4.2 client, which uses that > > > > change_attr_type > > > > attribute does deal with overflow by assuming standard uint64_t > > > > wrap > > > > around rules. i.e. it assumes bit values > 63 are truncated, > > > > meaning > > > > that the value obtained by incrementing (2^64-1) is 0. > > > > > > Yeah, it was the MONOTONIC_INCRE case I was thinking of. That's > > > interesting, I didn't know the client did that. > > > > > > > If you look at where we compare version numbers, it is always some > > variant of the following: > > > > static int nfs_inode_attrs_cmp_monotonic(const struct nfs_fattr > > *fattr, > > const struct inode *inode) > > { > > s64 diff = fattr->change_attr - > > inode_peek_iversion_raw(inode); > > if (diff > 0) > > return 1; > > return diff == 0 ? 0 : -1; > > } > > > > i.e. we do an unsigned 64-bit subtraction, and then cast it to the > > signed 64-bit equivalent in order to figure out which is the more > > recent value. > > Good! This seems like the reasonable thing to do, given that the spec doesn't really say that the change attribute has to start at low values. > > ...and by the way, yes this does mean that if you suddenly add a value > of 2^63 to the change attribute, then you are likely to cause the > client to think that you just handed it an old value. > > i.e. you're better off having the crash counter increment the change > attribute by a relatively small value. One that is guaranteed to be > larger than the values that may have been lost, but that is not > excessively large. > Yeah. Like with jiffies, you need to make sure the samples you're comparing aren't _too_ far off. That should be doable here -- 62 bits is plenty of room to store a lot of change values. My benchmark (maybe wrong, but maybe good enough) is to figure on an increment per nanosecond for a worst-case scenario. With that, 2^40 nanoseconds is >12 days. Maybe that's overkill. 2^32 ns is about an hour and 20 mins. That's probably a reasonable value to use. If we can't get a a new value onto disk in that time then something is probably very wrong. -- Jeff Layton <jlayton@xxxxxxxxxx>