On Thu, Sep 08, 2022 at 03:07:58PM -0400, Jeff Layton wrote:
> On Thu, 2022-09-08 at 14:22 -0400, J. Bruce Fields wrote:
> > On Thu, Sep 08, 2022 at 01:40:11PM -0400, Jeff Layton wrote:
> > > Yeah, ok. That does make some sense. So we would mix this into the
> > > i_version instead of the ctime when it was available. Preferably, we'd
> > > mix that in when we store the i_version rather than adding it afterward.
> > >
> > > Ted, how would we access this? Maybe we could just add a new (generic)
> > > super_block field for this that ext4 (and other filesystems) could
> > > populate at mount time?
> >
> > Couldn't the filesystem just return an ino_version that already includes
> > it?
> >
>
> Yes. That's simple if we want to just fold it in during getattr. If we
> want to fold that into the values stored on disk, then I'm a little less
> clear on how that will work.
>
> Maybe I need a concrete example of how that will work:
>
> Suppose we have an i_version value X with the previous crash counter
> already factored in that makes it to disk. We hand out a newer version
> X+1 to a client, but that value never makes it to disk.
>
> The machine crashes and comes back up, and we get a query for i_version
> and it comes back as X. Fine, it's an old version. Now there is a write.
> What do we do to ensure that the new value doesn't collide with X+1?

I was assuming we could partition i_version's 64 bits somehow: e.g., top
16 bits store the crash counter. You increment the i_version by: 1)
replacing the top bits by the new crash counter, if it has changed, and
2) incrementing.

Do the numbers work out? 2^16 mounts after unclean shutdowns sounds like
a lot for one filesystem, as does 2^48 changes to a single file, but
people do weird things. Maybe there's a better partitioning, or some
more flexible way of maintaining an i_version that still allows you to
identify whether a given i_version preceded a crash.

--b.
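A rough C sketch of the partitioning described above, assuming a 16/48
split and a per-filesystem crash counter that gets bumped on each mount
after an unclean shutdown. The helper names (iver_next, iver_crash,
iver_count, iver_preceded_crash) are made up for illustration and are not
an existing kernel API.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed split: top 16 bits hold the crash counter, low 48 bits count changes. */
#define IVER_CRASH_BITS 16
#define IVER_COUNT_BITS (64 - IVER_CRASH_BITS)
#define IVER_COUNT_MASK ((UINT64_C(1) << IVER_COUNT_BITS) - 1)

/* Hypothetical helper: crash counter stored in the top bits. */
static inline uint64_t iver_crash(uint64_t v)
{
	return v >> IVER_COUNT_BITS;
}

/* Hypothetical helper: change count stored in the low bits. */
static inline uint64_t iver_count(uint64_t v)
{
	return v & IVER_COUNT_MASK;
}

/*
 * Next i_version: (1) replace the top bits with the current crash
 * counter (a no-op if it hasn't changed since v was stored), then
 * (2) increment the low bits.
 */
static inline uint64_t iver_next(uint64_t v, uint16_t crash_counter)
{
	uint64_t count = iver_count(v);

	/* Assumes a single file never sees 2^48 changes. */
	assert(count < IVER_COUNT_MASK);
	return ((uint64_t)crash_counter << IVER_COUNT_BITS) | (count + 1);
}

/*
 * Was v handed out before the most recent unclean shutdown?
 * Assumes the 16-bit crash counter never wraps.
 */
static inline int iver_preceded_crash(uint64_t v, uint16_t crash_counter)
{
	return iver_crash(v) < crash_counter;
}
```

Running Jeff's example through it: say X = 5 is on disk under crash
counter 0, and X+1 = 6 was handed out but lost. After the crash the
counter is 1, so the first write starting from X yields (1 << 48) | 6,
which differs from the lost 6 and sorts above it. Masking the low bits
keeps a long-lived file's change count from ever carrying into the
crash-counter bits.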