On Thu, 2022-09-15 at 10:06 -0400, J. Bruce Fields wrote:
> On Tue, Sep 13, 2022 at 09:14:32AM +1000, NeilBrown wrote:
> > On Mon, 12 Sep 2022, J. Bruce Fields wrote:
> > > On Sun, Sep 11, 2022 at 08:13:11AM +1000, NeilBrown wrote:
> > > > On Fri, 09 Sep 2022, Jeff Layton wrote:
> > > > >
> > > > > The machine crashes and comes back up, and we get a query
> > > > > for i_version and it comes back as X. Fine, it's an old
> > > > > version. Now there is a write. What do we do to ensure that
> > > > > the new value doesn't collide with X+1?
> > > >
> > > > (I missed this bit in my earlier reply..)
> > > >
> > > > How is it "Fine" to see an old version?
> > > > The file could have changed without the version changing.
> > > > And I thought one of the goals of the crash-count was to be
> > > > able to provide a monotonic change id.
> > >
> > > I was still mainly thinking about how to provide reliable
> > > close-to-open semantics between NFS clients. In the case the
> > > writer was an NFS client, it wasn't done writing (or it would
> > > have COMMITted), so those writes will come in and bump the
> > > change attribute soon, and as long as we avoid the small chance
> > > of reusing an old change attribute, we're OK, and I think it'd
> > > even still be OK to advertise CHANGE_TYPE_IS_MONOTONIC_INCR.
> >
> > You seem to be assuming that the client doesn't crash at the same
> > time as the server (maybe they are both VMs on a host that lost
> > power...)
> >
> > If client A reads and caches, client B writes, the server crashes
> > after writing some data (to already allocated space so no inode
> > update needed) but before writing the new i_version, then client B
> > crashes. When server comes back the i_version will be unchanged
> > but the data has changed. Client A will cache old data
> > indefinitely...
>
> I guess I assume that if all we're promising is close-to-open, then
> a client isn't allowed to trust its cache in that situation. Maybe
> that's an overly draconian interpretation of close-to-open.
>
> Also, I'm trying to think about how to improve things incrementally.
> Incorporating something like a crash count into the on-disk
> i_version fixes some cases without introducing any new ones or
> regressing performance after a crash.
>
> If we subsequently wanted to close those remaining holes, I think
> we'd need the change attribute increment to be seen as atomic with
> respect to its associated change, both to clients and (separately)
> on disk. (That would still allow the change attribute to go
> backwards after a crash, to the value it held as of the on-disk
> state of the file. I think clients should be able to deal with that
> case.)
>
> But, I don't know, maybe a bigger hammer would be OK:
>

If you're not going to meet the minimum bar of data integrity, then
this whole exercise is just a massive waste of everyone's time. The
answer then going forward is just to recommend never using Linux as an
NFS server. Makes my life much easier, because I no longer have to
debug any of the issues.

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@xxxxxxxxxxxxxxx