On Tue, Mar 21, 2017 at 01:23:24PM -0400, Jeff Layton wrote:
> On Tue, 2017-03-21 at 12:30 -0400, J. Bruce Fields wrote:
> > - It's durable; the above comparison still works if there were reboots
> >   between the two i_version checks.
> >     - I don't know how realistic this is--we may need to figure out
> >       if there's a weaker guarantee that's still useful.  Do
> >       filesystems actually make ctime/mtime/i_version changes
> >       atomically with the changes that caused them?  What if a
> >       change attribute is exposed to an NFS client but doesn't make
> >       it to disk, and then that value is reused after reboot?
> >
>
> Yeah, there could be atomicity there. If we bump i_version, we'll mark
> the inode dirty and I think that will end up with the new i_version at
> least being journalled before __mark_inode_dirty returns.

So you think the filesystem can provide the atomicity?  In more detail:

    - if I write to a file, a simultaneous reader should see either
      (old data, old i_version) or (new data, new i_version), not a
      combination of the two.
    - ditto for metadata modifications.
    - the same should be true if there's a crash.

(If that's not possible, then I think we could live with a brief window
of (new data, old i_version) as long as it doesn't persist beyond sync?)

> That said, I suppose it is possible for us to bump the counter, hand
> that new counter value out to a NFS client and then the box crashes
> before it makes it to the journal.
>
> Not sure how big a problem that really is.

The other case I was wondering about may have been unclear.  Represent
the state of a file by a (data, i_version) pair.  Say:

    - file is modified from (F, V) to (F', V+1).
    - client reads and caches (F', V+1).
    - server crashes before writeback, so disk still has (F, V).
    - after restart, someone else modifies file to (F'', V+1).
    - original client revalidates its cache, sees V+1, concludes
      file data is still F'.

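To make the sequence above concrete, here is a minimal toy model in
Python.  It is only a sketch: the Server and Client classes and their
methods are hypothetical and do not correspond to any kernel or NFS
interface, and it assumes the client revalidates by comparing the
change attribute alone.

```python
# Toy model of the sequence above; not kernel or NFS code.
# Server, Client and all method names are hypothetical.

class Server:
    def __init__(self, data, version):
        # State as it exists on stable storage.
        self.disk = (data, version)
        # State in memory; may be ahead of disk until writeback.
        self.mem = (data, version)

    def modify(self, new_data):
        # Bump the in-memory change attribute along with the data.
        _, version = self.mem
        self.mem = (new_data, version + 1)

    def read(self):
        return self.mem

    def crash_and_restart(self):
        # Anything not written back is lost.
        self.mem = self.disk


class Client:
    def __init__(self):
        self.cached = None

    def fetch(self, server):
        self.cached = server.read()

    def cache_looks_valid(self, server):
        # Revalidation compares only the change attribute.
        _, server_version = server.read()
        return self.cached[1] == server_version


server = Server("F", 1)      # V is 1 in this run
client = Client()

server.modify("F'")          # (F, V) -> (F', V+1)
client.fetch(server)         # client caches (F', V+1)
server.crash_and_restart()   # disk still has (F, V)
server.modify("F''")         # someone else: (F, V) -> (F'', V+1)

stale = (client.cache_looks_valid(server)
         and client.cached[0] != server.read()[0])
print("client keeps stale data:", stale)
```

Running this prints "client keeps stale data: True": the change
attribute matches, so the client keeps F' even though the server now
holds F''.
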
This may not cause a real problem for clients depending only on
traditional NFS close-to-open semantics.

--b.