On Sun, Apr 15, 2018 at 07:10:52PM -0500, Vijay Chidambaram wrote:
> Thanks! As I mentioned before, this is useful. I have a follow-up
> question. Consider the following workload:
>
> creat foo
> link (foo, A/bar)
> fsync(foo)
> crash
>
> In this case, after the file system recovers, do we expect foo's link
> count to be 2 or 1?

So, strictly ordered behaviour:

create foo
  - creates dirent in inode B and new inode A in an atomic
    transaction, sequence #1.

link foo -> A/bar
  - creates dirent in inode C and bumps inode A link count in an
    atomic transaction, sequence #2.

fsync foo
  - looks at inode A, sees its "last modification" sequence
    counter as #2
  - flushes all transactions up to and including #2 to the journal.

See the dependency chain? Both the inodes and dirents in the create
operation and the link operation are chained to the inode foo via the
atomic transactions. Hence when we flush foo, we also flush the
dependent changes because of the change atomicity requirements....

> I would say 2,

Correct, for strict ordering. But....

> but POSIX is silent on this,

Well, it's not silent: POSIX explicitly allows fsync() to do nothing
and report success. Hence we can't really look to POSIX to define how
fsync() should behave.

> so
> thought I would confirm. The tricky part here is we are not calling
> fsync() on directory A.

Right. But directory A has a dependent change linked to foo. If we
fsync() foo, we are persisting the link count change in that file, and
hence all the other changes related to that link count change must
also be flushed. Similarly, all the changes related to the creation of
foo must be flushed, too.

> In this case, it's not a symlink; it's a hard link, so I would say the
> link count for foo should be 2.

Right - that's the "reference counted object dependency" I referred to.
i.e. it's a bi-directional atomic dependency - either we show both the
new dirent and the link count change, or we show neither of them.
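To make the dependency chain concrete, here is a toy model of the strictly ordered behaviour described above. It assumes every metadata change is committed as one atomic transaction with a monotonically increasing sequence number, and that each inode remembers the sequence number of the last transaction that modified it. The names `OrderedJournal`, `commit`, `fsync`, `crash_and_recover`, `links_consistent` and `run_workload` are hypothetical, and this is a sketch of the ordering argument, not of how XFS or any other filesystem is actually implemented. Inode letters follow the mail: foo is inode A, its parent directory is inode B, and directory A is inode C.

```python
from dataclasses import dataclass


@dataclass
class Txn:
    """One atomic metadata transaction (toy model)."""
    seq: int
    # Each change is (kind, key, value), e.g. ("dirent", ("B", "foo"), "A")
    # or ("nlink", "A", 2).
    changes: list


class OrderedJournal:
    """Hypothetical strictly ordered journal; not any real filesystem's code."""

    def __init__(self):
        self.next_seq = 1
        self.pending = []   # committed in memory, not yet on stable storage
        self.stable = []    # written to the on-disk journal
        self.last_mod = {}  # inode -> sequence number of its last modification

    def commit(self, inodes, changes):
        """Commit changes to several inodes as one atomic transaction."""
        txn = Txn(self.next_seq, changes)
        self.next_seq += 1
        self.pending.append(txn)
        for ino in inodes:
            self.last_mod[ino] = txn.seq
        return txn.seq

    def fsync(self, ino):
        """Flush every transaction up to and including ino's last modification."""
        target = self.last_mod.get(ino, 0)
        self.stable += [t for t in self.pending if t.seq <= target]
        self.pending = [t for t in self.pending if t.seq > target]

    def crash_and_recover(self):
        """Drop unflushed transactions and replay the journal in order."""
        self.pending = []
        dirents, nlink = {}, {}
        for txn in self.stable:
            for kind, key, value in txn.changes:
                if kind == "dirent":
                    dirents[key] = value
                elif kind == "nlink":
                    nlink[key] = value
        return dirents, nlink


def links_consistent(dirents, nlink):
    """True if every inode's link count equals the number of dirents naming it."""
    counts = {}
    for target in dirents.values():
        counts[target] = counts.get(target, 0) + 1
    return counts == nlink


def run_workload(do_fsync=True):
    j = OrderedJournal()
    # creat foo: dirent in B plus new inode A with nlink 1 (transaction #1)
    j.commit(["B", "A"], [("dirent", ("B", "foo"), "A"), ("nlink", "A", 1)])
    # link(foo, A/bar): dirent in C plus bump of A's nlink (transaction #2)
    j.commit(["C", "A"], [("dirent", ("C", "bar"), "A"), ("nlink", "A", 2)])
    if do_fsync:
        # fsync(foo): A was last modified by #2, so #1 and #2 are both flushed
        j.fsync("A")
    return j.crash_and_recover()


if __name__ == "__main__":
    dirents, nlink = run_workload()
    print(dirents, nlink, links_consistent(dirents, nlink))
```

In this model `run_workload()` recovers both the `B/foo` and `C/bar` dirents with inode A's link count at 2, so the link-count invariant holds. Without the fsync nothing survives, which is also consistent. A recovered state that has the `A/bar` dirent but a link count of 1 fails the check.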
Because of this bi-directional dependency, fsync on one object implies
that we are also persisting the related changes in the other object,
too.

> But btrfs and F2FS show link count of
> 1 after a crash.

That may be valid if the dirent A/bar does not exist after recovery,
but it also means fsync() hasn't actually guaranteed that inode changes
made prior to the fsync are persistent on disk. i.e. that's a violation
of ordered metadata semantics and probably a bug.

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx