On Mon, Apr 16, 2018 at 7:07 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> On Sun, Apr 15, 2018 at 07:10:52PM -0500, Vijay Chidambaram wrote:
>> Thanks! As I mentioned before, this is useful. I have a follow-up
>> question. Consider the following workload:
>>
>> creat foo
>> link (foo, A/bar)
>> fsync(foo)
>> crash
>>
>> In this case, after the file system recovers, do we expect foo's link
>> count to be 2 or 1?
>
> So, strictly ordered behaviour:
>
> create foo:
>   - creates dirent in inode B and new inode A in an atomic
>     transaction sequence #1
>
> link foo -> A/bar
>   - creates dirent in inode C and bumps inode A link count in
>     an atomic transaction sequence #2.
>
> fsync foo
>   - looks at inode A, sees its "last modification" sequence
>     counter as #2
>   - flushes all transactions up to and including #2 to the
>     journal.
>
> See the dependency chain? Both the inodes and dirents in the create
> operation and the link operation are chained to the inode foo via
> the atomic transactions. Hence when we flush foo, we also flush the
> dependent changes because of the change atomicity requirements....
>
>> I would say 2,
>
> Correct, for strict ordering. But....
>
>> but POSIX is silent on this,
>
> Well, it's not silent, POSIX explicitly allows for fsync() to do
> nothing and report success. Hence we can't really look to POSIX to
> define how fsync() should behave.
>
>> so
>> thought I would confirm. The tricky part here is we are not calling
>> fsync() on directory A.
>
> Right. But directory A has a dependent change linked to foo. If we
> fsync() foo, we are persisting the link count change in that file,
> and hence all the other changes related to that link count change
> must also be flushed. Similarly, all the changes related to the
> creation of foo must be flushed, too.
>
>> In this case, it's not a symlink; it's a hard link, so I would say the
>> link count for foo should be 2.
>
> Right - that's the "reference counted object dependency" I referred
> to. i.e.
> it's a bidirectional atomic dependency - either we show both
> the new dirent and the link count change, or we show neither of
> them. Hence fsync on one object implies that we are also persisting
> the related changes in the other object, too.
>
>> But btrfs and F2FS show link count of
>> 1 after a crash.
>
> That may be valid if the dirent A/bar does not exist after recovery,
> but it also means fsync() hasn't actually guaranteed inode changes
> made prior to the fsync to be persistent on disk. i.e. that's a
> violation of ordered metadata semantics and probably a bug.

Great, this matches our understanding perfectly. We have separately
posted to the btrfs mailing list to confirm it is a bug.

Thanks!
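
For reference, the workload discussed above (creat foo, link(foo, A/bar),
fsync(foo)) could be sketched roughly as below. This is only an
illustration, not an actual test case: the function name run_workload and
the scratch-directory layout are assumptions made here, and the script
does not simulate the crash. It issues the same system calls and returns
the link count seen before any crash, which a crash-consistency harness
would then compare with the value observed after recovery.

```python
import os
import tempfile


def run_workload(root):
    """Issue creat foo; link(foo, A/bar); fsync(foo) under `root`.

    `run_workload` and the directory layout are hypothetical names used
    for illustration. Returns foo's link count as seen before any crash.
    """
    foo = os.path.join(root, "foo")
    dir_a = os.path.join(root, "A")
    bar = os.path.join(dir_a, "bar")

    # Assumption: directory A already exists when the workload starts,
    # so it is created here as setup rather than as part of the workload.
    os.mkdir(dir_a)

    # creat foo
    fd = os.open(foo, os.O_CREAT | os.O_WRONLY, 0o644)
    try:
        # link(foo, A/bar): adds a dirent in A and bumps foo's link count
        os.link(foo, bar)
        # fsync(foo): only foo is synced, directory A never is
        os.fsync(fd)
    finally:
        os.close(fd)

    return os.stat(foo).st_nlink


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as scratch:
        print("link count before crash:", run_workload(scratch))
```

Under the strictly ordered behaviour described in the quoted reply, the
value after recovery should match the value returned here, and A/bar
should exist. A link count of 1 would only be consistent if A/bar is
absent after recovery.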