On Fri, May 03, 2019 at 07:17:54PM -0500, Vijay Chidambaram wrote:
>
> I think there might be a mis-understanding about the example
> (reproduced below) and about SOMC. The relationship that matters is
> not whether X happens before Y. The relationship between X and Y is
> that they are in the same directory, so fsync(new file X) implies
> fsync(X's parent directory) which contains Y. In the example, X is
> A/foo and Y is A/bar. For truly un-related files such as A/foo and
> B/bar, SOMC does indeed allow fsync(A/foo) to not persist B/bar.

When you say "X and Y are in the same directory", how does this apply
in the face of hard links?  Remember, file X might be in 100 different
directories.  Does that mean if changes to file X are visible after a
crash, all files Y in any of X's 100 containing directories that were
modified before X must have their changes be visible after the crash?

I suspect that the SOMC as articulated by Dave does make such global
guarantees.  Certainly without Park and Shin's incremental fsync,
unrelated files will have the property that if A/foo is modified after
B/bar, and B/bar's metadata changes are visible after a crash, A/foo's
metadata will also be visible.  This is true for ext4, and xfs.

Even if we ignore the hard link problem, and assume that it only
applies for files foo and bar with st_nlink == 1, the crash
consistency guarantees you've described will *still* rule out Park and
Shin's incremental fsync.  So depending on whether ext4 has fast
fsyncs enabled, we might or might not have behavior consistent with
your proposed crash consistency rules.

But at this point, even if we promulgate these "guarantees" in a
kernel documentation file, applications won't be able to depend on
them.  If they do, they will be unreliable depending on which file
system they use; so they won't be particularly useful for application
authors who care about portability.
(Or worse, for users who are under the delusion that the application
authors care about portability, and who will be subject to data
corruption after a crash.)

Do we *really* want to be promulgating these semantics to application
authors?

Finally, I'll note that generic/342 is much more specific, and your
proposed crash consistency rule is more general.

# Test that if we rename a file, create a new file that has the old name of the
# other file and is a child of the same parent directory, fsync the new inode,
# power fail and mount the filesystem, we do not lose the first file and that
# file has the name it was renamed to.

> touch A/foo
> echo "hello" > A/foo
> sync
> mv A/foo A/bar
> echo "world" > A/foo
> fsync A/foo
> CRASH

Sure, that's one that fast commit will honor.  But what about:

    echo "world" > A/foo
    echo "hello" > A/bar
    chmod 755 A/bar
    sync
    chmod 750 A/bar
    echo "new world" >> A/foo
    fsync A/foo
    CRASH

.... will your crash consistency rules guarantee that the permissions
change for A/bar is visible after the fsync of A/foo?

Or if A/foo and A/bar exist, and we do:

    echo "world" > A/foo
    echo "hello" > A/bar
    sync
    mv A/bar A/quux
    echo "new world" >> A/foo
    fsync A/foo
    CRASH

... is the rename of A/bar to A/quux guaranteed to be visible after
the crash?

With Park and Shin's incremental fsync journal, the two cases I've
described above would *not* have such guarantees.  Standard ext4 today
would in fact have these guarantees.  But I would consider this an
accident of the implementation, and *not* a promise that I would want
to make for all time, precisely because it forbids us from making
innovations that might improve performance.

Even if I didn't have an engineer working on implementing Park and
Shin's proposal, what worries me is if I did make this guarantee, it
would tie my hands from making this optimization in the future --- and
I can't necessarily foresee all possible optimizations we might want
to make in the future.
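
To make the rename sequence above concrete, here is a minimal Python
sketch of how an application could ask for the rename to be persisted
explicitly instead of relying on fsync(A/foo) to carry it along.  The
helper names fsync_path() and run_rename_scenario() are made up for
illustration, the files live in a scratch directory, and the sketch
assumes a Linux-like system where fsync() on a directory file
descriptor is supported.  It cannot simulate the CRASH step; it only
shows which calls are issued.

```python
import os
import tempfile


def fsync_path(path, flags=os.O_RDONLY):
    """Open path, fsync it, and close it again (hypothetical helper)."""
    fd = os.open(path, flags)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)


def run_rename_scenario(root):
    """Replay the A/bar -> A/quux sequence with explicit fsync calls."""
    a = os.path.join(root, "A")
    os.mkdir(a)
    foo, bar, quux = (os.path.join(a, n) for n in ("foo", "bar", "quux"))

    with open(foo, "w") as f:       # echo "world" > A/foo
        f.write("world\n")
    with open(bar, "w") as f:       # echo "hello" > A/bar
        f.write("hello\n")
    os.sync()                       # sync

    os.rename(bar, quux)            # mv A/bar A/quux
    with open(foo, "a") as f:       # echo "new world" >> A/foo
        f.write("new world\n")
        f.flush()
        os.fsync(f.fileno())        # fsync A/foo

    # Explicit request: fsync the directory that holds the renamed entry,
    # rather than assuming the fsync of A/foo persisted it as a side effect.
    fsync_path(a)
    return sorted(os.listdir(a))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        print(run_rename_scenario(root))
```

The extra fsync of the directory costs an additional call, but it
states what the application wants without depending on how a given
file system happens to order its metadata updates.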

So the question I'm trying to ask is how many applications will
actually benefit from "documenting current behavior" and effectively
turning this into a promise for all time?

Ultimately this is a tradeoff.  Sure, this might enable applications
to do things that are more aggressive than what POSIX guarantees; but
it also ties the hands of file system engineers.

This is why I'd much rather do this via new system calls; say, maybe
something like fsync_with_barrier(fd).  This can degrade to fsync(fd)
if necessary, but it allows the application to explicitly request
certain semantics, as opposed to encouraging applications to *assume*
that certain magic side effects will be there --- and which might not
be true for all file systems, or for all time.

We still need to very carefully define what the semantics of
fsync_with_barrier(fd) would be --- especially whether
fsync_with_barrier(fd) provides local (within the same directory) or
global barrier guarantees, and if it's local, how files with multiple
"parent directories" interact with the guarantees.  But at least this
way it's an explicit declaration of what the application wants, and
not an implicit one.

Cheers,

                                        - Ted