Hi,

While investigating crash consistency bugs on btrfs, we came across
workloads that demonstrate inconsistent behavior of fsync.

Consider the following workload, where fsync on the directory did not
persist it.

Workload 1:

mkdir A
sync
rename (A, B)
creat B/foo
fsync B/foo
fsync B
---crash---

In this case, the directory B as well as the file B/foo are missing.
What's more worrying is that, on recovery from the crash, we expect the
contents of the directories to be:

Dir A : should not exist
Dir B : foo

But instead, what we see is:

Dir A : foo
Dir B : doesn't exist

This state would be acceptable if we had created the file foo in dir A
and then renamed the directory; in that case it would mean the rename
did not persist. However, what we see here is that a file created in
directory B falsely appears in A, which is incorrect.

However, if we do not persist the initial creation of directory A,
i.e.,

Workload 2:

mkdir A
rename (A, B)
creat B/foo
fsync B/foo
fsync B
---crash---

the directory B and its entry both get persisted.

Is this something to do with the directory entry A already being
present in the FS/subvolume tree, and the changes to the directory
inode then going into the fsync log? We do not clearly understand the
reason for such inconsistent behavior, but it does seem incorrect.

Consider another case where we found inconsistent behavior in the way
fsync is handled.

Workload 3:

mkdir A
mkdir B
creat A/foo
link (A/foo, B/foo)
fsync A/foo
fsync B/foo
---crash---

In this case, file A/foo is persisted, but in spite of an explicit
fsync on B/foo, that file goes missing.

Workload 4:

mkdir A
mkdir B
creat A/foo
link (A/foo, B/foo)
fsync B/foo
fsync A/foo
---crash---

Note that the only difference between workloads 3 and 4 is the order
of the fsync calls on A/foo and B/foo. In this case, the file B/foo is
persisted, but A/foo is missing.
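
For reference, the operation sequence of workloads 3 and 4 can be
sketched in Python as below. This is only a minimal sketch, assuming
Linux and a scratch directory on the filesystem under test; the helper
names (fsync_path, run_link_workload) are hypothetical and not part of
any test suite. The crash itself is not simulated here and would have
to be injected separately after the last fsync.

```python
import os
import tempfile


def fsync_path(path):
    """Open an existing file or directory read-only and fsync it."""
    fd = os.open(path, os.O_RDONLY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)


def run_link_workload(root, fsync_order):
    """Run the mkdir/creat/link steps of workloads 3 and 4 under root,
    then fsync the given relative paths in the given order.

    fsync_order is e.g. ["A/foo", "B/foo"] (workload 3)
    or ["B/foo", "A/foo"] (workload 4).
    """
    dir_a = os.path.join(root, "A")
    dir_b = os.path.join(root, "B")
    os.mkdir(dir_a)                      # mkdir A
    os.mkdir(dir_b)                      # mkdir B

    foo_a = os.path.join(dir_a, "foo")
    foo_b = os.path.join(dir_b, "foo")
    fd = os.open(foo_a, os.O_CREAT | os.O_WRONLY, 0o644)  # creat A/foo
    os.close(fd)
    os.link(foo_a, foo_b)                # link (A/foo, B/foo)

    for rel in fsync_order:              # fsync in the requested order
        fsync_path(os.path.join(root, rel))
    # ---crash--- would be injected here (not simulated in this sketch)


if __name__ == "__main__":
    # Assumption: in a real test, the directory would live on the
    # btrfs mount being examined rather than the default temp location.
    with tempfile.TemporaryDirectory() as scratch:
        run_link_workload(scratch, ["A/foo", "B/foo"])   # workload 3
    with tempfile.TemporaryDirectory() as scratch:
        run_link_workload(scratch, ["B/foo", "A/foo"])   # workload 4
```

Passing the fsync order as a parameter keeps the two workloads
identical except for the one thing that differs between them.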

What we interpret from the above workloads is that the second fsync is
behaving like a no-op, and in either case only the file that is
fsynced first gets persisted. If we insert a sleep(45) between the two
fsyncs in the workloads above, we see both A/foo and B/foo being
persisted. No matter how many more links we create and fsync, only the
first fsync persists the file. For example:

Workload 5:

mkdir A
mkdir B
mkdir C
creat A/foo
link (A/foo, B/foo)
link (A/foo, C/foo)
fsync B/foo
fsync A/foo
fsync C/foo
---crash---

Only file B/foo gets persisted, and both A/foo and C/foo are missing.

This seems like inconsistent behavior, as only the first fsync
persists the file while the others do not seem to. Do you agree that
this is indeed incorrect and needs fixing?

All the above tests pass on ext4 and xfs.

Please let us know what you think about this inconsistency.

Thanks,
Jayashree Mohan