Re: Inconsistent behavior of fsync in btrfs




On Tue, Apr 24, 2018 at 8:35 PM, Jayashree Mohan
<jayashree2912@xxxxxxxxx> wrote:
> Hi,
>
> While investigating crash consistency bugs on btrfs, we came across
> workloads that demonstrate inconsistent behavior of fsync.
>
> Consider the following workload where fsync on the directory did not persist it.
>
> Workload 1:
>
> mkdir A
> sync
> rename (A, B)
> creat B/foo
> fsync B/foo
> fsync B
> ---crash---
>
> In this case, the directory B as well as file B/foo are missing.
> What's more worrying is the state we find after recovery from the
> crash. We expect the contents of the directories to be
>
> Dir A : should not exist
> Dir B :
>     foo
>
> But instead, what we see is that:
> Dir A :
>     foo
> Dir B : doesn't exist
>
>
> This state is acceptable if we had created the file foo in dir A and
> then renamed the directory - in that case it would mean the rename did
> not persist. However, what we see here is that a file created in
> directory B falsely appears in A, which is incorrect.
>
> However, if we did not persist the initial create of directory A, i.e.,
>
> Workload 2:
>
> mkdir A
> rename (A, B)
> creat B/foo
> fsync B/foo
> fsync B
> ---crash---
>
> the directory B and its entry both get persisted in this case.
>
> Is this something to do with the directory entry A being already
> present in the FS/subvolume tree and then the changes to the directory
> inode going into the fsync log?
>
> We do not clearly understand the reason for such inconsistent
> behavior, but it does seem incorrect.
>
> Consider another case where we found inconsistent behavior in the way
> fsync is handled.
>
> Workload 3:
>
> mkdir A
> mkdir B
> creat A/foo
> link (A/foo, B/foo)
> fsync A/foo
> fsync B/foo
> ---crash---
>
> In this case, file A/foo is persisted, but in spite of an explicit
> fsync on B/foo, that file goes missing.
>
> Workload 4:
>
> mkdir A
> mkdir B
> creat A/foo
> link (A/foo, B/foo)
> fsync B/foo
> fsync A/foo
> ---crash---
>
> Note that the only difference between workloads 3 and 4 is the order
> of fsync on files A/foo and B/foo. In this case, the file B/foo is
> persisted, but A/foo is missing.
>
> What we interpret from the above workloads is that the second fsync
> is behaving like a no-op, and in either case, only the file that is
> fsynced first gets persisted. If we insert a sleep(45) between the two
> fsyncs in the workloads above, we see both the files A/foo and B/foo
> being persisted.
>
> No matter how many more links we create and fsync, only the first
> fsync persists the file. For example,
>
> Workload 5:
>
> mkdir A
> mkdir B
> mkdir C
> creat A/foo
> link (A/foo, B/foo)
> link (A/foo, C/foo)
> fsync B/foo
> fsync A/foo
> fsync C/foo
> ---crash---
>
> Only file B/foo gets persisted, and both A/foo and C/foo are missing.
>
> This seems like inconsistent behavior, as only the first fsync persists
> the file, while none of the others seem to. Do you agree that this is
> indeed incorrect and needs fixing?
>
> All the above tests pass on ext4 and xfs.
>
> Please let us know what you feel about such inconsistency.


I don't have an answer to your question, but I'm curious how exactly you
simulate a crash. For my own really rudimentary testing I've been doing
crazy things like:

# grub-mkconfig -o /boot/efi && echo b > /proc/sysrq-trigger

And seeing what makes it to disk - or not. And I'm finding that some
non-deterministic results are possible even in a VM, which is a bit
confusing. I'm sure with real hardware I'd find even more inconsistency.
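
For reference, the quoted workloads map onto system calls roughly as
follows. This is only a minimal sketch in Python, not the original
poster's test harness: the function names, the helper fsync_path(), and
the use of a scratch directory are all assumptions made for
illustration, and the crash itself is not simulated at all. It just
issues the same operations in the same order, so that whatever crash
injection method is used can be applied at the marked points.

```python
import os
import tempfile


def fsync_path(path):
    """Open a file or directory read-only, fsync it, and close it."""
    fd = os.open(path, os.O_RDONLY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)


def create_empty(path):
    """Equivalent of creat(): make an empty file and close it."""
    os.close(os.open(path, os.O_CREAT | os.O_WRONLY, 0o644))


def workload_1(root):
    """mkdir A; sync; rename(A, B); creat B/foo; fsync B/foo; fsync B."""
    a = os.path.join(root, "A")
    b = os.path.join(root, "B")
    os.mkdir(a)
    os.sync()  # drop this call to get workload 2
    os.rename(a, b)
    foo = os.path.join(b, "foo")
    create_empty(foo)
    fsync_path(foo)
    fsync_path(b)
    # --- a crash would be injected here (not simulated) ---


def workload_links(root, dirs=("A", "B"), fsync_order=("A", "B")):
    """Create <first dir>/foo, hard-link it into the other dirs, then
    fsync the links in the given order (workloads 3, 4 and 5)."""
    for d in dirs:
        os.mkdir(os.path.join(root, d))
    first = os.path.join(root, dirs[0], "foo")
    create_empty(first)
    for d in dirs[1:]:
        os.link(first, os.path.join(root, d, "foo"))
    for d in fsync_order:
        fsync_path(os.path.join(root, d, "foo"))
    # --- a crash would be injected here (not simulated) ---


if __name__ == "__main__":
    # Hypothetical scratch location; point this at a btrfs mount to
    # exercise the filesystem under discussion.
    with tempfile.TemporaryDirectory() as root:
        workload_1(root)
    with tempfile.TemporaryDirectory() as root:
        workload_links(root, fsync_order=("A", "B"))  # workload 3
    with tempfile.TemporaryDirectory() as root:
        workload_links(root, fsync_order=("B", "A"))  # workload 4
    with tempfile.TemporaryDirectory() as root:
        workload_links(root, dirs=("A", "B", "C"),
                       fsync_order=("B", "A", "C"))   # workload 5
```

Without a crash, every name should of course be present afterwards;
the interesting part is what survives when power is cut at the marked
points.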



-- 
Chris Murphy
--
To unsubscribe from this list: send the line "unsubscribe fstests" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


