On 7/9/20 4:27 PM, Eric Sandeen wrote:
> On 7/9/20 3:32 PM, Davide Cavalca via devel wrote:

...

>> As someone on one of the teams at FB that has to deal with that, I can
>> assure you all the scenarios you listed can and do happen, and they
>> happen a lot. While we don't have the "laptop's out of battery" issue
>> on the production side, we have plenty of power events and unplanned
>> maintenances that can and will hit live machines and cut power off.
>> Force reboots (triggered by either humans or automation) are also not
>> at all uncommon. Rebuilding machines from scratch isn't free, even with
>> all the automation and stuff we have, so if power loss or reboot events
>> on machines using btrfs caused widespread corruption or other issues
>> I'm confident we'd have found that out pretty early on.
>
> It is a bare-minimum expectation that filesystems like btrfs, ext4, and
> xfs do not suffer corruption or inconsistency due to reboots and power
> losses.
>
> So for the record, I am in no way insinuating that btrfs is less
> crash-safe than other filesystems (though I have not tested that, so if
> I have time I'll throw that into the mix as well).

So, we already have those tests in xfstests, and I put btrfs through a
few loops. This is generic/475:

  # Copyright (c) 2017 Oracle, Inc.  All Rights Reserved.
  #
  # FS QA Test No. 475
  #
  # Test log recovery with repeated (simulated) disk failures.  We kick
  # off fsstress on the scratch fs, then switch out the underlying device
  # with dm-error to see what happens when the disk goes down.  Having
  # taken down the fs in this manner, remount it and repeat.  This test
  # is a Good Enough (tm) simulation of our internal multipath failure
  # testing efforts.

It fails within 2 loops. Is it a critical failure? I don't know; the
test looks for unexpected things in dmesg, and perhaps the filter is
wrong. But I see stack traces during the run, and messages like:

  [689284.484258] BTRFS: error (device dm-3) in btrfs_sync_log:3084: errno=-117 Filesystem corrupted

so I can't say for sure.

Are btrfs devs using these tests to assess crash/power-loss resiliency
on a regular basis? TBH I did not expect to see any test failures here,
whether or not they are test artifacts; any filesystem using xfstests as
a benchmark needs to be keeping things up to date.

As a further test, I skipped the dmesg check, which may or may not be
finding false positives, and replaced it with a mount/umount/check
cycle. That seems to pass, so if fsck validation is complete and
correct, perhaps all is well in this regard.

-Eric
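
P.S. For anyone who wants to try to reproduce this: below is roughly the
setup involved, as a sketch rather than the exact harness I ran. The
device names and mount points are examples; point SCRATCH_DEV at a disk
you can afford to destroy.

  # xfstests local.config -- all devices/paths below are examples
  export FSTYP=btrfs
  export TEST_DEV=/dev/sdb1       # long-lived test fs, pre-formatted as btrfs
  export TEST_DIR=/mnt/test
  export SCRATCH_DEV=/dev/sdc1    # scratch fs, gets mkfs'd over and over
  export SCRATCH_MNT=/mnt/scratch

  # run generic/475 repeatedly; for me it failed within 2 iterations
  for i in $(seq 1 10); do
      ./check generic/475 || break
  done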
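
The failure injection in generic/475 is plain dm-error underneath the
xfstests helpers. Stripped down to the device-mapper calls, the idea is
something like this (again, the device name is an example):

  dev=/dev/sdc1                   # example scratch device
  size=$(blockdev --getsz $dev)   # device size in 512-byte sectors

  # start with a pass-through (linear) mapping and build the fs on top
  dmsetup create error-test --table "0 $size linear $dev 0"
  mkfs.btrfs -f /dev/mapper/error-test
  mount /dev/mapper/error-test /mnt/scratch

  # ... run fsstress (shipped with xfstests) against /mnt/scratch ...

  # now "yank the disk": every I/O to the mapping fails immediately
  dmsetup suspend error-test
  dmsetup load error-test --table "0 $size error"
  dmsetup resume error-test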
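
And the mount/umount/check cycle I substituted for the dmesg filter
amounts to roughly the following, once per loop:

  dev=/dev/sdc1                   # same example device as above
  size=$(blockdev --getsz $dev)

  # unmount the dead fs, then bring the "disk" back
  umount /mnt/scratch
  dmsetup suspend error-test
  dmsetup load error-test --table "0 $size linear $dev 0"
  dmsetup resume error-test

  mount /dev/mapper/error-test /mnt/scratch   # mount replays the log
  umount /mnt/scratch
  btrfs check /dev/mapper/error-test          # offline consistency check

If btrfs check comes back clean after every iteration, and the checker
really does catch everything, then recovery is doing its job.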