On 7/9/20 4:27 PM, Eric Sandeen wrote:
> On 7/9/20 3:32 PM, Davide Cavalca via devel wrote:

...

>> As someone on one of the teams at FB that has to deal with that, I can
>> assure you all the scenarios you listed can and do happen, and they
>> happen a lot. While we don't have the "laptop's out of battery" issue
>> on the production side, we have plenty of power events and unplanned
>> maintenances that can and will hit live machines and cut power off.
>> Force reboots (triggered by either humans or automation) are also not
>> at all uncommon. Rebuilding machines from scratch isn't free, even with
>> all the automation and stuff we have, so if power loss or reboot events
>> on machines using btrfs caused widespread corruption or other issues
>> I'm confident we'd have found that out pretty early on.
>
> It is a bare-minimum expectation that filesystems like btrfs, ext4, and
> xfs do not suffer corruption or inconsistency due to reboots and power
> losses.
>
> So for the record, I am in no way insinuating that btrfs is less
> crash-safe than other filesystems (though I have not tested that, so if
> I have time I'll throw that into the mix as well).

So, we already have those tests in xfstests, and I put btrfs through a
few loops. This is generic/475:

  # Copyright (c) 2017 Oracle, Inc.  All Rights Reserved.
  #
  # FS QA Test No. 475
  #
  # Test log recovery with repeated (simulated) disk failures.  We kick
  # off fsstress on the scratch fs, then switch out the underlying device
  # with dm-error to see what happens when the disk goes down.  Having
  # taken down the fs in this manner, remount it and repeat.  This test
  # is a Good Enough (tm) simulation of our internal multipath failure
  # testing efforts.

It fails within 2 loops. Is it a critical failure? I don't know; the
test looks for unexpected things in dmesg, and perhaps the filter is
wrong. But I see stack traces during the run, and messages like:

  [689284.484258] BTRFS: error (device dm-3) in btrfs_sync_log:3084: errno=-117 Filesystem corrupted

so I can't say for sure.

Are btrfs devs using these tests to assess crash/power-loss resiliency
on a regular basis? TBH I did not expect to see any test failures here,
whether or not they are test artifacts; any filesystem using xfstests as
a benchmark needs to be keeping things up to date.

As a further test, I skipped the dmesg check, which may or may not be
finding false positives, and replaced it with a mount/umount/check
cycle. That seems to pass, so if fsck validation is complete and
correct, perhaps all is well in this regard.

-Eric
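
P.S. For anyone who wants to try to reproduce this: below is roughly the
setup involved, as a sketch rather than the exact harness I ran. The
device names and mount points are examples; point SCRATCH_DEV at a disk
you can afford to destroy.

  # xfstests local.config -- all devices/paths below are examples
  export FSTYP=btrfs
  export TEST_DEV=/dev/sdb1       # long-lived test fs, pre-formatted as btrfs
  export TEST_DIR=/mnt/test
  export SCRATCH_DEV=/dev/sdc1    # scratch fs, gets mkfs'd over and over
  export SCRATCH_MNT=/mnt/scratch

  # run generic/475 repeatedly; for me it failed within 2 iterations
  for i in $(seq 1 10); do
      ./check generic/475 || break
  done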
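
The failure injection in generic/475 is plain dm-error underneath the
xfstests helpers. Stripped down to the device-mapper calls, the idea is
something like this (again, the device name is an example):

  dev=/dev/sdc1                   # example scratch device
  size=$(blockdev --getsz $dev)   # device size in 512-byte sectors

  # start with a pass-through (linear) mapping and build the fs on top
  dmsetup create error-test --table "0 $size linear $dev 0"
  mkfs.btrfs -f /dev/mapper/error-test
  mount /dev/mapper/error-test /mnt/scratch

  # ... run fsstress (shipped with xfstests) against /mnt/scratch ...

  # now "yank the disk": every I/O to the mapping fails immediately
  dmsetup suspend error-test
  dmsetup load error-test --table "0 $size error"
  dmsetup resume error-test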
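
And the mount/umount/check cycle I substituted for the dmesg filter
amounts to roughly the following, once per loop:

  dev=/dev/sdc1                   # same example device as above
  size=$(blockdev --getsz $dev)

  # unmount the dead fs, then bring the "disk" back
  umount /mnt/scratch
  dmsetup suspend error-test
  dmsetup load error-test --table "0 $size linear $dev 0"
  dmsetup resume error-test

  mount /dev/mapper/error-test /mnt/scratch   # mount replays the log
  umount /mnt/scratch
  btrfs check /dev/mapper/error-test          # offline consistency check

If btrfs check comes back clean after every iteration, and the checker
really does catch everything, then recovery is doing its job.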