On 7/1/20 12:50 PM, Chris Murphy wrote:
...
> Integrity checking is highly valued by some and less by others.
> Considering that we know hardware isn't 100% reliable, and doesn't
> always report its own failures as expected, and hence why most file
> systems now at least checksum metadata, it's not persuasive to me that
> the data should be left unchecked, and corruption ought to be handled
> by user space somehow.

There's a flip side to this coin - in my experience, if the right btrfs
metadata blocks experience this disk corruption, there can be a complete
inability to recover the btrfs filesystem from that error - i.e. it won't
mount, and btrfsck --repair won't get it to a mountable state.

So if we're saying disk corruption happens often enough that data
checksumming is critical, then it happens often enough that metadata
recovery is at least as critical.

I've been trying to quantify this and have not come up with a particularly
compelling test scenario, because it involves purposefully (though at
random) corrupting enough blocks on a filesystem image that a critical
block gets hit, so it looks synthetic. But the net result is frequently a
filesystem where btrfsck and/or mount fails, and at first blush this type
of failure happens much more often than on other filesystems.[1]

I think Josef has alluded to this situation as well.

To me, that's a big concern. Not trying to be a wet blanket here but I
think this needs to be carefully investigated and evaluated to understand
what impact it may have on Fedora btrfs users and their ability to
recover their data in the face of metadata corruption, because it looks
to me like a definite btrfs weak spot.

-Eric

[1] some details - I used the mangle.c fuzzer from fsfuzzer, and modified
it so that it corrupts 8192 bytes of an image, which in fs terms can be up
to 8192 filesystem blocks. I also avoided the first 4k so that any
filesystem signature was not damaged.
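For anyone who hasn't looked at mangle.c, the corruption step can be sketched roughly as below. This is NOT the modified mangle.c from the fsckfuzzer directory; the function name corrupt_image(), its parameters, and the xorshift PRNG are hypothetical choices made for illustration. It overwrites a fixed number of randomly chosen bytes while leaving a protected prefix alone. Since each corrupted byte lands in exactly one block (and offsets may repeat), 8192 bytes can damage at most 8192 blocks; assuming 4k blocks, a 1G image has 262144 blocks, so that is at most about 3.1% of them.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: xorshift64 PRNG (shift triple 13/7/17), so a run
 * can be reproduced from its seed. State must never be zero. */
static uint64_t xorshift64(uint64_t *state)
{
    uint64_t x = *state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    *state = x;
    return x;
}

/*
 * Hypothetical sketch: overwrite `nbytes` randomly chosen bytes of the image
 * at `path` with random values, never touching the first `skip` bytes
 * (e.g. 4096, to keep the filesystem signature intact).
 * Offsets may repeat and a random value may equal the original byte, so
 * `nbytes` is an upper bound on the bytes (and blocks) actually changed.
 * The file size is never changed. Returns 0 on success, -1 on error.
 */
int corrupt_image(const char *path, long nbytes, long skip, uint64_t seed)
{
    FILE *f = fopen(path, "r+b");
    if (!f)
        return -1;

    if (fseek(f, 0, SEEK_END) != 0) {
        fclose(f);
        return -1;
    }
    long size = ftell(f);
    if (skip < 0 || size <= skip) {   /* also catches ftell() failure (-1) */
        fclose(f);
        return -1;
    }

    uint64_t state = seed ? seed : 0x9E3779B97F4A7C15ULL;
    uint64_t span = (uint64_t)(size - skip);

    for (long i = 0; i < nbytes; i++) {
        /* Pick an offset in [skip, size); modulo bias is negligible here. */
        long off = skip + (long)(xorshift64(&state) % span);
        unsigned char val = (unsigned char)(xorshift64(&state) & 0xff);

        if (fseek(f, off, SEEK_SET) != 0 || fwrite(&val, 1, 1, f) != 1) {
            fclose(f);
            return -1;
        }
    }
    return fclose(f) == 0 ? 0 : -1;
}
```

Something like corrupt_image("base.img", 8192, 4096, seed) would match the parameters described above (8192 bytes, first 4k untouched); using a different seed on each loop iteration gives a different corruption pattern per run.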
I then ran a loop where I created a 1G base image, populated it, fuzzed it
in this way (so up to 3% of blocks were damaged), and ran the filesystem's
fsck utility (in btrfs' case, btrfsck --repair) and then tried to mount
(in btrfs' case, with bare mount, then -o usebackuproot if mount failed).
If it mounted, I used "find | wc" to see how many files were reachable vs
the original image. If either fsck or mount reported an exit code
indicating that it failed to complete properly, I recorded that.

It was a quick hack, and it's not beautiful, so there are probably holes
to be poked in it; if you want to look, I threw the bash script and the C
source up at https://people.redhat.com/esandeen/fsckfuzzer/

Running 10 loops on each of btrfs, ext4, and xfs, I got results that look
like this (ext4 always creates an empty lost+found, so it will always find
at least 1 file there):

btrfs
fsck failed
0 files in lost+found, 628 files gone/unreachable
0 files in lost+found, 0 files gone/unreachable
526 files in lost+found, 9 files gone/unreachable
595 files in lost+found, 55 files gone/unreachable
53 files in lost+found, 8 files gone/unreachable
57 files in lost+found, 44 files gone/unreachable
fsck failed
7 files in lost+found, 1491 files gone/unreachable
fsck failed, mount failed
fsck failed, mount failed
88 files in lost+found, 40 files gone/unreachable
== 4 fsck failures, 2 mount failures

ext4
1 files in lost+found, 0 files gone/unreachable
1 files in lost+found, 0 files gone/unreachable
164 files in lost+found, 2 files gone/unreachable
1 files in lost+found, 0 files gone/unreachable
1 files in lost+found, 0 files gone/unreachable
1 files in lost+found, 1 files gone/unreachable
1 files in lost+found, 0 files gone/unreachable
9 files in lost+found, 1 files gone/unreachable
1 files in lost+found, 0 files gone/unreachable
1 files in lost+found, 0 files gone/unreachable
== 0 fsck failures, 0 mount failures

xfs
0 files in lost+found, 1 files gone/unreachable
0 files in lost+found, 0 files gone/unreachable
958 files in lost+found, 629 files gone/unreachable
0 files in lost+found, 0 files gone/unreachable
2 files in lost+found, 0 files gone/unreachable
0 files in lost+found, 1 files gone/unreachable
0 files in lost+found, 0 files gone/unreachable
0 files in lost+found, 0 files gone/unreachable
8 files in lost+found, 1 files gone/unreachable
3 files in lost+found, -1 files gone/unreachable
== 0 fsck failures, 0 mount failures

_______________________________________________
devel mailing list -- devel@xxxxxxxxxxxxxxxxxxxxxxx
To unsubscribe send an email to devel-leave@xxxxxxxxxxxxxxxxxxxxxxx
Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/devel@xxxxxxxxxxxxxxxxxxxxxxx