Re: Fedora 33 System-Wide Change proposal: Make btrfs the default file system for desktop variants

On 7/2/20 4:38 PM, Eric Sandeen wrote:
> On 7/1/20 12:50 PM, Chris Murphy wrote:
>
> ...
>
>> Integrity checking is highly valued by some and less by others.
>> Considering that we know hardware isn't 100% reliable, and doesn't
>> always report its own failures as expected, and hence why most file
>> systems now at least checksum metadata, it's not persuasive to me that
>> the data should be left unchecked, and corruption ought to be handled
>> by user space somehow.
>
> There's a flip side to this coin - in my experience, if the right btrfs
> metadata blocks experience this disk corruption, there can be
> a complete inability to recover the btrfs filesystem from that error -
> i.e. it won't mount, and btrfsck --repair won't get it to a mountable
> state.
>
> So if we're saying disk corruption happens often enough that data
> checksumming is critical, then it happens often enough that metadata
> recovery is at least as critical.
>
> I've been trying to quantify this and have not come up with a particularly
> compelling test scenario, because it involves purposefully (though at random)
> corrupting enough blocks on a filesystem image that a critical block gets
> hit, so it looks synthetic.  But the net result is frequently a filesystem
> where btrfsck and/or mount fails, and at first blush this type of failure
> happens much more often than on other filesystems.[1]
>
> I think Josef has alluded to this situation as well.  To me, that's a big
> concern.  Not trying to be a wet blanket here but I think this needs to be
> carefully investigated and evaluated to understand what impact it may have
> on Fedora btrfs users and their ability to recover their data in the face
> of metadata corruption, because it looks to me like a definite btrfs weak
> spot.
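
For illustration only, the kind of synthetic test described above (overwriting randomly chosen blocks of a filesystem image) could be sketched roughly as follows. This is not the script anyone in this thread actually used; the function name `corrupt_random_blocks`, the 4096-byte block size, and the seeding are all assumptions made for the example.

```python
import os
import random

BLOCK_SIZE = 4096  # assumed block size; a real image may use a different one


def corrupt_random_blocks(image_path, count, seed=None, block_size=BLOCK_SIZE):
    """Overwrite `count` randomly chosen blocks of a file with random bytes.

    Hypothetical helper for illustration. Returns the sorted list of block
    indices that were overwritten, so a run can be reproduced or reported.
    """
    rng = random.Random(seed)
    total_blocks = os.path.getsize(image_path) // block_size
    if total_blocks == 0:
        raise ValueError("image is smaller than one block")

    # Pick distinct blocks; never ask for more blocks than the image has.
    victims = sorted(rng.sample(range(total_blocks), min(count, total_blocks)))

    with open(image_path, "r+b") as img:
        for index in victims:
            img.seek(index * block_size)
            img.write(bytes(rng.getrandbits(8) for _ in range(block_size)))
    return victims
```

Run against a copy of an image (never a device you care about), then try to mount or check the result. Whether a given run hits a critical metadata block is a matter of chance, which is why this kind of test looks synthetic.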

Yeah, this is what I've said many times over the last 3 weeks: btrfs is more vulnerable to metadata corruption.

Now there are things that we can do to mitigate this. I have one patch up to handle one of the main cases (a corrupt global tree). The next patch set will keep entire metadata trees around for longer, as long as we have the space for it. These two things will drastically improve the situation, but of course if I'm being evil we can still end up in a bad spot. These patches are not hard or controversial; they'll likely land in 5.9, which will be what F33 ships with (if I'm doing my math right).

And this sort of ignores the other side of the coin. fsfuzzer isn't just corrupting metadata, it's corrupting data. Of the file systems being compared here, btrfs is the only one that's going to notice that and let the user know.

Checksumming is great because it lets the user know things are going wrong before they go catastrophically wrong. However, just because we know something went wrong doesn't mean we can do anything about it; it just means the user now knows they need to restore from backups and find a new drive. These features do not mean you are absolved of good practices. If you care about data, you need to have it in multiple places. End of story. Btrfs is just going to let you know in advance that things are going wrong.
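
To make the "detect, not repair" point concrete, here is a toy sketch of per-block data checksumming. It is not how btrfs is implemented: the names `write_blocks` and `verify_blocks` are made up for the example, and `zlib.crc32` is only a stand-in for the crc32c that btrfs uses by default.

```python
import zlib

BLOCK_SIZE = 4096  # assumed block size for the example


def write_blocks(data, block_size=BLOCK_SIZE):
    """Split data into blocks and record one checksum per block."""
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    checksums = [zlib.crc32(block) for block in blocks]
    return blocks, checksums


def verify_blocks(blocks, checksums):
    """Return the indices of blocks that no longer match their checksum."""
    return [
        index
        for index, (block, expected) in enumerate(zip(blocks, checksums))
        if zlib.crc32(block) != expected
    ]


if __name__ == "__main__":
    blocks, checksums = write_blocks(bytes(4 * BLOCK_SIZE))

    # Simulate silent corruption: flip a single bit in block 2.
    damaged = bytearray(blocks[2])
    damaged[100] ^= 0x01
    blocks[2] = bytes(damaged)

    print(verify_blocks(blocks, checksums))  # prints [2]
```

The check says which block is bad but has nothing to rebuild it from. With a single copy of the data, the only fix is the backup.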

We're talking about this issue like it's reasonable that xfs and ext4 are going to allow the user to get back a bunch of data they don't know is ok or not. We're also talking about it like the user should be able to carry on their merry way. In these cases the drive is dying and needs to be shredded, a new install needs to happen, and a restore from backups needs to happen. Is the btrfs failure much less user friendly? No doubt about it. Is it any comfort at all when a user shows up and we say "where are your backups?" and they say "what backups?" No. But if we're going to talk about this like ext4 and xfs are much better because they give you the _appearance_ that your data is fine, that's a bit disingenuous.

"Well what if it was just /usr." Sure, then you got lucky and you could copy things off. But what if it wasn't? That's the measure that's being applied to btrfs here. Is it likely that random corruption is going to be so bad that you end up with an unmountable file system? It's about as likely that the random corruption is on your dissertation or your family photographs. The difference is that btrfs will tell you that your dissertation or your family photographs are now bad, whereas ext4 and xfs will not.

These are tradeoffs, no doubt. Every file system choice is a series of tradeoffs. We're arguing/optimizing for the narrowest use case. Arguments can be made either way, but in the end, is it important enough to not move ahead with btrfs? Thanks,

Josef