Re: Fedora 33 System-Wide Change proposal: Make btrfs the default file system for desktop variants

Josef Bacik <josef@xxxxxxxxxxxxxx> · Wed, 1 Jul 2020 15:50:37 -0400

On 7/1/20 2:24 PM, Matthew Miller wrote:
> On Wed, Jul 01, 2020 at 06:54:02AM +0000, Zbigniew Jędrzejewski-Szmek wrote:
>> Making btrfs opt-in for F33 and (assuming the results go well) opt-out for F34
>> could be a good option. I know technically it is already opt-in, but it's not
>> very visible or popular. We could make the btrfs option more prominent and
>> ask people to pick it if they are ready to handle potential fallout.
>
> I'm leaning towards recommending this as well. I feel like we don't have
> good data to make a decision on -- the work that Red Hat did previously when
> making a decision was 1) years ago and 2) server-focused, and the Facebook
> production usage is encouraging but also not the same use case. I'm
> particularly concerned about metadata corruption fragility as noted in the
> Usenix paper. (It'd be nice if we could do something about that!)

There's only so much we can do about this.  I've sent up patches to ignore
failed global trees so that users can more easily recover data when one of
those trees is corrupted, but as they say, if only 1 bit is off in a node, we
throw the whole node away.  And throwing a node away means you lose access to
any of its children, which could be a large chunk of the file system.
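
To make the "lose the node, lose its children" point concrete, here is a
minimal Python sketch.  It is not btrfs code: the Node class, its corrupt flag
(standing in for a checksum mismatch) and reachable() are all hypothetical
names made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    corrupt: bool = False                 # stands in for "checksum mismatch"
    children: list = field(default_factory=list)


def reachable(node):
    """Return the names of every node we can still get to from `node`."""
    if node.corrupt:
        # The node is thrown away, so its child pointers are gone with it.
        return []
    names = [node.name]
    for child in node.children:
        names.extend(reachable(child))
    return names


# A small tree: root -> a -> (a1, a2), root -> b -> b1
a = Node("a", children=[Node("a1"), Node("a2")])
b = Node("b", children=[Node("b1")])
root = Node("root", children=[a, b])

before = reachable(root)
a.corrupt = True                          # one bad interior node
after = reachable(root)
print(before)
print(after)
```

Only "a" is damaged here, but "a1" and "a2" become unreachable too, even
though nothing is wrong with them.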

This sounds like a "wtf, why are you doing this btrfs?" sort of thing, but this 
is just the reality of using checksums.  It's a checksum, not ECC.  We don't 
know _which_ bits are fucked, we just know something's fucked, so we throw it all
away.  If you have RAID or DUP then we go read the other copy, and fix the 
broken copy if we find a good copy.  If we don't, well then there's nothing 
really we can do.
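
As a rough illustration of "checksum, not ECC", here is a minimal Python
sketch.  It uses zlib.crc32 as a stand-in (btrfs defaults to crc32c), and
checksum(), flip_bit() and read_with_repair() are hypothetical helpers, not
anything from the kernel.  The list of copies plays the role of DUP or RAID1
mirrors.

```python
import zlib


def checksum(block: bytes) -> int:
    # Stand-in for the per-block checksum; btrfs itself defaults to crc32c.
    return zlib.crc32(block)


def flip_bit(block: bytes, bit: int) -> bytes:
    """Return a copy of `block` with a single bit inverted."""
    out = bytearray(block)
    out[bit // 8] ^= 1 << (bit % 8)
    return bytes(out)


def read_with_repair(copies, expected):
    """Return a copy matching `expected` and overwrite bad copies with it.

    Returns None when no copy matches: the checksum says "something is
    wrong" but carries no information about which bits to fix.
    """
    good = next((c for c in copies if checksum(c) == expected), None)
    if good is None:
        return None
    for i in range(len(copies)):
        if checksum(copies[i]) != expected:
            copies[i] = good              # repair the broken mirror
    return good


block = bytes(range(256)) * 16            # one 4 KiB "node"
expected = checksum(block)
bad = flip_bit(block, 1234)               # a single flipped bit

mirrored = [bad, block]                   # DUP: second copy is intact
print(read_with_repair(mirrored, expected) == block, mirrored[0] == block)

single = [bad]                            # no second copy to fall back on
print(read_with_repair(single, expected))
```

With a second copy the read succeeds and the bad copy gets rewritten; with a
single copy the only thing the checksum can tell us is that the block is bad.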

As for their complaint about DIR_INDEX vs DIR_ITEM recovery, that's been around 
for a while now.  A lot of these things have been added over the last year.

Another thing to keep in mind is that fsck is _very_ conservative for a reason.
Its only job is to get the fs back to the point that it can be mounted; it has
no knowledge of which data is important and which is not.  So by default it
doesn't do much, because we want the user to be able to use the rescue tools to
pull off any data they can before they run repair.  It's possible that fsck
decides to delete problematic entries, and maybe those entries point to data
you cared about.

I've stated this many times before: btrfs is more vulnerable to things going
wrong.  It's also more likely to notice things going wrong.  There are things we
can do to make recovery easier in the face of these issues; those are the
patches I've written and submitted in the last few days.  There are bigger, more
complex things that I can do to make us more resilient in the face of these
corruptions.  But
even with all of the things I have in my head, I could still go do one or two 
things and render the file system unusable.  Would these things happen in 
practice?  Unlikely.  Is it impossible?  Unfortunately no.  Thanks,

Josef