On 7/6/20 12:07 AM, Chris Murphy wrote:
> On Fri, Jul 3, 2020 at 8:40 PM Eric Sandeen <sandeen@xxxxxxxxxx> wrote:
>>
>> On 7/3/20 1:41 PM, Chris Murphy wrote:
>>> SSDs can fail in weird ways. Some spew garbage as they're failing,
>>> some go read-only. I've seen both. I don't have stats on how common
>>> it is for an SSD to go read-only as it fails, but once it happens
>>> you cannot fsck it. It won't accept writes. If it won't mount, your
>>> only chance to recover data is some kind of offline scrape tool.
>>> And Btrfs does have a very, very good scrape tool in terms of its
>>> success rate - the UX is scary, but that can and will improve.
>>
>> Ok, you and Josef have both recommended the btrfs restore ("scrape")
>> tool as the next recovery step after fsck fails, so I figured we
>> should check that out to see whether it alleviates the concerns
>> about recoverability of user data in the face of corruption.
>>
>> I also realized that mkfs of an image isn't representative of an SSD
>> system typical of Fedora laptops, so I added "-m single" to mkfs,
>> because this will be the mkfs.btrfs default on SSDs (right?). Based
>> on Josef's description of fsck's algorithm of throwing away any
>> block with a bad CRC, this seemed worth testing.
>>
>> I also turned the fuzzing /down/ to hitting 2048 bytes out of the 1G
>> image, or a bit less than 1% of the filesystem blocks, at random.
>> This is 1/4 the fuzzing rate of the original test.
>>
>> So: -m single, fuzz 2048 bytes of the 1G image, run btrfsck
>> --repair, mount, mount w/ recovery, and then restore ("scrape") if
>> all that fails, and see what we get.
>
> What's the probability of this kind of corruption occurring in the
> real world? If the probability is so low it can't practically be
> computed, how do we assess the risk? And if we can't assess risk,
> what's the basis of concern?

From 20 years of filesystem development experience, I know that people
run filesystem repair tools. It's just a fact.
For a wide variety of reasons - bugs, hardware errors, admin errors,
you name it - filesystems experience corruption and inconsistencies.
At that point the administrator needs a path forward. "People won't
need to repair btrfs" is, IMHO, the position that needs to be
supported, not "filesystem repair tools should be robust."

>> I ran 50 loops, and got:
>>
>> 46 btrfsck failures
>> 20 mount failures
>>
>> So it ran btrfs restore 20 times; of those, 11 runs lost all or
>> substantially all of the files, and 17 runs lost at least 1/3 of
>> the files.
>
> Josef states the reliability of ext4, xfs, and Btrfs is in the same
> ballpark. He also reports one case in 10 years in which he failed to
> recover anything. How do you square that with 11 complete failures,
> trivially produced? Is there even a reason to suspect there's
> residual risk?

Extrapolating from Facebook's use cases to the Fedora desktop should
be approached with caution, IMHO. I've provided evidence that if/when
damage happens, for whatever reason, btrfs is unable to recover in
place far more often than other filesystems.

> When metadata is single profile, Btrfs is basically an early warning
> system.
>
> The available research on uncorrectable errors - errors that drive
> ECC does not catch - suggests that users are decently likely to
> experience at least one block of corruption in the life of the
> drive, and that it tends to get worse up until drive failure. But
> there is much less chance to detect this if the filesystem isn't
> also checksumming the vastly larger payload on a drive: the data.

One of the problems in this whole discussion is the assumption that
filesystem inconsistencies only arise from disk bitflips etc.; that's
just not the case.

Look, I'm just providing evidence of what I found when re-evaluating
the btrfs administration/repair tools. I've found them to be quite
weak.
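For concreteness, the byte-fuzzing step from my test loop can be
sketched roughly as follows. This is a minimal illustrative
reconstruction, not the exact script I ran; the `fuzz_image` name and
its parameters are made up for the sketch:

```python
import os
import random

def fuzz_image(path, nbytes=2048, seed=None):
    """Corrupt `nbytes` randomly chosen bytes of the file at `path`,
    in place, leaving the file size unchanged."""
    rng = random.Random(seed)
    size = os.path.getsize(path)
    with open(path, "r+b") as f:
        for _ in range(nbytes):
            off = rng.randrange(size)
            f.seek(off)
            old = f.read(1)[0]
            f.seek(off)
            # XOR with a random nonzero mask so the byte always changes
            f.write(bytes([old ^ rng.randrange(1, 256)]))
```

After each fuzz pass, the loop then ran btrfsck --repair on the image,
attempted a mount (and a mount w/ recovery), and fell back to btrfs
restore if all of that failed.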
From what I've gathered from these responses, btrfs is unique in that
it is /expected/ that if anything goes wrong, the administrator should
be prepared to scrape out the remaining data, re-mkfs, and start over.
If that's acceptable for the Fedora desktop, that's fine, but I
consider it a risk that should not be ignored when evaluating this
proposal.

-Eric
_______________________________________________
devel mailing list -- devel@xxxxxxxxxxxxxxxxxxxxxxx
To unsubscribe send an email to devel-leave@xxxxxxxxxxxxxxxxxxxxxxx
Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/devel@xxxxxxxxxxxxxxxxxxxxxxx