Re: How do I recover from BTRFS issues?

On Mon, May 31, 2021 at 8:28 PM John Mellor <john.mellor@xxxxxxxxx> wrote:
>
> I'm getting a pretty bad history with BTRFS as the default filesystem
> for Fedora Workstation.  It's messing up repeatedly and leaving me
> stuck.  I should note that I have used ext2/3/4 for about 20 years, ZFS
> on Solaris for even longer, and ZFS on Ubuntu for 2 major releases now.
> I have 2 different machines that have had issues so far.
>
> On my Gateway Fedora 33 daily-driver machine running the 5.11 kernels, I
> had a single Patriot SSD using the default BTRFS partitioning scheme. I
> kept seeing BTRFS scrub reporting uncorrectable issues and assumed that
> it was a defective SSD. However, this SSD is now in a Mandriva machine
> and is solid.  It's not the SSD.  I did about 10 reinstalls after having
> the machine lock up at random times, and finally trashed the machine in
> frustration.  I later discovered that it had developed a bad memory
> stick, which may have contributed to the initial problem.
> However, the lack of BTRFS robustness, no obvious mechanism to keep
> /home during a reinstall, and very poor BTRFS documentation have left me
> wary.
>
> On my current daily-driver machine, I have fully-updated Fedora 34
> running the 5.12 kernels on 2 disks set up as a BTRFS RAID-1 pair.  I
> expected that would allow for much more robustness than the single disk
> setup on my F33 machine, giving me error protection similar to what I
> would have on ZFS.  Unfortunately, that does not appear to be the case.
> I have run low-level diagnostics on everything in this machine, and it
> is working properly.  Unusually, there aren't even any failed low-level
> disk blocks on either drive.  So the hardware on this older
> enterprise-class Lenovo desktop is not faulty.  I believe that due to
> faulty BIOS and security chip handling in the 5.12 kernel, I have had
> issues requiring me to occasionally hard power-cycle the machine to get
> it to actually power down.
>
> One would expect that with BTRFS doing RAID-1, recovery from lockups
> should never leave the filesystem damaged.  That does not appear to be
> the case.  Currently the disks have no low-level errors, but BTRFS scrub
> shows 10 unrecoverable errors.  That's messed up.  Both disks are
> enterprise-class Seagate Constellation 500GB SATA drives with slightly
> different model numbers and manufacturing dates, so I don't believe that
> there is any firmware issue with them.

Any problem really needs kernel messages before anyone can have an idea
what's going on; and often it requires the entire dmesg, because the
visible error usually has an earlier underlying cause.

> No matter what, I expect that
> the initial fsck or btrfs check should keep data integrity, but possibly
> backing out a few seconds in journal transactions.

btrfs check by default is --readonly and makes no changes at all to
the file system. --repair is expected to fix inconsistency or fail and
do nothing. It won't drop transactions. The normal write order for
btrfs is:

data->metadata->superblock

Between the metadata and superblock writes, and again after the
superblock write, there's a FLUSH/FUA that tells the drive to enforce
exactly that write ordering. That means it doesn't really matter if
there's reordering of data and metadata writes, as long as all data and
metadata are flushed to stable media before the superblock is written.
Because of COW, it means the on-disk superblock is only ever pointing
to valid trees. And in case of crash, you might see some writes go
missing. But if write ordering is not honored by the drive, it's a big
problem for any file system. Btrfs is definitely more difficult to
repair because the file system metadata isn't in any specific
location, so no assumptions can be made about what "should" be in a
particular location.
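
For example, to inspect a file system without modifying anything,
unmount it and run the default read-only check (/dev/sdXY here is just
a placeholder for your actual Btrfs device):

    sudo btrfs check --readonly /dev/sdXY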

>
> I am aware of at least one kernel bug being highly relevant as the
> initial trigger - bugzilla 195809.

Maybe this one?
https://bugzilla.redhat.com/show_bug.cgi?id=1965809

The file system should be OK if using writethrough mode. If using
writeback mode, all bets are off. Whether it's a crash, a power
failure, or a failure of the flash (cache) device, severe data loss is
a strong possibility, which is why safeguards have to be taken when
using writeback mode.
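
As a side check at the drive level (a different layer than any cache
device, but quick to rule out), hdparm can report whether a SATA
drive's own volatile write cache is enabled. /dev/sda is a placeholder
and the output will vary:

    $ sudo hdparm -W /dev/sda
    /dev/sda:
     write-caching =  1 (on)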

>  I believe that there are serious
> bugs in the hardware optimization in Firefox (one bug filed) and in
> Gnome and more relevant bugs in the kernel, but whatever the triggering
> issue, the filesystem should never fail.

Well, the file system sits on a storage stack of a lot of other
software and hardware. It's kinda hard to know what's going on without
details of that storage stack, plus dmesg, plus the output from 'btrfs
check --readonly'.
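
For the storage stack layout, lsblk output is a good start (the column
list here is just a suggestion):

    lsblk -o NAME,TYPE,FSTYPE,SIZE,MOUNTPOINT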


>
> How do I recover?  The machine is currently bootable and seems to run
> ok, but locks up once in a while on powerdown and on exiting firefox.  I
> cannot describe it as stable with this BTRFS issue.  A scrub currently
> says that / (and therefore also /home) has 10 unrecoverable errors.  I
> can find no Fedora or SUSE documentation on how to recover from what
> should be impossible situations like this.

It's not supposed to happen. But once it happens, it's very case
specific and a bit complicated to figure out what probably happened
and what the next steps are. Btrfs is good at avoiding trouble in the
first place due to COW, i.e. nothing is overwritten in place, so
interruptions during writes, whether crash or power failure, aren't a
problem. But write ordering violations can result in more problems
with Btrfs. There are some safeguards built in to work around that,
but they are limited.

fpaste --btrfsinfo

Post the resulting URL; it'll expire in 24 hours. If the problem file
system is sysroot, that output will help us better understand the
storage stack, mount options, and recent Btrfs messages. If the
problem file system is not sysroot, you'll want to add --printonly and
run the commands shown for each section against the proper mount point
or device.
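
For the non-sysroot case that would look something like the following,
after which you run each printed command by hand against the relevant
mount point or device:

    fpaste --btrfsinfo --printonly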

> A reinstall will not
> preserve /home, leading to unacceptable data loss.

Hopefully there is a backup no matter what the file system is; and if
not, creating a backup is the top priority in any disaster situation.

There is a way to reinstall and preserve /home in Anaconda, but before
doing that we really need to understand what's broken. Because if the
file system is broken and can't be fixed, then it's mkfs time. And for
that you need backups of at least the important user data.
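
As a minimal sketch of one way to get /home onto another drive first
(the snapshot name and backup mount point are placeholders, and this
assumes the backup drive is also Btrfs; plain rsync to any file system
works just as well):

    # read-only snapshot of /home, then send it to the backup drive
    sudo btrfs subvolume snapshot -r /home /home-backup-snap
    sudo btrfs send /home-backup-snap | sudo btrfs receive /run/media/backup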

> I did an offline
> btrfs check on my F33 machine that left the machine unbootable, so it's
> probably not an option either.  I'm stuck at this point.

btrfs check --readonly is safe; it doesn't touch anything on the drive at all.

--repair should at worst fail safely, but it still has rather scary
warnings in the man page; it's best to consider --repair a last
resort. You should try other options before --repair, but we need to
see the errors to know what to recommend.
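
For example, two of the gentler things to try first (not specific to
this thread, just typical early steps; the device and mount point are
placeholders):

    # read-only mount attempt that falls back to backup tree roots
    # (the rescue= syntax is for kernel 5.11 and newer)
    sudo mount -o ro,rescue=usebackuproot /dev/sdXY /mnt

    # foreground scrub with per-device statistics on a mounted fs
    sudo btrfs scrub start -Bd /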


> Should I just
> stop using the default BTRFS filesystem and go back to ext4?

On the one hand, e2fsck has a pretty good chance of fixing damaged
file system metadata resulting from storage stack problems, including
hardware issues. But it doesn't check data integrity at all, and data
is a much larger portion of what's written to a drive, so it's a much
larger target for hardware problems resulting in corruption,
dropped/torn/misdirected writes, or even bit flips. Btrfs is
intentionally fussier about these kinds of problems. And yeah, it'll
often just stop and wait for a human to decide what to do about it.
That's pretty onerous, but it's also what protects your data from
being damaged even worse.
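
A quick, safe way to see where Btrfs thinks errors are accumulating,
per device, is the device stats counters (read-only; the mount point
is a placeholder):

    sudo btrfs device stats /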

But anyway, there's not much to go on here yet. We need to see dmesg
for these problems. I personally prefer to see the entire dmesg,
because isolated errors don't tell me what was going on immediately
prior to the Btrfs error, which is almost always a related factor.
Mount options can matter too.

In the raid1 case, same thing: we need to see dmesg, because that's
where btrfs spits out all of its complaints. And it is quite verbose.
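
A simple way to capture and share the whole log (fpaste accepts a file
argument):

    sudo dmesg > dmesg.txt
    fpaste dmesg.txt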

-- 
Chris Murphy
_______________________________________________
users mailing list -- users@xxxxxxxxxxxxxxxxxxxxxxx
To unsubscribe send an email to users-leave@xxxxxxxxxxxxxxxxxxxxxxx
Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/users@xxxxxxxxxxxxxxxxxxxxxxx
Do not reply to spam on the list, report it: https://pagure.io/fedora-infrastructure


