I'm building up a pretty bad history with BTRFS as the default filesystem
for Fedora Workstation. It's messing up repeatedly and leaving me
stuck. I should note that I have used ext2/3/4 for about 20 years, ZFS
on Solaris for even longer, and ZFS on Ubuntu for 2 major releases now.
I have 2 different machines that have had issues so far.
On my Gateway Fedora 33 daily-driver machine running the 5.11 kernels, I
had a single Patriot SSD using the default BTRFS partitioning scheme. I
kept seeing BTRFS scrub reporting uncorrectable issues and assumed that
it was a defective SSD. However, this SSD is now in a Mandriva machine
and is solid. It's not the SSD. I did about 10 reinstalls after having
the machine lock up at random times, and finally trashed the machine in
frustration. I later discovered that it had developed a bad memory
stick, which may have contributed to the initial problem.
However, the lack of BTRFS robustness, no obvious mechanism to keep
/home during a reinstall, and very poor BTRFS documentation have left me
wary.
On my current daily-driver machine, I have fully-updated Fedora 34
running the 5.12 kernels on 2 disks set up as a BTRFS RAID-1 pair. I
expected that would allow for much more robustness than the single disk
setup on my F33 machine, giving me error protection similar to what I
would have on ZFS. Unfortunately, that does not appear to be the case.
I have run low-level diagnostics on everything in this machine, and it
is working properly. Unusually, there aren't even any failed low-level
disk blocks on either drive. So the hardware on this older
enterprise-class Lenovo desktop is not faulty. I believe that faulty
BIOS and security chip handling in the 5.12 kernel is what occasionally
forces me to hard power-cycle the machine to get it to actually power
down.
One would expect that with BTRFS doing RAID-1, recovery from lockups
should never leave the filesystem damaged. That does not appear to be
the case. Currently the disks have no low-level errors, but BTRFS scrub
shows 10 unrecoverable errors. That's messed up. Both disks are
enterprise-class Seagate Constellation 500GB SATA drives with slightly
different model numbers and manufacturing dates, so I don't believe that
there is any firmware issue with them. No matter what, I expect that
the initial fsck or btrfs check should preserve data integrity, possibly
at the cost of backing out a few seconds of journal transactions.
I am aware of at least one kernel bug being highly relevant as the
initial trigger - bugzilla 195809. I believe that there are serious
bugs in the hardware optimization in Firefox (one bug filed) and in
GNOME, and more relevant bugs in the kernel, but whatever the triggering
issue, the filesystem should never fail.
How do I recover? The machine is currently bootable and seems to run
OK, but locks up once in a while on powerdown and on exiting Firefox. I
cannot describe it as stable with this BTRFS issue. A scrub currently
says that / (and therefore also /home) has 10 unrecoverable errors. I
can find no Fedora or SUSE documentation on how to recover from what
should be impossible situations like this. A reinstall will not
preserve /home, leading to unacceptable data loss. I did an offline
btrfs check on my F33 machine that left the machine unbootable, so it's
probably not an option either. I'm stuck at this point. Should I just
stop using the default BTRFS filesystem and go back to ext4?
Help appreciated!
--
John Mellor