Re: BTRFS partition corrupted after deleting files in /home


 



On Wed, Jan 13, 2021 at 2:36 AM Sreyan Chakravarty <sreyan32@xxxxxxxxx> wrote:
>
> 1) Is it possible there is nothing wrong with my drive, but there is
> something with my BIOS/HDD Firmware ? May be my firmware is not
> capable of BTRFS's stringent write requirements ?

The sample size is too small to know for sure what kind of HDD defect
it is. If it were an actuator or read/write head, it would happen more
often. If it were a localized media surface defect, it's unlikely two
copies of metadata would be affected. Btrfs dup profile metadata
chunks aren't colocated. They aren't very far apart but far enough I'd
expect a lot more metadata and/or data corruption than just one
commit. If it were defective memory (used as cache) in the drive, I'd
expect it'd happen more often. I have discussed this with folks who know
way more about myriad drive failures, and they've seen cases where a
write failure results in all queued (cached) writes being dropped. Is
that what happened? *shrug* Speculation. And is it a firmware bug, or
is it some other transient problem with the drive? *shrug*

I don't think it has anything to do with BIOS, or logic board related
including memory. And I don't think it has anything to do with Btrfs
write patterns. Btrfs write pattern isn't that variable, so whatever
pattern triggers a problem is going to happen more often than once
every month or two. Probably hundreds of times per day or more. And
yet, this happened once with Btrfs in a bit over a month. Pretty
weird.

Another possibility is power. Either brownouts or noisy incoming
power. Or even power made noisy by the power supply itself. I had a student
a while back with a high end imaging setup, all brand new equipment.
Constant crashes. Replaced memory first. Then other hardware. And got
to adding a UPS. Voila. All problems went away. Unfortunately if
you're in such a situation it is a process of elimination. And if it
only reproduces every couple of months, that could take a while.

> I say this because I have used Windows with NTFS on this machine, I
> have used Ubuntu with EXT4, and Fedora with thick-LVM with EXT4. None
> of these configurations gave me any such problems.

Yeah it's a fair point. But you did have a problem with LVM thin
provisioning which is not Btrfs. But does use checksums for at least
some (maybe all, not sure) of its metadata.

NTFS doesn't checksum anything. ext4 checksums its own metadata but
not data. A lost metadata write will be immediately detected on ext4,
before it causes too much confusion, whereas NTFS will need to get
confused before it realizes something is wrong.

A lost data write on NTFS or ext4 means the next time that data is
read, it's just not there. It's garbage. So the OS won't even care,
it'll just hand over what it finds to the application, and it'd be up
to the application to handle the fact it got back garbage. It could
manifest in all kinds of ways or not even at all.
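
To make the difference concrete, here is a toy sketch in Python. It is
not how NTFS, ext4 or Btrfs are implemented. ToyDisk and ChecksummingFS
are made-up names, and keeping the checksums in a dict that the lost
write can't touch is a deliberate simplification. It only shows why a
checksum kept apart from the data turns a silent lost write into a
visible error.

```python
import zlib


class ToyDisk:
    """Hypothetical block store that can silently drop one write."""

    def __init__(self, nblocks, block_size=16):
        self.blocks = [bytes(block_size) for _ in range(nblocks)]
        self.drop_next_write = False

    def write(self, index, data):
        if self.drop_next_write:
            # The "drive" reports success but never stores the data.
            self.drop_next_write = False
            return
        self.blocks[index] = data

    def read(self, index):
        return self.blocks[index]


class ChecksummingFS:
    """Keeps a CRC32 per block, stored apart from the block itself."""

    def __init__(self, disk):
        self.disk = disk
        self.checksums = {}  # assumption: unaffected by the lost write

    def write(self, index, data):
        self.checksums[index] = zlib.crc32(data)
        self.disk.write(index, data)

    def read(self, index):
        data = self.disk.read(index)
        if zlib.crc32(data) != self.checksums[index]:
            raise OSError(f"checksum mismatch on block {index}")
        return data


def demo():
    disk = ToyDisk(4)
    fs = ChecksummingFS(disk)
    old = b"old contents".ljust(16, b".")
    new = b"new contents".ljust(16, b".")

    fs.write(0, old)
    disk.drop_next_write = True
    fs.write(0, new)  # dropped by the toy drive

    # Without checksums, the reader just gets whatever is on disk.
    plain = disk.read(0)

    # With checksums, the same read fails loudly instead.
    try:
        fs.read(0)
        detected = False
    except OSError:
        detected = True
    return plain, detected
```

Calling demo() hands back the stale block from the plain read, while
the checksummed read raises an error instead of returning it.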

So they are sufficiently different in this area that they're not that
comparable. The most comparable would be OpenZFS. It also checksums
all metadata and data, but it's not a supported file system in Fedora.
So you're kind of on your own, but there are Fedora users using OpenZFS
for sysroot (maybe even /boot, GRUB supports it).

> 2) Since there is a high likelihood that my filesystem is not
> completely fixed, then when I take a backup using partclone, dd or
> clonezilla won't those errors be carried over ?

Yes. I recommend Pika Backup for a simple GUI solution to back
things up. It doesn't have any file system specific dependencies. I'm
sure if you look through the list archive for backups or start a new
thread with your requirements you'll get more suggestions.

>
> Even if I buy a new drive and restore the backup, I still might get crashes.

You definitely want a backup with its own independent file system. A
dd/ddrescue/clone is mainly for troubleshooting and disaster recovery.
It's not a great backup because you want a backup to be easy to keep
up to date. Daily or weekly, depending on your tolerance for loss.
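
For illustration only, a file-level copy onto an independent file
system could look roughly like the Python sketch below. It's not a
substitute for a real backup tool (no snapshots, no deduplication, no
history), and the function names sha256_of and backup_tree are made up
for this example. The point is that files are read through the source
file system and written through the destination's own file system,
rather than cloning blocks.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_tree(src, dst):
    """Copy regular files under src into dst.

    Returns the relative paths whose copy does not hash the same as
    the source. Symlinks and special files are skipped in this sketch.
    """
    src, dst = Path(src), Path(dst)
    mismatched = []
    for path in sorted(src.rglob("*")):
        if path.is_symlink() or not path.is_file():
            continue
        rel = path.relative_to(src)
        target = dst / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, target)  # copies contents plus timestamps
        if sha256_of(path) != sha256_of(target):
            mismatched.append(rel)
    return mismatched
```

Usage would be something like backup_tree("/home/user", "/mnt/backup/home"),
where both paths are placeholders and the destination sits on a
different drive with its own file system.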

>
> 3) This is a weird question but can you recommend me a HDD that I can
> buy which can handle BTRFS ? Or even which features I might look for
> while buying (not a SSD but a HDD)

All the drive manufacturers have played enough musical chairs, I can't
keep track of who makes or made what. Every drive fails eventually.
HDDs follow the bathtub curve, so they tend to either fail early or fail
late in their lifespan. You can't really game the system. The odds of
picking something that exhibits this same behavior are astronomically low.
Except for the NVMe drive, which came in the laptop I'm using, I
pretty much choose on a mix of warranty and price. It's cheaper to mitigate
risk with a backup, which you need anyway even if you get an expensive
drive. So I just ignore all claims of reliability and I don't even
care about 5+ year warranties. No 90 day warranties. 1 year if it's
dirt cheap. Otherwise 3 years. And never buy an extended warranty.

But I gotta say for sysroot, a small inexpensive SSD is pretty
awesome. It's a major upgrade. And yeah we probably see more firmware
bugs with SSD than HDD, but at least Facebook is using the cheapest
consumer drives possible, with Btrfs. And it's fine. Until it's not.
So again, all things come back to the backup. Don't worry. Just
back up. And if you back up, you won't worry. Or at least, you'll worry
less.

>
> 4) My manufacturer HP, does not make firmware updates for Linux, only
> for Windows. So is there a way to update the firmware(if available)
> without being on Windows ? Any ideas? Would a Windows PXE help ?

I don't think this is the problem. But also,
https://www.microsoft.com/en-us/software-download/windows10

It's a free download, in case the logic board firmware can only be
updated from Windows. It'll even work without a product key, just say you don't
have a product key at the part where it asks for one. It'll still
work, with some extra limitations that won't matter.

> 5) When you say "checksum errors in the month's old report" - which
> report are you referring to ? The thin-LVM crash or the smartctl crash
> ?

LVM thin.


-- 
Chris Murphy
_______________________________________________
users mailing list -- users@xxxxxxxxxxxxxxxxxxxxxxx
To unsubscribe send an email to users-leave@xxxxxxxxxxxxxxxxxxxxxxx
Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/users@xxxxxxxxxxxxxxxxxxxxxxx


