On Wed, Jan 13, 2021 at 2:36 AM Sreyan Chakravarty <sreyan32@xxxxxxxxx> wrote:
>
> 1) Is it possible there is nothing wrong with my drive, but there is
> something with my BIOS/HDD Firmware ? May be my firmware is not
> capable of BTRFS's stringent write requirements ?

The sample size is too small to know for sure what kind of HDD defect
it is. If it were an actuator or read/write head, it would happen more
often. If it were a localized media surface defect, it's unlikely two
copies of metadata would be affected. Btrfs dup profile metadata chunks
aren't colocated. They aren't very far apart, but far enough that, for
a surface defect to hit both copies, I'd expect a lot more metadata
and/or data corruption than just one commit. If it were defective
memory (used as cache) in the drive, I'd expect it to happen more
often.

I've talked with folks who know way more about the myriad ways drives
fail, and they've seen cases where a write failure results in all
queued (cached) writes being dropped. Is that what happened? *shrug*
Speculation. And is it a firmware bug, or is it some other transient
problem with the drive? *shrug*

I don't think it has anything to do with the BIOS, or anything logic
board related, including memory. And I don't think it has anything to
do with Btrfs write patterns. The Btrfs write pattern isn't that
variable, so whatever pattern triggers a problem is going to happen
more often than once every month or two. Probably hundreds of times per
day or more. And yet, this happened once with Btrfs in a bit over a
month. Pretty weird.

Another possibility is the power supply. Either brownouts or noisy
incoming power. Or even power made noisy by the power supply itself. I
had a student a while back with a high end imaging setup, all brand new
equipment. Constant crashes. Replaced memory first. Then other
hardware. And got to adding a UPS. Voila. All problems went away.
Unfortunately, if you're in such a situation it is a process of
elimination. And if it only reproduces every couple of months, that
could take a while.
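
Going back to the dropped-write scenario: here's a toy sketch of why a
checksumming file system notices a write that never reached the media,
while a plain read just returns stale data. Everything in it is made up
for illustration (ToyDisk, ChecksummedStore, the in-memory dict of
expected checksums), and zlib.crc32 is only a stand-in checksum. It is
not how Btrfs or any drive firmware is actually implemented.

```python
import zlib


class ToyDisk:
    """Hypothetical block device that can silently drop writes."""

    def __init__(self):
        self.blocks = {}
        self.drop_writes = False

    def write(self, addr, data):
        # A dropped write reports nothing and leaves the old block in place.
        if not self.drop_writes:
            self.blocks[addr] = data

    def read(self, addr):
        return self.blocks.get(addr, b"")


class ChecksummedStore:
    """Keeps the expected checksum apart from the block it describes.

    Assumption: a dict stands in for wherever a real file system would
    record the checksum on disk.
    """

    def __init__(self, disk):
        self.disk = disk
        self.expected = {}

    def write(self, addr, data):
        self.expected[addr] = zlib.crc32(data)
        self.disk.write(addr, data)

    def read(self, addr):
        data = self.disk.read(addr)
        if zlib.crc32(data) != self.expected[addr]:
            raise IOError(f"checksum mismatch at block {addr}")
        return data


disk = ToyDisk()
store = ChecksummedStore(disk)

store.write(0, b"old contents")
disk.drop_writes = True          # simulate the drive dropping queued writes
store.write(0, b"new contents")  # never reaches the "media"

# Without a checksum, the stale block is handed back with no complaint.
plain = disk.read(0)

# With a checksum, the same read is flagged as an error.
try:
    store.read(0)
    detected = False
except IOError:
    detected = True
```

The point is only that the error shows up at read time, possibly long
after the write that was lost, which fits a problem that surfaces once
in a month.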

> I say this because I have used Windows with NTFS on this machine, I
> have used Ubuntu with EXT4, and Fedora with thick-LVM with EXT4. None
> of these configurations gave me any such problems.

Yeah, it's a fair point. But you did have a problem with LVM thin
provisioning, which is not Btrfs, but does use checksums for at least
some (maybe all, not sure) of its metadata.

NTFS doesn't checksum anything. ext4 checksums its own metadata but not
data. A lost metadata write will be detected by ext4 right away, before
it causes too much confusion, whereas NTFS needs to get confused before
it realizes something is wrong. A lost data write on NTFS or ext4 means
the next time that data is read, it's just not there. It's garbage. So
the OS won't even care, it'll just hand over what it finds to the
application, and it'd be up to the application to handle the fact it
got back garbage. It could manifest in all kinds of ways, or not at
all.

So they are sufficiently different in this area that they're not that
comparable. The most comparable would be OpenZFS. Like Btrfs, it
checksums all metadata and data, but it's not a supported file system
in Fedora. So you're kind of on your own, but there are Fedora users
using OpenZFS for sysroot (maybe even /boot, GRUB supports it).

> 2) Since there is a high likelihood that my filesystem is not
> completely fixed, then when I take a backup using partclone, dd or
> clonezilla won't those errors be carried over ?

Yes. I recommend Pika Backup for a simple GUI solution to back things
up. It doesn't have any file system specific dependencies. I'm sure if
you look through the list archive for backups, or start a new thread
with your requirements, you'll get more suggestions.

>
> Even if I buy a new drive and restore the backup, I still might get crashes.

You definitely want a backup with its own independent file system. A
dd/ddrescue/clone is mainly for troubleshooting and disaster recovery.
It's not a great backup, because you want a backup that's easy to keep
up to date. Daily or weekly, depending on your tolerance for loss.

>
> 3) This is a weird question but can you recommend me a HDD that I can
> buy which can handle BTRFS ? Or even which features I might look for
> while buying (not a SSD but a HDD)

All the drive manufacturers have played enough musical chairs that I
can't keep track of who makes or made what. Every drive fails
eventually. HDDs follow the bathtub curve, so they tend to either fail
early or fail late in their lifespan. You can't really game the system.
The odds of picking another drive that exhibits this same behavior are
astronomically low.

Except for the NVMe drive, which came in the laptop I'm using, I pretty
much choose on a mix of warranty and price. It's cheaper to mitigate
risk with a backup, which you need anyway even if you get an expensive
drive. So I just ignore all claims of reliability, and I don't even
care about 5+ year warranties. No 90 day warranties. 1 year if it's
dirt cheap. Otherwise 3 years. And never buy an extended warranty.

But I gotta say, for sysroot a small inexpensive SSD is pretty awesome.
It's a major upgrade. And yeah, we probably see more firmware bugs with
SSDs than HDDs, but at least Facebook is using the cheapest consumer
drives possible, with Btrfs. And it's fine. Until it's not. So again,
all things come back to the backup.

Don't worry. Just back up. And if you back up, you won't worry. Or at
least, you'll worry less.

>
> 4) My manufacturer HP, does not make firmware updates for Linux, only
> for Windows. So is there a way to update the firmware(if available)
> without being on Windows ? Any ideas? Would a Windows PXE help ?

I don't think this is the problem. But also, if the logic board
firmware can only be updated from Windows, Windows 10 is a free
download:
https://www.microsoft.com/en-us/software-download/windows10

It'll even work without a product key; just say you don't have a
product key at the part where it asks for one.
It'll still work, with some extra limitations that won't matter.

> 5) When you say "checksum errors in the month's old report" - which
> report are you referring to ? The thin-LVM crash or the smartctl crash
> ?

LVM thin.

--
Chris Murphy
_______________________________________________
users mailing list -- users@xxxxxxxxxxxxxxxxxxxxxxx
To unsubscribe send an email to users-leave@xxxxxxxxxxxxxxxxxxxxxxx
Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/users@xxxxxxxxxxxxxxxxxxxxxxx