On Mon, Jun 29, 2020 at 10:26:37AM -0600, Chris Murphy wrote:
> You've got an example where 'btrfs restore' saw no files at all? And
> you think it's the file system rather than the hardware, why?

Because the system failed to boot up, and even after offline repair
attempts it was still missing a sufficiently large chunk of the root
filesystem to necessitate re-installation. Because the same hardware
provided literally years of problem-free stability with ext4 (before)
and xfs (after).

> I think this is the wrong metaphor because it suggests btrfs caused
> the crapping. The sequence is: btrfs does the right thing, drive
> firmware craps itself and there's a power failure or a crash. Btrfs in
> the ordinary case doesn't care and boots without complaint. In the far

The first time, I needed to physically move the system, so the machine
was shut down via 'shutdown -h now' on a console, and didn't come back
up. The second time was a routine post-dnf-update 'reboot', without
power cycling anything. At no point was there ever an unclean shutdown,
and at the time of those reboots, no errors were reported in the kernel
logs.

Once is a fluke, twice is a trend... and I didn't have the patience for
a third try, because I needed to be able to rely on the system to not
eat itself.

I can't get the complete details at the moment, but it was an AMD E-350
system with a 32GB ADATA SATA drive, configured using anaconda's btrfs
defaults and with only about 30% of the disk space used. Pretty minimal
I/O. I will concede that it's possible there was/is some sort of
hardware/firmware bug, but if so, only btrfs seemed to trigger it.
(More on this later.)

> Come on. It's cleanly unmounted and doesn't mount?

Yes. (See above.)

(Granted, I'm using "mount" to mean "successfully mounted a writable
filesystem with data largely intact" -- I'm a bit fuzzy on the exact
details, but I believe it did mount read-only before the boot crapped
out due to missing/inaccessible system libraries. I had to resort to a
USB stick to attempt repairs that were only partially successful.)

> All file systems have write ordering expectations. If the hardware
> doesn't honor that, it's trouble if there's a crash. What you're
> describing is 100% a hardware crapped itself case. You said it cleanly
> unmounted i.e. the exact correct write ordering did happen. And yet
> the file system can't be mounted again. That's a hardware failure.

That may be the case, but when there were no crashes, and neither ext4
nor xfs crapped themselves under day-to-day operation on the same
hardware, it's reasonable to infer that the problem has _something_ to
do with the variable that changed, i.e. btrfs.

> There is no way for one person to determine if Btrfs is ready. That's
> done by combination of synthetic tests (xfstests) and volume
> regression testing on actual workloads. And by the way the Red Hat CKI
> project is going to help run btrfs xfstests for Fedora kernels.

Of course not, but the Fedora community is made up of innumerable "one
persons", each responsible for several special snowflake systems.

Let's say, for the sake of argument, that my bad btrfs experiences were
due to device firmware bugs triggered by btrfs's completely legal usage
patterns, rather than bugs in btrfs-from-five-years-ago. That's
great... except my system still got trashed to the point of needing to
be reinstalled, and finger-pointing can't bring back lost data. How
many more special snowflake drives are out there? Think about how long
it took Fedora to enable TRIM out of concern for potential data loss.
Why should this be any different?

(We're always going to be stuck with buggy firmware. FFS, the Samsung
860 EVO SATA SSD in my main workstation will hiccup to the point of
trashing data when used with AMD SATA controllers... even under
Windows! Their official support answer is "Use an Intel controller".
And that's a tier-one manufacturer who presumably has among the best QA
and support in the industry.)

If there are devices/firmware known to be problematic, we need to keep
track of those buggy devices and either automatically apply workarounds
or have some way to warn the user that proceeding with btrfs may be
perilous to their data.

(Or perhaps the issues I had were due to bugs in btrfs-of-five-years-ago
that have long since been fixed. Either way, given my twice-burned
experiences, I would want to verify that for myself before I entrust it
with any data I care about...)

> The questions are whether the Fedora community wants and is ready for
> Btrfs by default.

There are obviously some folks here (myself included) who have had very
negative btrfs experiences. Conversely, there are folks who have
successfully overseen large-scale deployments of btrfs in their managed
environments (not on Fedora, though, IIUC).

So yes, I think an explicit "let's all test btrfs (as anaconda
configures it) before we make it the default" period is warranted.
Perhaps one can argue that Fedora has already been doing that for the
past two years (since 2018-or-later btrfs is what everyone with
positive results appears to be talking about), but it's still not clear
whether those deployments use the same feature set as Fedora's
defaults, or how broad the hardware sample is.

 - Solomon

-- 
Solomon Peachy            pizza at shaftnet dot org (email&xmpp)
                          @pizza:shaftnet dot org (matrix)
High Springs, FL          speachy (freenode)