Re: Fedora and System Rescue CD disagree on the state of my btrfs filesystem


 



On Fri, Jul 26, 2024 at 8:59 AM John Mellor <john.mellor@xxxxxxxxx> wrote:
On 2024-07-26 8:25 a.m., Richard Shaw wrote:
On Thu, Jul 25, 2024 at 6:29 PM Jeffrey Walton <noloader@xxxxxxxxx> wrote:
On Thu, Jul 25, 2024 at 2:15 PM Richard Shaw <hobbes1069@xxxxxxxxx> wrote:
>
> I recently had the Fedora install on my laptop go sideways (Ryzen 5 4500U w/ nvme disk).
>
> The filesystem was going readonly so I installed System Rescue CD to a thumb drive to investigate. Sure enough I had 4 unrecoverable errors.
>
> I don't keep anything critical on it, so I decided to just reinstall with Fedora 40. The installation went fine. I did notice some weird dnf output during my first update, but everything SEEMED fine...
>
> I rebooted after the update and tried to log in, but after a minute or two the system froze. I rebooted again, and sure enough, `dmesg | grep BTRFS` showed an error.
>
> Booting back into System Rescue CD, neither `btrfs check --check-data-csum` nor, after mounting, `btrfs scrub` shows any errors.
>
> So who's right? And if there is an error, what's causing it? I've checked the drive with smartctl and even let the factory HP firmware diag tools run in a loop overnight checking everything without error.

The (1) irrecoverable disk errors from the original install, and (2)
the errors from the current install, and (3) the errors from dnf
indicate (to me) you have a failed NVMe drive. I used to see the
symptoms all the time when using SDcards in ARM dev boards. I would
put a swap file on the dev board (due to lack of resources), and the
drives would fail within about 6 months with the symptoms you
describe.

Now the interesting part (to me) is, (4) lack of errors reported by
some tools. That indicates to me a Chinese drive that misreports drive
size and statistics. They usually show up on thumb drives, but I
experienced one on an SSD years ago. Also see
<https://www.google.com/search?q=counterfeit+drive+misreport+size>.

All in all, I would replace the NVMe drive with a new one from a
trusted source. Not Amazon or eBay.

It's the drive that came with the laptop so unlikely to be a cheap/phony drive but the mystery does get deeper...

1. I was able to see the same results even when I booted from an F40 Live USB. I'm thinking that the system caught the problem quickly enough that the error didn't actually get written to the disk.

2. I consistently see the problem at about 30 seconds (per dmesg) if I boot the 6.9.9 or 6.9.10 kernels that were installed via updates. If I boot 6.8.5, the kernel that shipped with F40, I can't reproduce the problem.

Of course, that's strange, because if this were a widespread issue there would be tons of people complaining.

Odds are that you have bad RAM or are running the processor clock higher than it can handle. I also had this kind of issue when I had a bad video card, but the system generally froze or crashed and left the drive in an unrecoverable state. The tools for fixing a btrfs partition are generally lacking in Fedora, and the tools that come with btrfs are also useless when the failing partition is your active root partition. I don't know if SUSE has better tools, but it's a huge problem with Fedora recoverability.


It's an HP Envy laptop, with no ability to overclock. I did upgrade the memory from 8GB to 16GB when I first got it over 3 years ago, but it's plain DDR4-3200. As I previously mentioned, I let the HP diag tools run overnight, and they completed 14 cycles without any errors. I have now also let Memtest86+ run for 5 complete cycles without any errors.

The only common denominator I have found so far is the two 6.9 kernels I have installed.
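
For anyone who wants to make the same per-kernel comparison, here is a rough sketch of one way to do it, assuming the `dmesg` output from each boot has been saved to a text file (for example with `dmesg > dmesg-6.9.10.txt`). It pulls out the BTRFS error, critical, and warning lines together with their seconds-since-boot timestamps. The function names and the sample log lines are made up for illustration and are not real output from this machine.

```python
import re

# Matches default-format dmesg lines such as:
#   [   31.204512] BTRFS error (device nvme0n1p3): ...
# Only error/critical/warning levels are kept; "BTRFS info" lines are ignored.
LINE_RE = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]\s+BTRFS\s+(?P<level>error|critical|warning)\b(?P<rest>.*)$"
)

def btrfs_problems(dmesg_text):
    """Return (seconds_since_boot, level, message) for each BTRFS problem line."""
    found = []
    for line in dmesg_text.splitlines():
        m = LINE_RE.match(line)
        if m:
            found.append((float(m.group("ts")), m.group("level"), m.group("rest").strip()))
    return found

def first_problem_time(dmesg_text):
    """Seconds since boot of the first BTRFS problem, or None if there is none."""
    problems = btrfs_problems(dmesg_text)
    return problems[0][0] if problems else None

# Hypothetical sample, NOT taken from the laptop discussed in this thread.
SAMPLE = (
    "[    2.113007] BTRFS info (device nvme0n1p3): example info text\n"
    "[   31.204512] BTRFS error (device nvme0n1p3): example error text\n"
    "[   31.204530] BTRFS warning (device nvme0n1p3): example warning text\n"
)

if __name__ == "__main__":
    for ts, level, msg in btrfs_problems(SAMPLE):
        print(f"{ts:10.6f}s  {level:8s}  {msg}")
```

Running this over one saved log per kernel (6.8.5, 6.9.9, 6.9.10) would show whether the first BTRFS complaint really shows up only under the 6.9 kernels, and whether it lands at roughly the same point after boot each time.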

Thanks,
Richard
-- 
_______________________________________________
users mailing list -- users@xxxxxxxxxxxxxxxxxxxxxxx
To unsubscribe send an email to users-leave@xxxxxxxxxxxxxxxxxxxxxxx
Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/users@xxxxxxxxxxxxxxxxxxxxxxx
Do not reply to spam, report it: https://pagure.io/fedora-infrastructure/new_issue