Re: memory testing

On Wed, Jul 15, 2020 at 11:11 am, Chris Murphy <lists@xxxxxxxxxxxxxxxxx> wrote:
> 4. "multiple concurrent kernel compiles" and "GCC seems to have memory
> usage patterns that reliably trigger memory errors that
> aren't caught by memtest"

In my experience, GCC is a really good test for RAM that is obviously bad. By "obviously bad," I mean RAM that will fail memtest86 every time. I've had two previous computers with obviously-bad RAM. I don't remember the first computer well, but on the second, GCC would often fail trying to compile assembler that was suspiciously one character away from being valid assembler. That's a pretty clear indication of trouble.
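To illustrate why "one character away from valid" points at memory, here is a minimal sketch of what a single flipped bit does to ASCII text. The helper `flip_bit` and the sample instruction are hypothetical illustrations, not output from the failing machine.

```python
def flip_bit(text: str, char_index: int, bit: int) -> str:
    """Return `text` with one bit flipped in the byte at `char_index`.

    Assumes ASCII input; a flip of bit 7 yields a non-ASCII byte, which
    is shown as the Unicode replacement character.
    """
    data = bytearray(text.encode("ascii"))
    data[char_index] ^= 1 << bit  # toggle exactly one bit
    return data.decode("ascii", errors="replace")


# Hypothetical example: 'v' is 0x76, and flipping its lowest bit gives 0x77, 'w'.
print(flip_bit("movl %eax, %ebx", 2, 0))  # prints "mowl %eax, %ebx"
```

A single bad bit in a buffer holding generated assembly would produce exactly this kind of near-miss, which the assembler then rejects.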

Sadly, because I am apparently cursed, my current system is exhibiting subtler hardware errors. For example, yesterday my desktop hung during a BlueJeans call [1]. I get occasional system lockups and crashes, sometimes many in the same day; other times, the system is stable for a week or more. I've finally been convinced this is a hardware rather than a software problem, so I'm ready to try swapping out components in hopes of finding what's wrong. But without a reliable test, it will be extremely hard to be sure whether swapping out hardware has fixed the issue.

GCC works flawlessly: I build large projects on a regular basis, and I've yet to see issues like the ones I used to see regularly on my earlier system with obviously bad RAM. I've lost track of how much time I've spent running memtest86. I did manage to see a bad write in memtest86 once, but only once out of probably 20 different runs. Maybe running memtest86 for an entire week would be enough to reliably catch the problem, but I don't have that much patience, and even then, I honestly don't think it would be reliable. Just running it over long weekends is not enough; I've run memtest86 for 30+ hours multiple times without a single error. It's very frustrating.

I've also tried mprime, stress, stress-ng, and stressapptest, with no luck from any of them. I might try replacing all my RAM with ECC RAM, just because I can't think of what else to try at this point, but my niggling worry is that the problem could just as likely be the CPU or the motherboard. I have no idea.
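As a rough illustration of why an unreliable test makes it hard to confirm a fix, the sketch below estimates how many clean runs are needed before a clean streak means much. It assumes each run is independent and catches a still-present fault with a fixed probability; the 1-in-20 figure is only a crude estimate taken from the single bad write mentioned above, and the function names are hypothetical.

```python
import math


def miss_probability(p_detect: float, runs: int) -> float:
    """Chance that `runs` independent runs all miss a fault that is still present."""
    return (1.0 - p_detect) ** runs


def clean_runs_needed(p_detect: float, confidence: float) -> int:
    """Smallest number of clean runs after which a still-present fault
    would have been caught with probability >= `confidence`."""
    if not (0.0 < p_detect < 1.0) or not (0.0 < confidence < 1.0):
        raise ValueError("p_detect and confidence must be strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_detect))


# Assumption: about 1 failing run in 20, based on a single observed error.
p = 1 / 20
print(f"{miss_probability(p, 20):.3f}")  # prints 0.358
print(clean_runs_needed(p, 0.95))        # prints 59
```

Under these assumptions, 20 clean runs after a hardware swap would still leave roughly a one-in-three chance that the fault is present and simply went unnoticed, and about 59 clean runs would be needed for 95% confidence.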

I've half a mind to take this system to Micro Center and see if they can figure it out, but my fear is that if I can't find a test that catches the issue, they won't be able to either.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1856846#c2

_______________________________________________
devel mailing list -- devel@xxxxxxxxxxxxxxxxxxxxxxx
To unsubscribe send an email to devel-leave@xxxxxxxxxxxxxxxxxxxxxxx
Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/devel@xxxxxxxxxxxxxxxxxxxxxxx



