On Wed, Jul 15, 2020 at 11:11 am, Chris Murphy
<lists@xxxxxxxxxxxxxxxxx> wrote:
> 4. "multiple concurrent kernel compiles" and "GCC seems to have memory
> usage patterns that reliably trigger memory errors that
> aren't caught by memtest"
In my experience, GCC is a really good test for RAM that is obviously
bad. By "obviously bad," I mean RAM that will fail memtest86 every
time. I've had two previous computers with obviously-bad RAM. I don't
remember the first computer well, but on the second, GCC would often
fail trying to compile assembler that was suspiciously one character
away from being valid assembler. That's a pretty clear indication of
trouble.
Sadly, because I am apparently cursed, my current system is exhibiting
more subtle hardware errors. E.g. yesterday my desktop hung during a
BlueJeans call [1]. I get occasional system lockups and crashes,
sometimes many times in the same day; other times, the system might be
stable for a week or more. I've finally been convinced this is really a
hardware problem rather than a software one, so I'm ready to try swapping out
components in hopes of finding what's wrong, but without a reliable
test, it will be extremely hard to be sure if swapping out hardware has
fixed the issue. GCC works flawlessly; I build large projects on a
regular basis, and I've yet to see issues like I used to see regularly
in the past on my system with obviously bad RAM. I've lost track of how
much time I've spent running memtest86. I did manage to see a bad write
in memtest86 once, but only once out of probably 20 different runs.
Maybe if I were to run memtest86 for an entire week, that might be enough
to reliably catch the problem, but I don't have that much patience, and
even then, I honestly don't think it would be enough to be reliable.
Just running it over long weekends is not enough; I've run memtest86
for 30+ hours multiple times without a single error. It's very
frustrating. I've tried mprime, stress, stress-ng, and stressapptest, with no
luck from any of them. I might try replacing all my RAM with ECC RAM,
just because I can't think of what else to try at this point, but my
niggling worry is that the problem could just as likely be the CPU or
the motherboard. I have no idea.
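To put a rough number on why an intermittent fault is so hard to confirm
as fixed, here is a minimal Python sketch. It assumes, purely for
illustration, that each memtest86 run is independent and that one failure
in about 20 runs means a per-run detection chance of roughly 5%. Neither
assumption is measured, and the function names are hypothetical.

```python
import math


def clean_runs_needed(p_detect: float, confidence: float = 0.95) -> int:
    """Smallest n such that (1 - p_detect)**n <= 1 - confidence.

    In words: how many consecutive clean runs are needed before a fault
    that is still present would have shown up with the given confidence.
    Assumes independent runs with a fixed per-run detection probability.
    """
    if not 0.0 < p_detect < 1.0:
        raise ValueError("p_detect must be strictly between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_detect))


def chance_all_clean(p_detect: float, runs: int) -> float:
    """Probability that a still-faulty system passes every one of `runs` runs."""
    return (1.0 - p_detect) ** runs


if __name__ == "__main__":
    p = 1 / 20  # assumed: one observed failure in about 20 runs
    print(clean_runs_needed(p))               # 59
    print(round(chance_all_clean(p, 20), 2))  # 0.36
```

Under those assumptions, a faulty system would still pass 20 runs in a
row about a third of the time, and it would take roughly 59 clean runs
after a hardware swap to be 95% sure the swap made a difference.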
I've half a mind to take this system to Micro Center and see if they can
figure it out, but my fear is that if I can't find a test that catches
the issue, they won't be able to either.
[1] https://bugzilla.redhat.com/show_bug.cgi?id=1856846#c2