Re: Linux server crash causing router switch to stop working






On 2/11/22 11:44 AM, David Rosenstrauch via arch-general wrote:
> On 2/11/22 9:21 AM, Genes Lists via arch-general wrote:
>> Also it may be worthwhile running memcheck to be sure your memory is not faulty.
>
> Yeah that thought occurred to me as well.  I ran a quick memtest86+ when I first built the computer and it didn't show any issues.  But perhaps I should take some downtime and have it run one through the full 3 rounds.


Following up with an update to close the loop on this thread, for anyone who's interested.


So the bad news is: the machine crashed again a couple of times the other day - and right in the middle of a large and important pacman update, no less! (150+ packages, including the kernel, glibc, gcc, mariadb, and a bunch of other key libraries. The machine wound up not being bootable, and took several hours to recover. It was pretty ugly.) :-(

But the good news is (aside from being able to recover from the failed update): I think I may have finally pinned down what's been causing these issues.


Long story short, I had an issue when I first built the box where the machine wouldn't POST and boot, with the mobo's "DRAM" health LED indicator lighting up. After much digging I was able to pin down that I had fastened the CPU cooler down too tightly. That apparently can bend the mobo, and prevent some components from working correctly. In my case it was one of the memory slots, which is located very close to the CPU/cooler. (I figured it out for certain when I was able to boot with only one of the 2 memory sticks installed.) Once I pinpointed the issue, I loosened the cooler a bit, and from then on I've been able to repeatedly boot as normal with both sticks installed.

Or so I thought! I'm pretty sure now that this has actually still been an issue, and that connectivity to one of the memory sticks has been periodically cutting out in the middle of operations, causing these random crashes. Evidence: a) while debugging one of the recent crashes I again hit the same issue where it wouldn't POST and showed the same DRAM LED, and b) since I again took the memory stick out of the slot in question and rebooted, it's been running without issue. (Although on 1/2 the RAM.) I'm pretty sure the issue isn't that the memory stick is bad, as I've run memtests several times with no issues. But I'll also try swapping sticks to confirm.


I'll give it a few more days of uptime to confirm that this was indeed the issue, but I'm growing increasingly confident that that's the case. When I get some time, I'll reinstall the cooler, thermal paste, and 2nd RAM stick, and try to get everything back to full RAM capacity without crashing.
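
For anyone wanting to double-check that both sticks are actually seen by the kernel after a reinstall like this, here's a rough sketch in Python that reads the MemTotal line from /proc/meminfo and compares it to the installed amount. The function names, the 10% tolerance, and the expected size are all just assumptions for illustration, not anything from my actual setup. Note it only tells you the RAM was detected at boot; it won't catch a stick that cuts out intermittently.

```python
import re


def mem_total_gib(meminfo_text):
    """Return the MemTotal value from /proc/meminfo-style text, in GiB."""
    match = re.search(r"^MemTotal:\s+(\d+)\s+kB", meminfo_text, re.MULTILINE)
    if match is None:
        raise ValueError("MemTotal line not found")
    # /proc/meminfo reports kB (really KiB); 1024 * 1024 KiB = 1 GiB.
    return int(match.group(1)) / (1024 * 1024)


def both_sticks_visible(meminfo_text, expected_gib, tolerance=0.1):
    """True if MemTotal is within `tolerance` of the installed RAM.

    The kernel and firmware reserve some memory, so MemTotal is always a
    bit below the installed amount; the 10% default is an assumed margin.
    """
    return mem_total_gib(meminfo_text) >= expected_gib * (1 - tolerance)


def check_system(expected_gib, path="/proc/meminfo"):
    """Read the live meminfo file and report whether full RAM is visible."""
    with open(path) as f:
        text = f.read()
    return both_sticks_visible(text, expected_gib)
```

Calling check_system(32) on a box with 2 x 16 GiB sticks (a hypothetical size) should return True when both are detected and False when running on one stick.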


Many thanks to everyone for all the helpful suggestions!

DR



