On 5/9/22 10:03, Roger Heflin wrote:
> For the hardware, if this is a gen9 with a v4 cpu (cat /proc/cpuinfo will show
> if v3 or v4), then make sure all of your firmware is up to date and disable
> all cstates in the bios and in the os. If this is a v3 then update the bios, but
> my experience was that the v3's were pretty stable, while the v4's were very
> unstable without the fixes.
>
> If you have a v4 there are some instability issues that without a current bios
> are pretty bad and manifest as crashes, uncorrectable memory errors, random
> lockups (no messages in ILO), and random PCIE issues/lockups. And since NVME
> is PCIE based it may not show exactly the same symptoms as I have previously
> seen (since nothing I deal with has NVME) if the NVME buses are being impacted
> in a similar manner. Updated bioses seem to reduce the crashes quite a lot, but
> on some machines the additional c-state disables are claimed to also be needed
> by the vendor.

It's a v3 (I guess I'm glad I've stuck with the v3's). They've been quite
stable in general. I'll apply the latest BIOS/firmware updates and see if that
makes a difference - thanks for the reminder there. If the issue persists,
I'll move on to disabling cstates.

--
Orion Poplawski
IT Systems Manager                         720-772-5637
NWRA, Boulder/CoRA Office             FAX: 303-415-9702
3380 Mitchell Lane                       orion@xxxxxxxx
Boulder, CO 80301                 https://www.nwra.com/
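
For anyone wanting to script the v3/v4 check and see what the OS currently has for cstates, here is a minimal sketch. It is not something from the thread: the function names `xeon_generation` and `cpuidle_states` are made up for illustration, and it assumes the "model name" line in /proc/cpuinfo carries the "v3"/"v4" suffix and that the kernel exposes the usual cpuidle sysfs directory for cpu0. It only reads state; it does not disable anything.

```python
import re
from pathlib import Path


def xeon_generation(cpuinfo_text):
    """Return 'v3', 'v4', ... from the first 'model name' line, or None.

    Assumes the suffix appears as a standalone token such as 'v4'.
    """
    for line in cpuinfo_text.splitlines():
        if line.startswith("model name"):
            match = re.search(r"\bv(\d+)\b", line)
            return f"v{match.group(1)}" if match else None
    return None


def cpuidle_states(cpu_dir=Path("/sys/devices/system/cpu/cpu0/cpuidle")):
    """Return {state_name: is_disabled} for one CPU; empty dict if cpuidle is absent."""
    states = {}
    if not cpu_dir.is_dir():
        return states
    for state in sorted(cpu_dir.glob("state*")):
        name = (state / "name").read_text().strip()
        disabled = (state / "disable").read_text().strip() == "1"
        states[name] = disabled
    return states


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        print("CPU generation:", xeon_generation(cpuinfo.read_text()))
    print("cstates (name: disabled):", cpuidle_states())
```

If the second line prints an empty dict, cpuidle is probably not active on that machine (or the sysfs layout differs), so the BIOS settings would be the place to look.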