On Sat, Aug 12, 2017 at 05:51:33PM -0400, Steven Tardy wrote:
>
> > On Aug 12, 2017, at 3:50 PM, Fred Smith <fredex@xxxxxxxxxxxxxxxxxxxxxx> wrote:
> >
> > I had a series of kernel hardware error reports today while I was away
> > from my computer:
> >
> > Message from syslogd@fcshome at Aug 12 10:12:24 ...
> > kernel:[Hardware Error]: MC2 Error: VB Data ECC or parity error.
> >
> > Message from syslogd@fcshome at Aug 12 10:12:24 ...
> > kernel:[Hardware Error]: Error Status: Corrected error, no action required.
> >
> > Message from syslogd@fcshome at Aug 12 10:12:24 ...
> > kernel:[Hardware Error]: CPU:2 (15:2:0) MC2_STATUS[-|CE|MiscV|-|-|-|-|CECC]: 0x98444000010c0176
> >
> > Message from syslogd@fcshome at Aug 12 10:12:24 ...
> > kernel:[Hardware Error]: cache level: L2, tx: DATA, mem-tx: EV
> >
> > never saw anything like that before.
> >
> > cpu is:
> >
> > $ cat /proc/cpuinfo
> > processor : 0
> > vendor_id : AuthenticAMD
> > cpu family : 21
> > model : 2
> > model name : AMD FX(tm)-6300 Six-Core Processor
> > stepping : 0
> > microcode : 0x600084f
> > cpu MHz : 1400.000
> > cache size : 2048 KB
> > physical id : 0
> > siblings : 6
> > core id : 0
> > cpu cores : 3
> > apicid : 16
> > initial apicid : 0
> > fpu : yes
> > fpu_exception : yes
> > cpuid level : 13
> > wp : yes
> > flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc art rep_good nopl nonstop_tsc extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 popcnt aes xsave avx f16c lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs xop skinit wdt lwp fma4 tce nodeid_msr tbm topoext perfctr_core perfctr_nb arat cpb hw_pstate npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold bmi1
> > bogomips : 7023.90
> > TLB size : 1536 4K pages
> > clflush size : 64
> > cache_alignment : 64
> > address sizes : 48 bits physical, 48 bits virtual
> > power management: ts ttp tm 100mhzsteps hwpstate cpb eff_freq_ro
> >
> >
> > six core AMD, above is one of the cores.
> >
> > Any clues to figure out the errors, and/or mitigate?
> >
> > thanks!
> >
> > Fred
>
> MC == Machine check exception.
> The important part of a MC is the "status" code.
> One can use the Intel doc "Architecture Software Developers Manual" to decode this (4000 page .pdf).
> Unsure but it looks like AMD does similar MC codes.
> Luckily Linux does some heavy lifting and decodes to "cache hierarchy error L2 data eviction".
> The next most important part is the "corrected" bit.
>
> Now what does that really mean?
> *shrug*, could be firmware/drivers/overheating/poor-CPU-seating/DIMM-seating/faulty-motherboard/faulty-CPU/faulty-DIMM.

Well, overheating is possible... we don't live in the cleanest possible
house, AND we have cats. So, in general, I open up this box twice a year
and vacuum out the house dirt and cat fuzzies. I'm probably overdue for
this task.

This is the first one of these I've had. Hope it's the last. But a little
PM (preventive maintenance) is in order either way.

Thanks for the reply.

Fred

>
> Hope that doesn't confuse too much. (:
> _______________________________________________
> CentOS mailing list
> CentOS@xxxxxxxxxx
> https://lists.centos.org/mailman/listinfo/centos

-- 
---- Fred Smith -- fredex@xxxxxxxxxxxxxxxxxxxxxx -----------------------------
                The Lord detests the way of the wicked
            but he loves those who pursue righteousness.
----------------------------- Proverbs 15:9 (niv) -----------------------------
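
For anyone who wants to check the kernel's decode of the MC2_STATUS value quoted above by hand, here is a rough Python sketch. It assumes the usual x86 MCi_STATUS flag layout (Val/Over/UC/En/MiscV/AddrV/PCC in bits 63..57), the AMD CECC/UECC bits at 46 and 45, and the "memory hierarchy" error-code form 0000 0001 RRRR TTLL in the low 16 bits. The function name and the dictionary keys are made up for illustration, and nothing here has been checked against AMD's documentation for this exact CPU model.

```python
# Sketch only: decode a few fields of an x86 machine-check status word.
# Bit positions are assumptions taken from the common MCi_STATUS layout.
FLAG_BITS = {
    63: "Val", 62: "Over", 61: "UC", 60: "En",
    59: "MiscV", 58: "AddrV", 57: "PCC",
    46: "CECC", 45: "UECC",  # AMD-specific ECC indicators (assumed)
}

# Lookup tables for the memory-hierarchy error code 0000 0001 RRRR TTLL.
CACHE_LEVEL = ["RESV", "L1", "L2", "L3/GEN"]          # LL field
TRANSACTION = ["INSN", "DATA", "GEN", "RESV"]         # TT field
MEM_TRANSACTION = ["GEN", "RD", "WR", "DRD", "DWR",   # RRRR field
                   "IRD", "PRF", "EV", "SNP"]


def decode_mc_status(status):
    """Return a dict describing the flags and low error code of `status`."""
    flags = [name for bit, name in FLAG_BITS.items() if (status >> bit) & 1]
    result = {
        "flags": flags,
        # UC (bit 61) clear means the hardware reports the error as corrected.
        "corrected": not ((status >> 61) & 1),
    }
    code = status & 0xFFFF
    if (code >> 8) == 0x01:  # memory hierarchy error
        rrrr = (code >> 4) & 0xF
        result["cache_level"] = CACHE_LEVEL[code & 0x3]
        result["tx"] = TRANSACTION[(code >> 2) & 0x3]
        result["mem_tx"] = (MEM_TRANSACTION[rrrr]
                            if rrrr < len(MEM_TRANSACTION) else "RESV")
    return result


if __name__ == "__main__":
    print(decode_mc_status(0x98444000010C0176))
```

Run against 0x98444000010c0176, this reports a corrected error with cache level L2, tx DATA and mem-tx EV, which agrees with the kernel messages quoted above.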