Re: Messages following a crash: [Hardware Error]: event severity: fatal

Maybe. You might also search the logs for "MCE" and/or "machine check" and see what turns up. I have seen PCI errors on my SAS2008 under high IO, so I assume the same thing could happen when the video card is generating heavy traffic.

I removed all of the cards, cleaned their connectors, vacuumed out the slots and blew them out with air, then made sure the cards were firmly seated and screwed in. That fixed a crash that had been going on for many months before I happened to be looking at dmesg for something else and found an MCE error that decoded to the PCI bus. Mine is an AMD; your error may be the Intel equivalent.

My PCI error had an MTBF of around a month, and 95% of the time it occurred during the weekly RAID check, under high IO.
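
As a rough illustration of the log check suggested above, here is a minimal Python sketch that filters saved kernel log text for the strings mentioned in this thread (MCE, machine check, "[Hardware Error]", BERT). The function name, the pattern list, and the idea of reading from saved files are my own assumptions for the example, not anything the kernel or Fedora provides; a plain grep would do the same job.

```python
import re
import sys

# Strings taken from this thread: MCE / machine check messages, plus the
# "[Hardware Error]" and "BERT" prefixes seen in the quoted log.
# The exact pattern list is an assumption; extend it as needed.
PATTERN = re.compile(
    r"\bmce\b|machine check|\[Hardware Error\]|\bBERT\b",
    re.IGNORECASE,
)


def find_hw_error_lines(lines):
    """Return the log lines that mention any of the patterns above."""
    return [line.rstrip("\n") for line in lines if PATTERN.search(line)]


if __name__ == "__main__":
    # Each argument is a text file holding saved kernel log output.
    for path in sys.argv[1:]:
        with open(path, errors="replace") as handle:
            for hit in find_hw_error_lines(handle):
                print(hit)
```

One way to use it, assuming the script is saved under a name of your choosing such as scan_hw.py: save the kernel log with `dmesg > dmesg.txt` (or `journalctl -k > kernel.txt`) and run `python3 scan_hw.py dmesg.txt`. Any hits still need decoding to learn which component reported them.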

On Tue, Jan 11, 2022 at 4:30 PM Eyal Lebedinsky <fedora@xxxxxxxxxxxxxx> wrote:


On 11/01/2022 23.57, Roger Heflin wrote:
> Well, usually a real hardware error (uncorrectable memory MCE, or CPU
> memory MCE, or PCI MCE) will cause an immediate reset (no reset button
> needed).
>
> That error could be a result of the reset button being pressed, and

Looks like it was the reset. The lockup has happened more often than that, but I may have hit the reset button in only
two cases.

As I recall the hard lockup usually happens when I watch a video (mythtv or, rarely, youtube).
May be related?

> not really a hardware error. I work with enterprise vendors' hw and
> they classify the stupidest things as errors (on a reboot they
> classify the nics and fiber channel cards losing link as hardware
> errors--and this happens every boot on device init, so the boot
> "errors" are 100x-1000x more common than the actual real life link
> downs--so their false alarm rate is horrible).
>
> What kind of MB/HW is it?
>
> On Tue, Jan 11, 2022 at 5:16 AM Eyal Lebedinsky <fedora@xxxxxxxxxxxxxx> wrote:
>>
>> I just had the system lock-up hard, requiring hitting the reset button.
>>
>> I now see in the system log:
>>
>> Jan 11 19:28:54 e7 kernel: BERT: Error records from previous boot:
>> Jan 11 19:28:54 e7 kernel: [Hardware Error]: event severity: fatal
>> Jan 11 19:28:54 e7 kernel: [Hardware Error]:  Error 0, type: fatal
>> Jan 11 19:28:54 e7 kernel: [Hardware Error]:   section_type: Firmware Error Record Reference
>> Jan 11 19:28:54 e7 kernel: [Hardware Error]:   Firmware Error Record Type: SOC Firmware Error Record Type1 (Legacy CrashLog Support)
>> Jan 11 19:28:54 e7 kernel: [Hardware Error]:   Revision: 0
>> Jan 11 19:28:54 e7 kernel: [Hardware Error]:   Record Identifier: 100300100000000
>> Jan 11 19:28:54 e7 kernel: [Hardware Error]:   00000000: 00000000 00000000 00000000 00000000  ................
>> ... continue until
>> Jan 11 19:28:54 e7 kernel: [Hardware Error]:   00000c00: ffffffff ffffffff ffffffff ffffffff  ................
>>
>> I understand that this is related to the preceding crash. If so, what does it tell me?
>>
>> Doing a search suggests that if this happens rarely then it can be ignored - true?
>> I now see that I had it also last Oct but not earlier. This hard lock-up happens at times (more than twice for sure)
>> and if I can do something about it then I would.
>>
>> TIA
>>
>> --
>> Eyal Lebedinsky (fedora@xxxxxxxxxxxxxx)

--
Eyal Lebedinsky (fedora@xxxxxxxxxxxxxx)
_______________________________________________
users mailing list -- users@xxxxxxxxxxxxxxxxxxxxxxx
To unsubscribe send an email to users-leave@xxxxxxxxxxxxxxxxxxxxxxx
Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/users@xxxxxxxxxxxxxxxxxxxxxxx
Do not reply to spam on the list, report it: https://pagure.io/fedora-infrastructure