Re: Server goes catatonic after a few days

You could try memtester, a user-level memory tester that runs on the live system: install it with "dnf install memtester" and then run, for example, "memtester 1g" to test 1 GB of RAM.

I also have a server that occasionally dies. It started doing this late last year under Fedora 27. I wasn't sure whether the cause was a particular kernel change or the hardware, so I replaced the motherboard, memory and CPU at that time, as the machine was about 5 years old. However, it has still crashed occasionally with the new hardware and a fresh install of Fedora 29.

The last /var/log/messages entries before the most recent crash, followed by the first entries from the next boot, were:

Jan 8 10:42:52 king mosquitto[1435]: 1546944172: New connection from 192.168.202.30 on port 1883.

Jan 8 10:42:52 king mosquitto[1435]: 1546944172: New client connected from 192.168.202.30 as DVES_00B2F8 (c1, k10, u'DVES_USER').

Jan 8 10:43:13 king mosquitto[1435]: 1546944193: Client DVES_00B2F8 has exceeded timeout, disconnecting.

Jan 8 10:43:13 king mosquitto[1435]: 1546944193: Socket error on client DVES_00B2F8, disconnecting.

#########################################################################################################################################################################################################Jan 8 18:03:39 king kernel: microcode: microcode updated early to revision 0xc6, date = 2018-04-17

Jan 8 18:03:39 king kernel: Linux version 4.19.9-300.fc29.x86_64 (mockbuild@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx) (gcc version 8.2.1 20181105 (Red Hat 8.2.1-5) (GCC)) #1 SMP Thu Dec 13 17:25:01 UTC 2018

Jan 8 18:03:39 king kernel: Command line: BOOT_IMAGE=/boot/vmlinuz-4.19.9-300.fc29.x86_64 root=UUID=5d3007f8-fa92-4fe6-98a8-e812b680198f ro rd.auto LANG=en_GB.UTF-8

Jan 8 18:03:39 king kernel: x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'

Jan 8 18:03:39 king kernel: x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'

Jan 8 18:03:39 king kernel: x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'

Jan 8 18:03:39 king kernel: x86/fpu: Supporting XSAVE feature 0x008: 'MPX bounds registers'

Jan 8 18:03:39 king kernel: x86/fpu: Supporting XSAVE feature 0x010: 'MPX CSR'

The "#" characters were in fact NUL (0x00) bytes, which is strange.
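To check whether other crashes left the same kind of NUL run in the log, something like the following could be used. It is only a minimal Python sketch: the function name find_nul_runs and the min_len threshold are made up for illustration, and it reads the whole file into memory, which should be acceptable for a typical /var/log/messages but not for very large files.

```python
import re
from pathlib import Path

# One or more consecutive NUL (0x00) bytes.
NUL_RUN = re.compile(rb"\x00+")


def find_nul_runs(path, min_len=16):
    """Return a list of (byte_offset, run_length) for every run of at
    least min_len consecutive NUL bytes found in the file at path.

    The name and the default threshold are illustrative assumptions,
    not part of any existing tool.
    """
    data = Path(path).read_bytes()
    runs = []
    for match in NUL_RUN.finditer(data):
        length = match.end() - match.start()
        if length >= min_len:
            runs.append((match.start(), length))
    return runs


# Example use (reading /var/log/messages normally needs root):
#   for offset, length in find_nul_runs("/var/log/messages"):
#       print(f"{length} NUL bytes at offset {offset}")
```

Each reported offset marks a point where the log was cut off without being written out cleanly, so the timestamp of the last readable line before it gives an upper bound on when the machine stopped logging.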


I do wonder if there is an obscure kernel bug somewhere. This server has an Intel(R) Core(TM) i3-6100 CPU @ 3.70GHz and is doing DVB recording amongst other work. My other servers seem fine, though.

Terry

On 06/01/2019 22:15, Alex wrote:
Hi,
I have a Fedora 29 system in our colo that's a few years old now and
just goes catatonic and stops responding after a few days. It's
happened a few times now, even with different kernels, so I suspect
it's a memory or hardware problem.

Is it possible to run memtest without having physical access to the
machine to insert a USB stick or CDROM?

After the machine reboots (via IPMI access), there's nothing in the
logs and no abrt-cli info on a kernel crash or other info I can find
about why it died.

What else can I do to troubleshoot this without having to drive to the
colo to check on it?

The last entry from journalctl just before it stopped responding was
just a regular nrpe entry, unrelated to the crash.

I've pasted the current dmesg output here:
http://pasted.co/4b700ee1

Any ideas greatly appreciated.
_______________________________________________
users mailing list -- users@xxxxxxxxxxxxxxxxxxxxxxx
To unsubscribe send an email to users-leave@xxxxxxxxxxxxxxxxxxxxxxx
Fedora Code of Conduct: https://getfedora.org/code-of-conduct.html
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/users@xxxxxxxxxxxxxxxxxxxxxxx