Re: system unresponsive




On Wed, May 22, 2019 at 10:02 AM mark <m.roth@xxxxxxxxx> wrote:

> That seems unlikely. For one, I've seen that... but I *always* see entries
> in the log about the oom-killer being invoked. For another, this isn't a
> compute node, it's *only* a fileserver, serving projects, home
> directories, and backups (home-grown b/u, uses rsync), and backups don't
> start until well after midnight, and as we're business-hours only, there
> was less usage, and it does have 256G RAM....
>

I have two servers that would occasionally lock up like this; if I let
them sit at the console long enough, they would sometimes give a login
prompt. It took a lot of time and frustration (these are prod servers), but
I tracked it down to a problem in the XFS driver, since it never occurred on
the systems with EXT4 filesystems. The XFS driver would hang, blocking
writes to the filesystem. I could identify exactly when that happened
because all system logging would suddenly stop at the same second. The OOM
killer would then come in and start killing off processes, but none of that
made it into the logs on disk because the filesystem couldn't be written
to. I rolled the servers back to a 5xx series kernel and the issue didn't
resurface. I recently let them boot the newer 9xx series kernels, and I'm
hoping the XFS issue is fixed.
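
For anyone who wants to look for the same "logging stops at the same
second" signature, here is a minimal sketch in Python. It is not the tool
used above, just an illustration. It assumes traditional syslog timestamps
("May 22 10:02:13") at the start of each line; the function name, the
five-minute threshold, and the sample lines (including the host name fs1)
are all made up for the example.

```python
from datetime import datetime, timedelta


def find_log_gaps(lines, year, threshold=timedelta(minutes=5)):
    """Return (last_seen, next_seen) pairs for consecutive log entries
    that are further apart than `threshold`.

    Assumes each entry starts with a traditional syslog timestamp such as
    "May 22 10:02:13". That format carries no year, so the caller supplies one.
    """
    gaps = []
    prev = None
    for line in lines:
        try:
            # The first 15 characters hold "Mon DD HH:MM:SS".
            stamp = datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
        except ValueError:
            continue  # skip lines without a parsable timestamp
        if prev is not None and stamp - prev > threshold:
            gaps.append((prev, stamp))
        prev = stamp
    return gaps


# Hypothetical sample data; real input would be the lines of a syslog file.
sample = [
    "May 22 03:14:07 fs1 kernel: example entry",
    "May 22 03:14:09 fs1 systemd: example entry",
    "May 22 09:40:00 fs1 kernel: example entry after reboot",
]
for last_seen, next_seen in find_log_gaps(sample, 2019):
    print("logging stopped after", last_seen, "and resumed at", next_seen)
```

The first timestamp of each pair is the last entry that reached the disk,
which would be roughly when writes stopped if the cause is the same as on
my servers. A quiet system can also produce long gaps, so the threshold
needs adjusting to how chatty the logs normally are.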
_______________________________________________
CentOS mailing list
CentOS@xxxxxxxxxx
https://lists.centos.org/mailman/listinfo/centos


