On 01/18/2017 03:41 AM, Sam Varshavchik wrote:
> One of my servers was a bit "unresponsive". After waiting about 20
> seconds for an ssh connection, the root shell seemed fine, but top
> showed this:
>
> top - 06:31:36 up 3 days, 21:37,  2 users,  load average: 6.00, 6.00, 6.00
> Tasks: 294 total,   1 running, 277 sleeping,   0 stopped,  16 zombie
> %Cpu(s):  0.1 us,  0.1 sy,  0.0 ni, 99.8 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
> KiB Mem :  4045896 total,  1200824 free,   346588 used,  2498484 buff/cache
> KiB Swap:  2096112 total,  2093448 free,     2664 used.  3288268 avail Mem
>
>   PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
> 22031 root      20   0  156776   4092   3480 R   0.9  0.1   0:00.04 top
>     1 root      20   0  147032   7660   5616 D   0.0  0.2   0:14.18 systemd
>     2 root      20   0       0      0      0 S   0.0  0.0   0:00.05 kthreadd
>     3 root      20   0       0      0      0 S   0.0  0.0   0:00.01 ksoftirqd/0
>
> The load average was 6. But nothing was burning CPU.
>
> After poking around, all signs were pointing to systemd doing what
> systemd does best:
>
> # systemctl status
> Failed to read server status: Connection timed out
>
> And the 16 zombie processes were system daemons, that should've been
> reaped by systemd.
>
> "reboot" did nothing, of course. "reboot --force" did the trick.
>
> Setting aside yet another systemd fiasco (on a mostly idle server that
> did absolutely nothing for the last ten hours) I'm curious as to how
> /proc/loadavg could end up reporting a load average of 6, without any
> processes being seeming to be doing anything.

Your systemd was in the D state (uninterruptible sleep, usually waiting
on I/O). On Linux, every task in that state counts toward the load
average even though it uses no CPU, so a handful of stuck tasks is
enough to hold the load at a steady number like 6.00. This can be caused
by devices that aren't responding (e.g. a boatload of disk I/O), a ton
of interrupts or possibly context switches. In your case it isn't
interrupts (the "hi" and "si" fields of top are both 0.0), so it's
probably I/O or context switches. You need to run something like
"vmstat 5" to see I/O and context switches.
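
To see which tasks are actually holding the load up, something like the
following might help. It is only a rough sketch, not a polished tool: it
assumes a Linux /proc layout, and the function names (parse_stat_line,
load_contributors) are made up for illustration. It lists every thread
currently in the R (runnable) or D (uninterruptible) state, which are
the two states the kernel counts when it computes the load average.

```python
import glob
import os


def parse_stat_line(line):
    """Return (tid, comm, state) from one /proc/<pid>/stat line."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so locate the first '(' and the LAST ')' instead of splitting on spaces.
    lparen = line.index("(")
    rparen = line.rindex(")")
    tid = int(line[:lparen])
    comm = line[lparen + 1:rparen]
    state = line[rparen + 1:].split()[0]
    return tid, comm, state


def load_contributors(proc_root="/proc"):
    """List (tid, comm, state) for every thread in R or D state."""
    found = []
    # Threads are counted individually, so walk the task/ entries.
    pattern = os.path.join(proc_root, "[0-9]*", "task", "[0-9]*", "stat")
    for path in glob.glob(pattern):
        try:
            with open(path) as f:
                tid, comm, state = parse_stat_line(f.read())
        except (OSError, ValueError):
            continue  # the thread exited while we were scanning
        if state in ("R", "D"):
            found.append((tid, comm, state))
    return found


if __name__ == "__main__":
    # Only meaningful on Linux; silently does nothing elsewhere.
    if os.path.exists("/proc/loadavg"):
        with open("/proc/loadavg") as f:
            print("loadavg:", f.read().strip())
        for tid, comm, state in load_contributors():
            print(state, tid, comm)
```

Run as root on the affected box, the D entries are the ones to chase.
The script itself will normally show up as a single R entry.
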
The "bi" (blocks in) and "bo" (blocks out) fields under "--io--" in the
vmstat output show disk I/O, and the "cs" field under "--system--" shows
the context switches.

And yes, I agree...systemd is a spectacular failure.
----------------------------------------------------------------------
- Rick Stevens, Systems Engineer, AllDigital    ricks@xxxxxxxxxxxxxx -
- AIM/Skype: therps2        ICQ: 226437340           Yahoo: origrps2 -
-                                                                    -
-     Let us think the unthinkable. Let us do the undoable. Let us   -
-   prepare to grapple with the ineffable itself, and see if we may  -
-                     not eff it up after all.                       -
-                                              -- Douglas Adams      -
----------------------------------------------------------------------