I have a computer cluster running Rocks 5.2 (CentOS 6). The head node, which has 2 CPUs, is overloaded:

    top - 14:27:49 up 1 day,  6:11,  6 users,  load average: 13.65, 14.12, 13.92
    Tasks: 168 total,   3 running, 163 sleeping,   0 stopped,   2 zombie
    Cpu(s):  1.2%us,  1.9%sy,  0.0%ni,  0.0%id, 91.7%wa,  1.0%hi,  4.1%si,  0.0%st
    Mem:   2053088k total,  2001464k used,    51624k free,    74476k buffers
    Swap:  1020116k total,      388k used,  1019728k free,  1638076k cached

      PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
     2515 nobody    15   0  218m 3176 1048 S  2.3  0.2   8:46.23 gmetad
     2967 root      15   0     0    0    0 S  2.0  0.0   0:20.31 nfsd
     2970 root      15   0     0    0    0 R  1.0  0.0   0:20.60 nfsd
     3110 nobody    15   0  198m  20m 3360 S  0.3  1.0   4:22.71 gmond
    29788 mad       15   0 90736 2336 1084 S  0.3  0.1   0:02.91 sshd
        1 root      15   0 10372  684  572 S  0.0  0.0   0:00.51 init
        2 root      RT  -5     0    0    0 S  0.0  0.0   0:00.00 migration/0
        3 root      34  19     0    0    0 S  0.0  0.0   0:00.00 ksoftirqd/0
        4 root      RT  -5     0    0    0 S  0.0  0.0   0:00.00 watchdog/0

I have logged everyone off the head node. Four jobs are running on the compute nodes, but I believe they are non-parallel jobs, which cause no traffic on the head node. The load average on each compute node is below 8, and each compute node has 8 CPUs.

How can I find the problem? I have seen the zombie count go as high as 2 on the head node, though most of the time it is 0. I did reboot the head node, but the problem comes back fairly quickly.

--
redhat-list mailing list
unsubscribe mailto:redhat-list-request@xxxxxxxxxx?subject=unsubscribe
https://www.redhat.com/mailman/listinfo/redhat-list
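A possible starting point, offered as a hedged sketch rather than a known fix: on Linux the load average counts tasks in uninterruptible sleep (state D) as well as runnable ones. The top output in the report shows 91.7% iowait and 0.0% idle, with nfsd threads near the top of the list, so a high load with little user or system CPU time can mean tasks are blocked on disk or NFS I/O. The Python 3 sketch lists tasks currently in state D by reading /proc/<pid>/stat. It assumes a Linux /proc layout. The function names parse_stat and blocked_tasks are hypothetical helpers, not part of Rocks or CentOS. It only scans top-level /proc entries, not the per-thread entries under /proc/<pid>/task.

```python
import os


def parse_stat(line):
    """Return (pid, comm, state) parsed from one /proc/<pid>/stat line.

    The command name (comm) is wrapped in parentheses and may itself
    contain spaces or ')', so we split on the LAST ')' instead of on
    whitespace.
    """
    lpar = line.index("(")
    rpar = line.rindex(")")
    pid = int(line[:lpar].strip())
    comm = line[lpar + 1:rpar]
    # The first field after the closing parenthesis is the state letter
    # (R = running, S = sleeping, D = uninterruptible sleep, Z = zombie, ...).
    state = line[rpar + 1:].split()[0]
    return pid, comm, state


def blocked_tasks(proc="/proc"):
    """Return a list of (pid, comm) for tasks in state 'D'.

    Hypothetical helper: it scans only the numeric entries directly
    under `proc`, not per-thread entries in /proc/<pid>/task.
    """
    result = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            # The process may have exited between listdir() and open(),
            # or the line could not be parsed; skip it.
            continue
        if state == "D":
            result.append((pid, comm))
    return result


if __name__ == "__main__":
    for pid, comm in blocked_tasks():
        print(pid, comm)
```

Running it several times while the load is high would show whether the same tasks (for example nfsd) keep appearing in state D. If they do, that would point toward the storage or NFS path rather than CPU use as the source of the load.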