The users' home directories are NFS-mounted on the compute nodes.

On Wed, Jun 26, 2013 at 3:35 PM, Jonathan Billings <jsbillin@xxxxxxxxx> wrote:

> Hello,
>
> Is your head node an NFS server, and are the jobs writing to the NFS share?
>
> On Wed, Jun 26, 2013 at 3:27 PM, Doll, Margaret Ann <margaret_doll@xxxxxxxxx> wrote:
>
> > I have a computer cluster running Rocks 5.2, CentOS 6.
> >
> > The head node is overloaded. There are 2 CPUs on the head node.
> >
> > top - 14:27:49 up 1 day,  6:11,  6 users,  load average: 13.65, 14.12, 13.92
> > Tasks: 168 total,   3 running, 163 sleeping,   0 stopped,   2 zombie
> > Cpu(s):  1.2%us,  1.9%sy,  0.0%ni,  0.0%id, 91.7%wa,  1.0%hi,  4.1%si,  0.0%st
> > Mem:   2053088k total,  2001464k used,    51624k free,    74476k buffers
> > Swap:  1020116k total,      388k used,  1019728k free,  1638076k cached
> >
> >   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
> >  2515 nobody    15   0  218m 3176 1048 S  2.3  0.2   8:46.23 gmetad
> >  2967 root      15   0     0    0    0 S  2.0  0.0   0:20.31 nfsd
> >  2970 root      15   0     0    0    0 R  1.0  0.0   0:20.60 nfsd
> >  3110 nobody    15   0  198m  20m 3360 S  0.3  1.0   4:22.71 gmond
> > 29788 mad       15   0 90736 2336 1084 S  0.3  0.1   0:02.91 sshd
> >     1 root      15   0 10372  684  572 S  0.0  0.0   0:00.51 init
> >     2 root      RT  -5     0    0    0 S  0.0  0.0   0:00.00 migration/0
> >     3 root      34  19     0    0    0 S  0.0  0.0   0:00.00 ksoftirqd/0
> >     4 root      RT  -5     0    0    0 S  0.0  0.0   0:00.00 watchdog/0
> >
> > I have everyone logged off of the head node. Four jobs are running on the
> > compute nodes, but I believe they are non-parallel jobs, which cause no
> > traffic on the head node. The load_avg on each of the compute nodes is
> > less than 8. Each compute node has 8 CPUs.
> >
> > How can I find the problem? I have seen the zombies go as high as 2 on
> > the head node; most of the time there are 0 zombies.
> >
> > I did reboot the head node, but the problem comes back fairly quickly.
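
For reference, here is a rough sketch of how the summary lines in the quoted top output can be read. It only parses the two lines pasted above. The function names and the 50% threshold are my own assumptions for illustration, not part of top or of any standard tool. It flags the case where the load average exceeds the CPU count while most CPU time is spent in I/O wait (wa), which is what an NFS server stuck waiting on disk or network would tend to look like.

```python
import re

# Summary lines copied from the top output quoted in this thread.
TOP_SUMMARY = """\
top - 14:27:49 up 1 day,  6:11,  6 users,  load average: 13.65, 14.12, 13.92
Cpu(s):  1.2%us,  1.9%sy,  0.0%ni,  0.0%id, 91.7%wa,  1.0%hi,  4.1%si,  0.0%st
"""


def parse_top_summary(text):
    """Return (1-minute load average, dict mapping CPU state to percent)."""
    load_match = re.search(r"load average:\s*([\d.]+)", text)
    if load_match is None:
        raise ValueError("no load average found in top output")
    load_1min = float(load_match.group(1))

    # Find the CPU summary line and split it into (percent, state) pairs.
    cpu_line = next(
        line for line in text.splitlines() if line.startswith("Cpu(s):")
    )
    cpu = {
        state: float(pct)
        for pct, state in re.findall(r"([\d.]+)%(\w+)", cpu_line)
    }
    return load_1min, cpu


def looks_io_bound(load_1min, cpu, n_cpus, wa_threshold=50.0):
    """Hypothetical heuristic: more runnable/blocked tasks than CPUs,
    and at least wa_threshold percent of CPU time spent waiting on I/O."""
    return load_1min > n_cpus and cpu.get("wa", 0.0) >= wa_threshold


load_1min, cpu = parse_top_summary(TOP_SUMMARY)
print(load_1min, cpu["wa"], looks_io_bound(load_1min, cpu, n_cpus=2))
# prints: 13.65 91.7 True
```

With 2 CPUs, a load of 13.65 and 91.7%wa, the heuristic reports True. That would be consistent with the nfsd threads waiting on I/O rather than with anything using CPU on the head node, though it does not by itself show which client or job is generating the traffic.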
> --
> Jonathan Billings <jsbillin@xxxxxxxxx>
> College of Engineering - CAEN - Unix and Linux Support

--
redhat-list mailing list
unsubscribe mailto:redhat-list-request@xxxxxxxxxx?subject=unsubscribe
https://www.redhat.com/mailman/listinfo/redhat-list