The new top output showed the same problem after the reboot.

The queued jobs kept running while the head node rebooted, and they
restarted the I/O problem as soon as the head node was back up. Killing
one of the queued jobs stopped the problem.

I need a way to determine which queued job is causing the I/O problem, so
that I can kill it. (And, I hope, train the user to have their program
write to the compute node rather than the head node during execution.)

Thanks to everyone for their help.


On Thu, Jun 27, 2013 at 9:15 AM, Yixin Luo <luoyixin@xxxxxxxxx> wrote:

> It is true - %wa is too high if non-parallel jobs run. Ann, what is the
> new top after reboot?
>
> Yixin
>
>
> On Thu, Jun 27, 2013 at 7:31 AM, Doll, Margaret Ann
> <margaret_doll@xxxxxxxxx> wrote:
>
> > I installed the iozone program and ran ./iozone -a.
> > How does this information help me find the offending program?
> >
> > Sorry for my ignorance.
> >
> > I do have 10 nfsd programs running. I only have four jobs on the
> > queues, none of which are running parallel code.
> >
> >
> > On Thu, Jun 27, 2013 at 7:42 AM, Miner, Jonathan W (US SSA)
> > <jonathan.w.miner@xxxxxxxxxxxxxx> wrote:
> >
> > > > From: redhat-list-bounces@xxxxxxxxxx [redhat-list-bounces@xxxxxxxxxx]
> > > > on behalf of Yixin Luo [luoyixin@xxxxxxxxx]
> > > > Sent: Wednesday, June 26, 2013 17:56
> > > > To: General Red Hat Linux discussion list
> > > > Subject: Re: head node has an extremely high load average.
> > > >
> > > > NFS may hang up. Have you tried running autofs?
> > >
> > > Can you explain why "autofs" would be better than NFS? I have not
> > > managed any NFS-based systems for nearly a decade, but from what I
> > > remember, autofs simplifies the management aspect of network
> > > filesystems; but NFS is still the underlying protocol. Without
> > > autofs, things were mounted all the time, and you'd have to push
> > > changes out to all the clients' /etc/fstab files.
> > >
> > > As for Margaret's original problem, her system looks very I/O bound.
> > > Like someone else suggested, I'd start looking at the local disk
> > > performance and see if one disk, or one bus was in contention for
> > > most of the traffic. Then look at the number of nfsd processes and
> > > make sure they're appropriate for the expected load. The iozone
> > > program should help you with this task.
> > >
> > > http://www.thegeekstuff.com/2011/05/iozone-examples/
> > >
> > > - Jon

--
redhat-list mailing list
unsubscribe mailto:redhat-list-request@xxxxxxxxxx?subject=unsubscribe
https://www.redhat.com/mailman/listinfo/redhat-list
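
On the question at the top of this message (which queued job is
generating the I/O): one possible approach, not something proposed in the
thread, is to compare per-process write counters on each compute node
over a short interval. The sketch below is Python 3 and assumes a Linux
kernel that exposes /proc/<pid>/io, and that it is run as root so other
users' processes can be read. The function names (parse_proc_io, snapshot,
top_writers, report) are made up for illustration. It ranks by the wchar
counter, which counts bytes passed to write-style system calls and so
should include writes that land on an NFS mount.

```python
import os
import time


def parse_proc_io(text):
    """Parse the text of /proc/<pid>/io into a dict of integer counters."""
    counters = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip().isdigit():
            counters[key.strip()] = int(value)
    return counters


def snapshot(proc_root="/proc"):
    """Return {pid: (command line, wchar)} for every readable process."""
    result = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "io")) as f:
                io_counters = parse_proc_io(f.read())
            with open(os.path.join(proc_root, entry, "cmdline")) as f:
                cmdline = f.read().replace("\0", " ").strip()
        except OSError:
            continue  # process exited, or we lack permission to read it
        result[int(entry)] = (cmdline, io_counters.get("wchar", 0))
    return result


def top_writers(before, after, limit=5):
    """Rank processes present in both snapshots by growth in wchar."""
    deltas = []
    for pid, (cmdline, wchar) in after.items():
        if pid in before:
            deltas.append((wchar - before[pid][1], pid, cmdline))
    deltas.sort(reverse=True)
    return deltas[:limit]


def report(interval=10):
    """Take two snapshots `interval` seconds apart and print the top writers."""
    first = snapshot()
    time.sleep(interval)
    second = snapshot()
    for delta, pid, cmdline in top_writers(first, second):
        print("%15d bytes  pid=%d  %s" % (delta, pid, cmdline))
```

Calling report() on each compute node while the head node is under load
would list the processes that wrote the most during the interval. The
process ID still has to be matched to a queue job by hand, and the counter
does not say where the bytes went, so a busy writer to local scratch would
show up as well.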