Install atop - the best tool for tracking runaway processes, user abuse,
network utilization, etc.

On Tue, Feb 16, 2010 at 10:18 AM, Stainforth, Matthew (SD/DS) <Matthew.Stainforth@xxxxxx> wrote:

> Memory doesn't appear to be a problem. Run "free" and look at the amount
> of free memory on the "+/- buffers/cache" line.
>
> Top is reporting 3419 processes total with 600+ in a runnable state. What
> does "ps auwwx" tell you?
>
> -----Original Message-----
> From: redhat-list-bounces@xxxxxxxxxx [mailto:redhat-list-bounces@xxxxxxxxxx] On Behalf Of Margaret Doll
> Sent: Tuesday, February 16, 2010 11:54 AM
> To: General Red Hat Linux discussion list
> Subject: Looking for job which is causing a large work load
>
> We have an eight processor system, running 2.6.18-128.1.6.el5xen
> Redhat.
>
> We noticed the other day that sendmail was just queuing jobs and not
> sending them. mqueue, however, is empty.
>
> That led us to look at the load average as a possible reason for the
> failure of sendmail. The QueueLA on sendmail is set to "8" as it should be.
>
> w and top show that we have a high load average and most of the memory
> on the system is being used. However, no job shows up in top using a
> lot of memory.
>
> top - 10:50:52 up 232 days, 15:18, 20 users,  load average: 619.06, 619.04, 618.98
> Tasks: 3419 total,   1 running, 3417 sleeping,   0 stopped,   1 zombie
> Cpu(s):  0.3%us,  0.9%sy,  0.0%ni, 98.8%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
> Mem:  16099528k total, 16063880k used,    35648k free,   487200k buffers
> Swap:  6127608k total,   105920k used,  6021688k free, 12683800k cached
>
>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
> 11917 user1     16   0 13424 3624  784 S  3.8  0.0   0:04.16 top
> 11922 root      16   0 13360 3624  776 R  3.8  0.0   0:00.39 top
>  8187 user1     16   0 13356 3620  780 S  3.5  0.0  44:48.71 top
> 11895 user1     16   0 13452 3648  780 R  3.5  0.0   0:11.35 top
>     1 root      15   0 10348  632  540 S  0.0  0.0   0:01.75 init
>     2 root      RT  -5     0    0    0 S  0.0  0.0   0:07.51 migration/0
>     3 root      34  19     0    0    0 S  0.0  0.0   0:24.56 ksoftirqd/0
>     4 root      RT  -5     0    0    0 S  0.0  0.0   0:00.00 watchdog/0
>     5 root      RT  -5     0    0    0 S  0.0  0.0   0:03.77 migration/1
>     6 root      34  19     0    0    0 S  0.0  0.0   0:04.96 ksoftirqd/1
>
> This machine is running long jobs from time to time and is hosting
> large databases, so we don't want to reboot it.
>
> How can we find the "job" that is using all the memory and bringing
> the work load up to such a high level? Is it the zombie that is
> reported in top?
>
> Thanks
>
> w
>  10:57:27 up 232 days, 15:25, 18 users,  load average: 619.19, 619.28, 619.13
> USER     TTY     FROM              LOGIN@   IDLE     JCPU    PCPU    WHAT
> user1    pts/2   lfps              15Jan10  4days    0.10s   0.10s   -tcsh
> user1    pts/3   lfps              Thu16    17:45m   44:55   44:54   top
> user1    pts/4   lfps              15Jan10  25days   0.10s   0.10s   -tcsh
> user2    pts/5   gc166-mm.geo.bro  Thu16    4days    0.02s   0.01s   sshd: user2 [priv]
> crism    pts/8   molybdenum        Fri13    3days    1:27    1:27    /usr/local/itt/idl70/bin/bin.linux.x8
> root     pts/9   :0.0              23Oct09  116days  0.00s   0.00s   ssh -l user1 moly
> wjuser1  pts/10  porter2.geo.brow  Mon10    6:01     0.11s   0.11s   -tcsh
> user2    pts/12  gc166-mm.geo.bro  Fri14    0.00s    0.07s   0.00s   sshd: user2 [priv]
> root     :0      -                 23Oct09  ?xdm?    2:24m   0.03s   /usr/bin/gnome-session
> user1    pts/16  lfps              Mon14    3:47     10.30s  10.24s  top
> user1    pts/14  quahog2.geo.brow  Mon15    8:22     17.54s  17.48s  top
> root     pts/15  :0.0              23Oct09  116days  0.01s   0.01s   -bin/tcsh
> user1    pts/17  quahog2.geo.brow  Mon14    18:19m   0.11s   0.11s   -tcsh
> root     pts/23  :0.0              23Oct09  116days  0.01s   0.01s   -bin/tcsh
> root     pts/24  :0.0              23Oct09  116days  0.01s   0.01s   -bin/tcsh
> user1    pts/28  lfps              15Jan10  4:08     0.12s   0.12s   -tcsh
> user1    pts/30  lfps              15Jan10  6:01     0.39s   0.00s   sshd: user1 [priv]
> root     pts/7   :0.0              23Oct09  116days  5.78s   0.00s   -bin/tcsh
>
> --
> redhat-list mailing list
> unsubscribe mailto:redhat-list-request@xxxxxxxxxx?subject=unsubscribe
> https://www.redhat.com/mailman/listinfo/redhat-list

--
Alan A.
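
To make the two checks suggested in the thread concrete, here is a minimal Python sketch. It is an illustration only, not something any poster ran. The first function redoes the "+/- buffers/cache" arithmetic using the Mem: and Swap: figures from the top output quoted above. The second tallies process states from text in the format produced by `ps -eo stat,pid,user,comm`. The function names and the sample ps text are hypothetical. On Linux the load average counts tasks in uninterruptible sleep (state D) as well as runnable ones (state R), so a tally by state is one way to see what is inflating it.

```python
from collections import Counter


def memory_without_cache(used_k, free_k, buffers_k, cached_k):
    """Return (used_by_programs, available) in kB.

    Buffers and page cache are treated as reclaimable, which is what the
    "+/- buffers/cache" line of free reports.
    """
    used_by_programs = used_k - buffers_k - cached_k
    available = free_k + buffers_k + cached_k
    return used_by_programs, available


def count_states(ps_text):
    """Tally the first letter of the STAT column, skipping the header line.

    Expects text shaped like the output of `ps -eo stat,pid,user,comm`.
    """
    counts = Counter()
    for line in ps_text.splitlines()[1:]:
        fields = line.split()
        if fields:
            counts[fields[0][0]] += 1
    return counts


# Figures copied from the top output in the original message (kB).
used, available = memory_without_cache(
    used_k=16063880, free_k=35648, buffers_k=487200, cached_k=12683800
)
print(f"used by programs: {used} kB, available: {available} kB")

# Hypothetical sample, NOT taken from the machine in the thread.
SAMPLE_PS = """STAT   PID USER  COMMAND
Ss       1 root  init
D      101 user1 cp
D<     102 user1 dd
R+     103 root  top
"""
print(dict(count_states(SAMPLE_PS)))
```

With the quoted figures, only about 2.9 GB of the 16 GB is held by programs and the rest is buffers and cache, which matches the remark that memory doesn't appear to be a problem. To use the tally on a live system, one could pass it the captured output of `ps -eo stat,pid,user,comm`; a large D count would point at tasks stuck waiting on I/O rather than at CPU or memory use.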