Re: Looking for job which is causing a large work load - solved

I used "ps -el" and found the zombie process.
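For anyone who lands on this thread with the same symptom: in `ps -el` output a zombie has state Z in the S column, and its PPID identifies the parent that has to reap it. A zombie cannot be killed with a signal. It goes away when its parent collects it, or when the parent exits and init adopts and reaps it. Below is a minimal Python sketch of the same scan that reads /proc/<pid>/stat directly. It assumes Linux, and the function names are hypothetical. It also reports D (uninterruptible sleep) processes, because the Linux load average counts both runnable tasks and tasks in uninterruptible sleep.

```python
import os

def parse_stat(line):
    """Return (pid, comm, state, ppid) parsed from one /proc/<pid>/stat line.

    The command name sits in parentheses and may itself contain spaces or
    ')', so split on the last ')' instead of splitting on whitespace.
    """
    left, _, right = line.rpartition(")")
    pid_str, _, comm = left.partition(" (")
    state, ppid = right.split()[:2]
    return int(pid_str), comm, state, int(ppid)

def find_by_state(states=("Z", "D"), proc="/proc"):
    """Return (pid, comm, state, ppid) for every process whose state is in `states`."""
    found = []
    if not os.path.isdir(proc):
        return found  # not Linux, or /proc is not mounted
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(os.path.join(proc, entry, "stat"), errors="replace") as f:
                info = parse_stat(f.read())
        except OSError:
            continue  # the process exited while we were scanning
        if info[2] in states:
            found.append(info)
    return found

if __name__ == "__main__":
    # Z = zombie (waiting for its parent to reap it), D = uninterruptible sleep
    for pid, comm, state, ppid in find_by_state():
        print(f"{state} pid={pid} ppid={ppid} {comm}")
```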

On Feb 16, 2010, at 10:54 AM, Margaret Doll wrote:

We have an eight-processor system running Red Hat with kernel 2.6.18-128.1.6.el5xen.

We noticed the other day that sendmail was only queuing messages, not sending them.
mqueue, however, is empty.

That led us to look at the load average as a possible cause of the sendmail failure.
Sendmail's QueueLA option is set to 8, as it should be.
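For context, QueueLA is the load average at which sendmail stops delivering immediately and leaves mail in the queue. A load average near 619 against a QueueLA of 8 would account for the queuing on its own. Below is a minimal Python sketch of that comparison using `os.getloadavg()`. The `would_queue` helper is hypothetical and deliberately simplified: it treats reaching the threshold as the trigger, and it ignores the way real sendmail also weighs each message's priority through the QueueFactor option.

```python
import os

def would_queue(load_avg, queue_la):
    """Simplified model of sendmail's QueueLA check (hypothetical helper).

    Returns True once the load average reaches QueueLA. Real sendmail then
    also compares message priority against QueueFactor, which is omitted here.
    """
    return load_avg >= queue_la

if __name__ == "__main__":
    one_min, five_min, fifteen_min = os.getloadavg()  # the figures w and top print
    print(f"load {one_min:.2f}; queue instead of deliver: {would_queue(one_min, 8)}")
```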

Both w and top show a high load average, with most of the memory on the system in use.
However, no job in top appears to be using a lot of memory.

top - 10:50:52 up 232 days, 15:18, 20 users,  load average: 619.06, 619.04, 618.98
Tasks: 3419 total,   1 running, 3417 sleeping,   0 stopped,   1 zombie
Cpu(s):  0.3%us,  0.9%sy,  0.0%ni, 98.8%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  16099528k total, 16063880k used,    35648k free,   487200k buffers
Swap:  6127608k total,   105920k used,  6021688k free, 12683800k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
11917 user1     16   0 13424 3624  784 S  3.8  0.0   0:04.16 top
11922 root      16   0 13360 3624  776 R  3.8  0.0   0:00.39 top
 8187 user1     16   0 13356 3620  780 S  3.5  0.0  44:48.71 top
11895 user1     16   0 13452 3648  780 R  3.5  0.0   0:11.35 top
    1 root      15   0 10348  632  540 S  0.0  0.0   0:01.75 init
    2 root      RT  -5     0    0    0 S  0.0  0.0   0:07.51 migration/0
    3 root      34  19     0    0    0 S  0.0  0.0   0:24.56 ksoftirqd/0
    4 root      RT  -5     0    0    0 S  0.0  0.0   0:00.00 watchdog/0
    5 root      RT  -5     0    0    0 S  0.0  0.0   0:03.77 migration/1
    6 root      34  19     0    0    0 S  0.0  0.0   0:04.96 ksoftirqd/1
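A note on the memory figures: in this top header, "used" also includes buffers and page cache, which the kernel largely hands back when processes need memory. top prints the cached total at the end of the Swap: line. Subtracting buffers and cache (the same arithmetic behind the "-/+ buffers/cache" row of free) leaves much less memory actually held by processes. That fits the observation that no single job in top shows a large RES. Below is a small Python sketch of the calculation, using numbers copied from the output above. The helper name is illustrative.

```python
def used_by_processes(used_kb, buffers_kb, cached_kb):
    """kB held by processes: 'used' minus buffers and page cache (largely reclaimable)."""
    return used_kb - buffers_kb - cached_kb

# Figures from the top header above; the cached total appears on the Swap: line.
app_kb = used_by_processes(16063880, 487200, 12683800)
print(f"{app_kb} kB (~{app_kb / 1024 / 1024:.1f} GiB) held by processes")
# prints "2892880 kB (~2.8 GiB) held by processes"
```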

This machine runs long jobs from time to time and hosts large databases, so we don't want to reboot it.

How can we find the "job" that is using all the memory and pushing the load average so high? Is it the zombie that top reports?
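One direct way to answer the "which job is using the memory" part is to rank processes by resident set size. That is roughly what `ps -eo pid,rss,comm --sort=-rss` prints. Below is a minimal Python sketch that reads the VmRSS line from each /proc/<pid>/status. It assumes Linux, and the function names are hypothetical. Shared pages are counted in every process that maps them, so summing the column overstates the real total.

```python
import os

def rss_kb(status_text):
    """VmRSS in kB from /proc/<pid>/status text; 0 if absent (kernel threads, zombies)."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])  # line looks like "VmRSS:\t    3624 kB"
    return 0

def top_by_rss(limit=10, proc="/proc"):
    """Up to `limit` (rss_kb, pid, name) tuples, largest resident set first."""
    rows = []
    if not os.path.isdir(proc):
        return rows  # not Linux, or /proc is not mounted
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue  # skip non-process entries
        try:
            with open(os.path.join(proc, entry, "status"), errors="replace") as f:
                text = f.read()
        except OSError:
            continue  # process exited or is not readable
        name = next((l[len("Name:"):].strip() for l in text.splitlines()
                     if l.startswith("Name:")), "?")
        rows.append((rss_kb(text), int(entry), name))
    rows.sort(reverse=True)
    return rows[:limit]

if __name__ == "__main__":
    for rss, pid, name in top_by_rss():
        print(f"{rss:>10} kB  {pid:>6}  {name}")
```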


Thanks

w
10:57:27 up 232 days, 15:25, 18 users, load average: 619.19, 619.28, 619.13
USER     TTY      FROM              LOGIN@   IDLE   JCPU   PCPU WHAT
user1    pts/2    lfps             15Jan10  4days  0.10s  0.10s -tcsh
user1    pts/3    lfps             Thu16   17:45m 44:55  44:54  top
user1    pts/4    lfps             15Jan10 25days  0.10s  0.10s -tcsh
user2    pts/5    gc166-mm.geo.bro Thu16    4days  0.02s  0.01s sshd: user2 [priv]
crism    pts/8    molybdenum       Fri13    3days  1:27   1:27  /usr/local/itt/idl70/bin/bin.linux.x8
root     pts/9    :0.0             23Oct09 116days  0.00s  0.00s ssh -l user1 moly
wjuser1  pts/10   porter2.geo.brow Mon10    6:01   0.11s  0.11s -tcsh
user2    pts/12   gc166-mm.geo.bro Fri14    0.00s  0.07s  0.00s sshd: user2 [priv]
root     :0       -                23Oct09 ?xdm?   2:24m  0.03s /usr/bin/gnome-session
user1    pts/16   lfps             Mon14    3:47  10.30s 10.24s top
user1    pts/14   quahog2.geo.brow Mon15    8:22  17.54s 17.48s top
root     pts/15   :0.0             23Oct09 116days  0.01s  0.01s -/bin/tcsh
user1    pts/17   quahog2.geo.brow Mon14   18:19m  0.11s  0.11s -tcsh
root     pts/23   :0.0             23Oct09 116days  0.01s  0.01s -/bin/tcsh
root     pts/24   :0.0             23Oct09 116days  0.01s  0.01s -/bin/tcsh
user1    pts/28   lfps             15Jan10  4:08   0.12s  0.12s -tcsh
user1    pts/30   lfps             15Jan10  6:01   0.39s  0.00s sshd: user1 [priv]
root     pts/7    :0.0             23Oct09 116days  5.78s  0.00s -/bin/tcsh


--
redhat-list mailing list
unsubscribe mailto:redhat-list-request@xxxxxxxxxx?subject=unsubscribe
https://www.redhat.com/mailman/listinfo/redhat-list
