Memory doesn't appear to be a problem. Run "free" and look at the
amount of free memory on the "-/+ buffers/cache" line.
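
To show the arithmetic behind that buffers/cache line, here is a
minimal sketch in Python. The figures are copied from the top output
quoted below; the function name is only illustrative, and it assumes
the older procps convention in which buffers and page cache are
counted as reclaimable.

```python
# Figures copied from the top output in the original message (all in kB).
total_kb = 16099528
used_kb = 16063880
free_kb = 35648
buffers_kb = 487200
cached_kb = 12683800


def buffers_cache_line(used, free, buffers, cached):
    """Return (used, free) as on the buffers/cache line of `free`.

    Buffers and page cache are memory the kernel can hand back to
    applications, so they are subtracted from "used" and added to "free".
    """
    reclaimable = buffers + cached
    return used - reclaimable, free + reclaimable


app_used_kb, app_free_kb = buffers_cache_line(
    used_kb, free_kb, buffers_kb, cached_kb
)
print(f"used by applications:      {app_used_kb} kB")
print(f"available to applications: {app_free_kb} kB")
print(f"share of RAM held by applications: {100 * app_used_kb / total_kb:.1f}%")
```

With those numbers, applications hold roughly 2.9 GB and roughly
13.2 GB is free or reclaimable cache, which is why the "used" figure
in top looks alarming when it probably isn't.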
Top is reporting 3419 processes total, and a load average of 619
suggests that 600+ of them are runnable or blocked in uninterruptible
sleep. What does "ps auwwx" tell you?
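
If the ps listing is too long to read by eye, something like the
following sketch may help summarize it. It assumes the usual
"ps aux" column header (USER PID ... STAT ... COMMAND) and counts
processes by the first letter of the STAT column; on Linux, tasks in
state D (uninterruptible sleep) add to the load average just as
runnable (R) tasks do. The sample rows, "user9" and "some_job" are
invented for illustration and are not from this system.

```python
from collections import Counter

# Invented sample in `ps aux` layout; replace with real output, e.g.
#   subprocess.run(["ps", "auwwx"], capture_output=True, text=True).stdout
SAMPLE = """\
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.0  0.0  10348   632 ?        Ss   Jun29   0:01 init [5]
user1     8187  3.5  0.0  13356  3620 pts/3    S+   Feb11  44:48 top
user9    20001  0.0  0.0  60000  1200 ?        D    Feb10   0:00 some_job --input a
user9    20002  0.0  0.0  60000  1200 ?        D    Feb10   0:00 some_job --input b
user9    20003  0.0  0.0      0     0 ?        Z    Feb10   0:00 [some_job] <defunct>
"""


def parse_ps(ps_output):
    """Yield one dict per process, keyed by the header names."""
    lines = ps_output.strip().splitlines()
    header = lines[0].split()
    for line in lines[1:]:
        # Split into at most len(header) fields so COMMAND keeps its spaces.
        fields = line.split(None, len(header) - 1)
        if len(fields) == len(header):
            yield dict(zip(header, fields))


def count_states(ps_output):
    """Count processes by the first letter of STAT (R, S, D, Z, ...)."""
    return Counter(row["STAT"][0] for row in parse_ps(ps_output))


def processes_in_state(ps_output, state):
    """Return (pid, command) pairs for processes whose STAT starts with state."""
    return [
        (int(row["PID"]), row["COMMAND"])
        for row in parse_ps(ps_output)
        if row["STAT"].startswith(state)
    ]


print(count_states(SAMPLE))
for pid, command in processes_in_state(SAMPLE, "D"):
    print(pid, command)
```

A large count under D would point at whatever those processes are
waiting on; a single Z (zombie) entry uses no CPU or memory to speak
of and is unlikely to explain the load by itself.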
-----Original Message-----
From: redhat-list-bounces@xxxxxxxxxx [mailto:redhat-list-bounces@xxxxxxxxxx] On Behalf Of Margaret Doll
Sent: Tuesday, February 16, 2010 11:54 AM
To: General Red Hat Linux discussion list
Subject: Looking for job which is causing a large work load
We have an eight-processor system running Red Hat, kernel
2.6.18-128.1.6.el5xen.
We noticed the other day that sendmail was just queuing jobs and not
sending them.
mqueue, however, is empty.
That led us to look at the load average as a possible reason for the
failure of sendmail.
The QueueLA on sendmail is set to "8" as it should be.
w and top show that we have a high load average and most of the
memory on the system is being used. However, no job shows up in top
using a lot of memory.
top - 10:50:52 up 232 days, 15:18, 20 users,  load average: 619.06, 619.04, 618.98
Tasks: 3419 total,   1 running, 3417 sleeping,   0 stopped,   1 zombie
Cpu(s):  0.3%us,  0.9%sy,  0.0%ni, 98.8%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  16099528k total, 16063880k used,    35648k free,   487200k buffers
Swap:  6127608k total,   105920k used,  6021688k free, 12683800k cached
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
11917 user1     16   0 13424 3624  784 S  3.8  0.0   0:04.16 top
11922 root      16   0 13360 3624  776 R  3.8  0.0   0:00.39 top
 8187 user1     16   0 13356 3620  780 S  3.5  0.0  44:48.71 top
11895 user1     16   0 13452 3648  780 R  3.5  0.0   0:11.35 top
    1 root      15   0 10348  632  540 S  0.0  0.0   0:01.75 init
    2 root      RT  -5     0    0    0 S  0.0  0.0   0:07.51 migration/0
    3 root      34  19     0    0    0 S  0.0  0.0   0:24.56 ksoftirqd/0
    4 root      RT  -5     0    0    0 S  0.0  0.0   0:00.00 watchdog/0
    5 root      RT  -5     0    0    0 S  0.0  0.0   0:03.77 migration/1
    6 root      34  19     0    0    0 S  0.0  0.0   0:04.96 ksoftirqd/1
This machine is running long jobs from time to time and is hosting
large databases, so we don't want to reboot it.
How can we find the "job" that is using all the memory and bringing
the work load up to such a high level? Is it the zombie that is
reported in top?
Thanks
w
 10:57:27 up 232 days, 15:25, 18 users,  load average: 619.19, 619.28, 619.13
USER     TTY      FROM              LOGIN@    IDLE   JCPU   PCPU WHAT
user1    pts/2    lfps             15Jan10   4days  0.10s  0.10s -tcsh
user1    pts/3    lfps               Thu16  17:45m  44:55  44:54 top
user1    pts/4    lfps             15Jan10  25days  0.10s  0.10s -tcsh
user2    pts/5    gc166-mm.geo.bro   Thu16   4days  0.02s  0.01s sshd: user2 [priv]
crism    pts/8    molybdenum         Fri13   3days   1:27   1:27 /usr/local/itt/idl70/bin/bin.linux.x8
root     pts/9    :0.0             23Oct09 116days  0.00s  0.00s ssh -l user1 moly
wjuser1  pts/10   porter2.geo.brow   Mon10    6:01  0.11s  0.11s -tcsh
user2    pts/12   gc166-mm.geo.bro   Fri14   0.00s  0.07s  0.00s sshd: user2 [priv]
root     :0       -                23Oct09   ?xdm?  2:24m  0.03s /usr/bin/gnome-session
user1    pts/16   lfps               Mon14    3:47 10.30s 10.24s top
user1    pts/14   quahog2.geo.brow   Mon15    8:22 17.54s 17.48s top
root     pts/15   :0.0             23Oct09 116days  0.01s  0.01s -bin/tcsh
user1    pts/17   quahog2.geo.brow   Mon14  18:19m  0.11s  0.11s -tcsh
root     pts/23   :0.0             23Oct09 116days  0.01s  0.01s -bin/tcsh
root     pts/24   :0.0             23Oct09 116days  0.01s  0.01s -bin/tcsh
user1    pts/28   lfps             15Jan10    4:08  0.12s  0.12s -tcsh
user1    pts/30   lfps             15Jan10    6:01  0.39s  0.00s sshd: user1 [priv]
root     pts/7    :0.0             23Oct09 116days  5.78s  0.00s -bin/tcsh
--
redhat-list mailing list
unsubscribe mailto:redhat-list-request@xxxxxxxxxx?subject=unsubscribe
https://www.redhat.com/mailman/listinfo/redhat-list