Memory doesn't appear to be a problem. Run "free" and look at the amount of free memory on the "-/+ buffers/cache" line.

Top is reporting 3419 processes total, and a load average of 619 means 600+ of them are runnable or in uninterruptible sleep. What does "ps auwwx" tell you?

-----Original Message-----
From: redhat-list-bounces@xxxxxxxxxx [mailto:redhat-list-bounces@xxxxxxxxxx] On Behalf Of Margaret Doll
Sent: Tuesday, February 16, 2010 11:54 AM
To: General Red Hat Linux discussion list
Subject: Looking for job which is causing a large work load

We have an eight-processor system, running 2.6.18-128.1.6.el5xen Redhat.

We noticed the other day that sendmail was just queuing jobs and not sending them. mqueue, however, is empty. That led us to look at the load average as a possible reason for the failure of sendmail. The QueueLA on sendmail is set to "8" as it should be.

w and top show that we have a high load average and most of the memory on the system is being used. However, no job shows up in top using a lot of memory.

top - 10:50:52 up 232 days, 15:18, 20 users,  load average: 619.06, 619.04, 618.98
Tasks: 3419 total,   1 running, 3417 sleeping,   0 stopped,   1 zombie
Cpu(s):  0.3%us,  0.9%sy,  0.0%ni, 98.8%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  16099528k total, 16063880k used,    35648k free,   487200k buffers
Swap:  6127608k total,   105920k used,  6021688k free, 12683800k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
11917 user1     16   0 13424 3624  784 S  3.8  0.0   0:04.16 top
11922 root      16   0 13360 3624  776 R  3.8  0.0   0:00.39 top
 8187 user1     16   0 13356 3620  780 S  3.5  0.0  44:48.71 top
11895 user1     16   0 13452 3648  780 R  3.5  0.0   0:11.35 top
    1 root      15   0 10348  632  540 S  0.0  0.0   0:01.75 init
    2 root      RT  -5     0    0    0 S  0.0  0.0   0:07.51 migration/0
    3 root      34  19     0    0    0 S  0.0  0.0   0:24.56 ksoftirqd/0
    4 root      RT  -5     0    0    0 S  0.0  0.0   0:00.00 watchdog/0
    5 root      RT  -5     0    0    0 S  0.0  0.0   0:03.77 migration/1
    6 root      34  19     0    0    0 S  0.0  0.0   0:04.96 ksoftirqd/1

This machine is running long jobs from time to time and is hosting large databases, so we don't want to
reboot it. How can we find the "job" that is using all the memory and bringing the work load up to such a high level? Is it the zombie that is reported in top?

Thanks

w
 10:57:27 up 232 days, 15:25, 18 users,  load average: 619.19, 619.28, 619.13
USER     TTY      FROM              LOGIN@   IDLE    JCPU   PCPU   WHAT
user1    pts/2    lfps              15Jan10  4days   0.10s  0.10s  -tcsh
user1    pts/3    lfps              Thu16    17:45m  44:55  44:54  top
user1    pts/4    lfps              15Jan10  25days  0.10s  0.10s  -tcsh
user2    pts/5    gc166-mm.geo.bro  Thu16    4days   0.02s  0.01s  sshd: user2 [priv]
crism    pts/8    molybdenum        Fri13    3days   1:27   1:27   /usr/local/itt/idl70/bin/bin.linux.x8
root     pts/9    :0.0              23Oct09  116days 0.00s  0.00s  ssh -l user1 moly
wjuser1  pts/10   porter2.geo.brow  Mon10    6:01    0.11s  0.11s  -tcsh
user2    pts/12   gc166-mm.geo.bro  Fri14    0.00s   0.07s  0.00s  sshd: user2 [priv]
root     :0       -                 23Oct09  ?xdm?   2:24m  0.03s  /usr/bin/gnome-session
user1    pts/16   lfps              Mon14    3:47    10.30s 10.24s top
user1    pts/14   quahog2.geo.brow  Mon15    8:22    17.54s 17.48s top
root     pts/15   :0.0              23Oct09  116days 0.01s  0.01s  -bin/tcsh
user1    pts/17   quahog2.geo.brow  Mon14    18:19m  0.11s  0.11s  -tcsh
root     pts/23   :0.0              23Oct09  116days 0.01s  0.01s  -bin/tcsh
root     pts/24   :0.0              23Oct09  116days 0.01s  0.01s  -bin/tcsh
user1    pts/28   lfps              15Jan10  4:08    0.12s  0.12s  -tcsh
user1    pts/30   lfps              15Jan10  6:01    0.39s  0.00s  sshd: user1 [priv]
root     pts/7    :0.0              23Oct09  116days 5.78s  0.00s  -bin/tcsh

-- 
redhat-list mailing list
unsubscribe mailto:redhat-list-request@xxxxxxxxxx?subject=unsubscribe
https://www.redhat.com/mailman/listinfo/redhat-list
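
For anyone following the thread, here is a rough sketch of the arithmetic behind the buffers/cache line that "free" prints, worked with the Mem and Swap figures from the top output quoted in the original message. The constant and function names are illustrative only. It assumes the older procps layout, where buffers and page cache are counted as "used" on the first line and subtracted again on the buffers/cache line.

```python
# All values are in kB, copied from the "Mem:" and "Swap:" lines of the
# top output quoted in the original message. The names are illustrative.
MEM_TOTAL_KB = 16099528
MEM_USED_KB = 16063880
MEM_FREE_KB = 35648
BUFFERS_KB = 487200
CACHED_KB = 12683800  # page cache; top prints it at the end of the Swap line


def buffers_cache_line(used_kb, free_kb, buffers_kb, cached_kb):
    """Return (used, free) with buffers and page cache treated as reclaimable.

    Assumption: this mirrors how older procps versions of `free` derive
    their buffers/cache line from the first "Mem:" line.
    """
    used_by_processes = used_kb - buffers_kb - cached_kb
    free_for_processes = free_kb + buffers_kb + cached_kb
    return used_by_processes, free_for_processes


used_by_processes, free_for_processes = buffers_cache_line(
    MEM_USED_KB, MEM_FREE_KB, BUFFERS_KB, CACHED_KB
)

print(f"used by processes:  {used_by_processes} kB")
print(f"free for processes: {free_for_processes} kB")
print(f"share of RAM used by processes: {used_by_processes / MEM_TOTAL_KB:.0%}")
```

With these figures most of the "used" memory turns out to be buffers and page cache, which the kernel can hand back when processes ask for it. That fits the point above that memory is unlikely to be what is driving the load average.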