Well, it happened again. At 0810 EST (more or less) today the loadavg rocketed again. Here is what I found:
[root@xxxxxx root]# uptime
08:50:34 up 6 days, 22:23, 2 users, load average: 29.33, 25.87, 18.79
That's not much in the way of "found". I'll reiterate my previous suggestions for finding the cause of the problem:
Do you have an NFS mount on this system? It's possible that the server is not responding, or is slow, and processes are simply waiting on IO to that server. Make sure that "df" completes. If it hangs, then you have a mount that's not responding.
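One way to run that check without tying up your own terminal is sketched below. The function name check_completes is made up for illustration, and the 10 second limit is an arbitrary assumption. The command is started in the background and polled, rather than wrapped in something that tries to kill it, because a process blocked on a hard NFS mount generally cannot be killed until the server answers.

```shell
# Run a command in the background and report whether it finishes in time.
# Usage: check_completes SECONDS COMMAND [ARGS...]
# (check_completes is a hypothetical helper name, not a standard tool.)
check_completes() {
    limit=$1
    shift
    "$@" > /dev/null 2>&1 &
    pid=$!
    waited=0
    # kill -0 sends no signal; it only tests whether the process still exists.
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$limit" ]; then
            echo "hung: $* (pid $pid) still running after ${limit}s"
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    echo "ok: $* completed"
}

# Assumed limit of 10 seconds; a healthy "df" normally returns at once.
check_completes 10 df
```

If this reports "hung", the leftover df process is itself a good candidate for inspection, since it is blocked on the mount that is not responding.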
It's also possible that you have a disk that's overloaded. Watch "iostat -x 2" for a while.
There's also a possibility that you have an extremely large directory on the system (tens or hundreds of thousands of files) that processes are trying to read. If the problem starts with "sendmail", for instance, look at the files/directories that it has open using:
"ls -l /proc/<sendmail-pid>/fd"
Look in the output of "ps ax" for processes with a 'D' in their "STAT" column. These processes are in uninterruptible sleep, usually waiting for disk or NFS IO. Linux counts them in the load average along with runnable processes, so they are probably the cause of your load. Look in their "fd" directory (as in the above suggestion) for open files and directories, or use 'lsof -p <pid>'.
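As a rough sketch of that search, the snippet below filters the process list down to entries whose state starts with 'D'. The function name filter_d_state is made up for illustration, and the column selection assumes the procps version of "ps"; the "wchan" column shows the kernel function each process is sleeping in, which can hint at whether it is waiting on NFS or a local disk.

```shell
# Keep only rows whose STAT field (column 2) starts with "D".
# Reads "PID STAT WCHAN COMMAND" rows on stdin; the header row is kept.
# (filter_d_state is a hypothetical helper name.)
filter_d_state() {
    awk 'NR == 1 || $2 ~ /^D/'
}

# List every process with its state, kernel wait channel, and command name.
ps -eo pid,stat,wchan:32,comm | filter_d_state
```

Processes pass through the 'D' state briefly during normal IO, so run it a few times while the load is high; the PIDs that keep showing up are the ones worth checking with "ls -l /proc/<pid>/fd" or 'lsof -p <pid>'.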
-- Shrike-list mailing list Shrike-list@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/shrike-list