On Wed, 2006-01-18 at 13:38, Fong Vang wrote:
> I have a total of 20 CentOS 4.1 systems running on fairly new
> hardware.  About 6 of them are experiencing strange hangs without any
> indication -- nothing in /var/log/messages nor on the console --
> sometime within 10-30 minutes after a reboot.  The systems still
> respond to ping but you can't ssh to them.  At the console, you can
> type "root" at the user prompt but it hangs immediately after hitting
> enter.
>
> Memory scans of all systems show no errors.
>
> Any idea how to troubleshoot this problem?  The system's not
> responsive enough to do any troubleshooting and nothing abnormal is
> in the log.
>
> We're running this kernel: kernel-smp-2.6.9-11.EL.i686.rpm.

My first guess would be that something is consuming all available
memory and pushing everything else into swap.  The system may not be
completely hung, but it can't respond in a reasonable amount of time.

If the logs for whatever services you run don't show anything, I'd
start watching right after a reboot, before the hang sets in:

  - Watch with top over a period of time to see if a single program
    is eating the memory.
  - Run "ps ax" frequently to see if a large number of small processes
    are accumulating.
  - Look at the process id of the ps process itself each time you run
    it.  PIDs are handed out sequentially, so a big jump between two
    runs is a hint about how fast new processes are being started.

I assume from the fact that you have 20 boxes that you are doing
something that causes substantial load; perhaps it needs to be
distributed better.

-- 
Les Mikesell
lesmikesell@xxxxxxxxx
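
Since the box stops responding before anyone can look at it, the checks
suggested above (free memory and swap, the number of processes, and how
fast PIDs are advancing) could also be logged to disk every few seconds,
so the last samples are still there after the next reboot.  What follows
is only a rough sketch of that idea, not a tested tool.  It assumes a
Python 3 interpreter is available on the machine; the function names,
the log path /var/tmp/hangwatch.log, and the 10-second interval are all
made up for illustration.

```python
"""Sketch: log memory, swap, and process counts so the last entries survive a hang."""
import os
import time


def parse_meminfo(text):
    """Return {field: value} from the contents of /proc/meminfo (values mostly in kB)."""
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info


def count_processes(proc_dir="/proc"):
    """Count numeric entries under /proc; each one is a running process."""
    return sum(1 for entry in os.listdir(proc_dir) if entry.isdigit())


def newest_pid():
    """Fork a throwaway child and return its PID.

    Same trick as watching the PID of ps: a large jump between two
    samples means many processes were started in between.
    """
    pid = os.fork()
    if pid == 0:
        os._exit(0)          # child exits immediately
    os.waitpid(pid, 0)       # reap it so no zombie is left behind
    return pid


def sample():
    """Build one log line from the current state of the machine."""
    with open("/proc/meminfo") as f:
        mem = parse_meminfo(f.read())
    return "%s memfree=%dkB swapfree=%dkB procs=%d newest_pid=%d" % (
        time.strftime("%Y-%m-%d %H:%M:%S"),
        mem.get("MemFree", 0),
        mem.get("SwapFree", 0),
        count_processes(),
        newest_pid(),
    )


def watch(log_path="/var/tmp/hangwatch.log", interval=10, samples=None):
    """Append a sample every `interval` seconds; run forever if samples is None."""
    taken = 0
    while samples is None or taken < samples:
        with open(log_path, "a") as log:
            log.write(sample() + "\n")
            log.flush()
            os.fsync(log.fileno())   # push to disk in case the box hangs next
        taken += 1
        time.sleep(interval)
```

To use it, call watch() right after a reboot and read the tail of the
log once the machine has been power-cycled.  Falling swapfree with a
steady process count points at one growing program; a climbing procs
or fast-moving newest_pid points at something spawning processes.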