As soon as one of the machines slows down again I will try that. I have, however, been able to ^C out of a program that was taking a long time to start up.

One user discovered this morning that when the system is in this state the output of "date" goes backwards. A simple csh loop,

    while (1)
        date
    end

shows that time *mainly* goes forward, but perhaps every few seconds it suddenly jumps backwards by a few seconds. This may explain why the GUI clocks remain nearly "frozen" overnight if a machine is left in this state. The hardware clock (/usr/sbin/hwclock) is fine, but the kernel's concept of time (/bin/date) is not always running forward for some reason. ntp was not configured on that machine, so it was not running.

Does anyone know whether a Pentium 4 machine should run a simple (non-SMP) kernel such as 2.6.9-89.29.1.EL or an SMP kernel such as 2.6.9-89.29.1.ELsmp? Of the eight Pentium 4 machines I have, five have chosen the SMP kernel and the other three have not. There are four different motherboards, and so presumably four different BIOS loads, and the two machines displaying the problem both use the same motherboard. They both chose to run the SMP kernel.

I noticed when I loaded the RHELv4 CDs that only a non-SMP kernel was installed. The first up2date run then brought in a new kernel, in both single and SMP forms, and changed grub to run the new SMP kernel.

At the moment I have one of the problem machines running the original 2.6.9-89.ELsmp kernel and the other one running the newer 2.6.9-89.29.1.EL (non-SMP) kernel to see whether either change makes a difference.

Gary

Yong Huang <yong321@xxxxxxxxx> wrote on 10/08/2010 10:34:59 AM:

> From: Yong Huang <yong321@xxxxxxxxx>
> To: Gary E Barnes/Cupertino/IBM@IBMUS
> Cc: redhat-list@xxxxxxxxxx
> Date: 10/08/2010 10:41 AM
> Subject: Re: RHELv4 and v5 - So slow as to be unusable.
>
> Gary,
>
> As you proved, not all performance problems can be identified by
> performance monitoring tools.
> In this case, "performance" is not a good
> word. "Locking" may be better.
>
> We recently had a problem with TrendMicro on our RHEL 5 box. Copying a 1GB
> file with cp took 35 minutes for the prompt to come back, even though the
> copied file already had the same checksum and size after about 1 minute.
> /proc/<cp pid>/status showed the disk sleep state. The cp command was not
> killable, indicating it was stuck in kernel mode and not coming back.
> Running strace or pstack on the process hangs (but strace or pstack itself
> is killable). The message in /var/log/messages sheds light on the problem:
>
> Sep 26 11:02:11 ourhostname kernel: INFO: task cp:10658 blocked for more than 120 seconds.
> Sep 26 11:02:11 ourhostname kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> ...
> Sep 26 11:02:11 ourhostname kernel: Call Trace:
> Sep 26 11:02:11 ourhostname kernel: [<ffffffff884a45a8>] :splxmod:closeHook+0x784/0x9d8
>
> So the splxmod module's closeHook function is the suspect, since it's at
> the top of the call stack. Searching on Google indicates it's a module
> in TrendMicro's software. We contacted them and they quickly provided a patch.
>
> RHEL 4 doesn't have /proc/sys/kernel/hung_task_timeout_secs. I'm not sure
> if the kernel can be reconfigured to add that. For those interested, the
> source code is at
> http://koders.com/c/fidFAF17DCD13DB287057ACC4136EEEFE2D9644BA9A.aspx
>
> In your case, can you try pstack and strace on a simple process such as
> date (both programs need to be installed)? And tell us /proc/<pid>/status.
>
> Yong Huang

--
redhat-list mailing list
unsubscribe mailto:redhat-list-request@xxxxxxxxxx?subject=unsubscribe
https://www.redhat.com/mailman/listinfo/redhat-list
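
The backwards-time check done with the csh "date" loop near the top of this message could also be automated so that the size and frequency of the jumps get recorded. What follows is only a rough sketch, not something that was run on the affected machines. It assumes a Python 3 interpreter (which a stock RHEL 4 install does not provide), and the function names find_backward_jumps and sample_clock are made up for illustration. time.time() reads the same kernel wall clock that /bin/date reports, not the hardware clock.

```python
import time
from typing import Callable, Iterable, List, Tuple


def find_backward_jumps(samples: Iterable[float]) -> List[Tuple[int, float, float]]:
    """Return (index, previous, current) for each sample earlier than the one before it."""
    jumps = []
    previous = None
    for index, current in enumerate(samples):
        # A healthy wall clock never reports an earlier time on a later read.
        if previous is not None and current < previous:
            jumps.append((index, previous, current))
        previous = current
    return jumps


def sample_clock(count: int, interval: float = 1.0,
                 clock: Callable[[], float] = time.time) -> List[float]:
    """Read the clock `count` times, sleeping `interval` seconds after each read."""
    readings = []
    for _ in range(count):
        readings.append(clock())
        time.sleep(interval)
    return readings
```

On a machine in the slow state, something like find_backward_jumps(sample_clock(600)) would sample the clock once a second for ten minutes and return one tuple per backwards step; an empty list would mean no step was seen during that window.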