Interesting. You mentioned file IO maybe being related? Is this a custom kernel? Is it possible you're using a circuitous IO route to get to disk (like maybe IDE SCSI, or some such?) -----Original Message----- From: redhat-list-bounces@xxxxxxxxxx [mailto:redhat-list-bounces@xxxxxxxxxx] On Behalf Of Mohammad Zakaria Sent: Wednesday, October 13, 2010 8:44 AM To: General Red Hat Linux discussion list Subject: EXTERNAL:Re: RHELv4 and v5 - So slow as to be unusable. Hello Grey, Are those machines having the same brand or HW?? and if you reconnect to your NTP server does the clock start counting right or at the same 7.8 rate?? try to reset one of the machines BIOS to its defaults and check the results?? have you tried to disable one of your processors cores and work with a single processor?? --- On Sat, 10/9/10, Mohammad Zakaria <myz_sa@xxxxxxxxx> wrote: From: Mohammad Zakaria <myz_sa@xxxxxxxxx> Subject: Re: RHELv4 and v5 - So slow as to be unusable. To: "General Red Hat Linux discussion list" <redhat-list@xxxxxxxxxx> Date: Saturday, October 9, 2010, 12:58 PM If you have one piece of RAM try to replace it and check your box status, or if it is a combination of 2 sets, try the system performance with each RAM separately, if there is any problem with your RAM HW you should detect that easily and fix it. ________________________________ From: Gary E Barnes <gebarnes@xxxxxxxxxx> To: redhat-list@xxxxxxxxxx Sent: Thu, October 7, 2010 9:17:01 PM Subject: Re: RHELv4 and v5 - So slow as to be unusable. > From: Laszlo Beres <laszlo@xxxxxxxx> > Subject: Re: RHELv4 and v5 - So slow as to be unusable. > > On Wed, Oct 6, 2010 at 9:22 PM, Gary E Barnes <gebarnes@xxxxxxxxxx> wrote: > > > "top" says that nothing is going on although the load average is 3+. > > "sar" also says that nothing is going on. > > There's no such thing "nothing is going on". You should see CPU > status, process status, etc. vmstat also can give you some hints about > the system health. Oh but there is such a thing. I have one of the machines in this weird slowdown state right at this moment. It started around 4:45pm yesterday, after running perfectly for about 3 hours 15 minutes, and I left it overnight to see if maybe it would get "over it" by itself. Hasn't happened though. Here is the very first header from the "top" display of a top I started just for this example. top - 19:18:20 up 4:33, 4 users, load average: 3.56, 3.58, 3.54 Tasks: 159 total, 16 running, 143 sleeping, 0 stopped, 0 zombie Cpu(s): 1.3% us, 0.4% sy, 2.9% ni, 95.1% id, 0.3% wa, 0.0% hi, 0.0% si Mem: 2586400k total, 1880032k used, 706368k free, 193036k buffers Swap: 4192956k total, 0k used, 4192956k free, 1220324k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 5813 root 16 0 164m 31m 6124 R 0.2 1.3 25:05.54 X 6650 geb 16 0 4048 2116 1332 S 0.2 0.1 0:00.86 xalarm 27229 geb2 16 0 3008 960 696 R 0.2 0.0 0:00.01 top 1 root 16 0 2724 512 436 S 0.0 0.0 0:00.68 init And here is the first refresh of that display (I'm capturing this in an Emacs buffer if you're curious). Tasks: 161 total, 3 running, 158 sleeping, 0 stopped, 0 zombie Cpu(s): 0.9% us, 0.5% sy, 0.1% ni, 98.5% id, 0.0% wa, 0.0% hi, 0.0% si Mem: 2586400k total, 1885680k used, 700720k free, 193036k buffers Swap: 4192956k total, 0k used, 4192956k free, 1220584k cached Here is the second, notice the 100% idle value. Tasks: 161 total, 2 running, 159 sleeping, 0 stopped, 0 zombie Cpu(s): 0.0% us, 0.0% sy, 0.0% ni, 100.0% id, 0.0% wa, 0.0% hi, 0.0% si Mem: 2586400k total, 1885696k used, 700704k free, 193044k buffers Swap: 4192956k total, 0k used, 4192956k free, 1220576k cached There is memory available. There is swap available. Idle occasionally drops to 99.9%. Tasks: 161 total, 2 running, 159 sleeping, 0 stopped, 0 zombie Cpu(s): 0.0% us, 0.1% sy, 0.0% ni, 99.8% id, 0.0% wa, 0.0% hi, 0.0% si Mem: 2586400k total, 1885464k used, 700936k free, 193076k buffers Swap: 4192956k total, 0k used, 4192956k free, 1220544k cached The processes that show up in the first line or two of top are things such as: PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 27231 geb 16 0 28316 11m 8412 S 9.8 0.5 0:00.41 gnome-terminal 5813 root 16 0 165m 31m 6316 S 3.8 1.3 25:05.70 X 6697 geb 16 0 22280 10m 7644 S 1.9 0.4 0:18.69 wnck-applet 3817 rpc 15 0 2336 592 484 S 0.2 0.0 0:01.60 portmap 6472 geb 16 0 3544 1472 876 S 0.2 0.1 0:02.08 gam_server 6650 geb 16 0 4048 2116 1332 S 0.2 0.1 0:00.87 xalarm 3805 root 16 0 2492 312 220 S 0.2 0.0 0:00.55 irqbalance 4867 root 16 0 2800 844 624 D 0.2 0.0 0:03.32 rpc.mountd 1 root 16 0 2724 512 436 S 0.0 0.0 0:00.68 init 6451 geb 16 0 12764 7512 1688 S 0.1 0.3 0:01.66 gconfd-2 27229 geb2 16 0 3012 1044 772 R 0.1 0.0 0:00.05 top 27229 geb2 16 0 3012 1044 772 R 0.5 0.0 0:00.07 top 4996 root 16 0 4772 3104 1536 S 0.2 0.1 0:02.47 hald 6472 geb 16 0 3544 1472 876 S 0.2 0.1 0:02.09 gam_server 1 root 16 0 2724 512 436 S 0.0 0.0 0:00.68 init 27229 geb2 16 0 3012 1044 772 R 0.5 0.0 0:00.09 top 23574 geb 16 0 145m 68m 26m S 0.2 2.7 2:44.86 firefox-bin 1 root 16 0 2724 512 436 S 0.0 0.0 0:00.68 init As you can see, there is essentially "nothing going on". An yet the machine is very unresponsive. If I run a command that hasn't been run in a while (don't know the time frame, but it seems to be only minutes) then the command takes >30 seconds to execute. For example, I just did the "date" command and when it finally responded I did the hwclock command. Both took >30 seconds to run. Now if I repeat those commands they execute immediately. I'm presuming that this is due to executable file caching in the operating system. If I wait a while then the >30 second wait will reappear for those same commands. Presumably they've left that cache. This behavior is observable both in xterm's on the console and also through ssh connections from another machine. Programs that are already loaded and running seem to be pretty much ok, at least until they need to go read some new file or write some new file, then they hang for a while and eventually get going again. If I run sar (sysstat package) I get essentially the same picture. From a "sar -A 30 4" here are the averages for two minutes. Load average 3+ and >99% idle. Nearly no I/O of any sort; not 0 but very low amounts for two minutes. Average: proc/s Average: 0.12 Average: cswch/s Average: 48.90 Average: CPU %user %nice %system %iowait %idle Average: all 0.29 0.01 0.12 0.07 99.51 Average: 0 0.25 0.01 0.13 0.00 99.60 Average: 1 0.33 0.01 0.11 0.13 99.42 Average: INTR intr/s Average: sum 11.87 Average: pgpgin/s pgpgout/s fault/s majflt/s Average: 0.01 8.55 34.76 0.00 Average: pswpin/s pswpout/s Average: 0.00 0.00 Average: tps rtps wtps bread/s bwrtn/s Average: 1.35 0.00 1.35 0.02 17.11 Average: frmpg/s bufpg/s campg/s Average: -0.73 0.02 -0.02 Average: CPU i000/s i001/s i008/s i009/s i012/s i014/s i015/s i177/s i185/s i193/s i201/s i209/s Average: 0 0.02 0.00 0.00 0.00 0.00 0.00 0.09 7.76 0.00 0.00 0.00 0.00 Average: 1 0.01 2.65 0.00 0.00 0.00 1.35 0.00 0.00 0.00 0.00 0.00 0.00 Average: IFACE rxpck/s txpck/s rxbyt/s txbyt/s rxcmp/s txcmp/s rxmcst/s Average: lo 0.00 0.00 0.12 0.12 0.00 0.00 0.00 Average: eth0 3.35 2.25 364.83 196.51 0.00 0.00 0.00 Average: sit0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 Average: IFACE rxerr/s txerr/s coll/s rxdrop/s txdrop/s txcarr/s rxfram/s rxfifo/s txfifo/s Average: lo 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 Average: eth0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 Average: sit0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 Average: DEV tps rd_sec/s wr_sec/s Average: dev1-0 0.00 0.00 0.00 Average: dev1-1 0.00 0.00 0.00 Average: dev1-2 0.00 0.00 0.00 Average: dev1-3 0.00 0.00 0.00 Average: dev1-4 0.00 0.00 0.00 Average: dev1-5 0.00 0.00 0.00 Average: dev1-6 0.00 0.00 0.00 Average: dev1-7 0.00 0.00 0.00 Average: dev1-8 0.00 0.00 0.00 Average: dev1-9 0.00 0.00 0.00 Average: dev1-10 0.00 0.00 0.00 Average: dev1-11 0.00 0.00 0.00 Average: dev1-12 0.00 0.00 0.00 Average: dev1-13 0.00 0.00 0.00 Average: dev1-14 0.00 0.00 0.00 Average: dev1-15 0.00 0.00 0.00 Average: dev3-0 1.35 0.02 17.11 Average: dev3-1 0.00 0.00 0.00 Average: dev3-2 0.13 0.02 3.35 Average: dev3-3 0.00 0.00 0.00 Average: dev3-4 0.00 0.00 0.00 Average: dev3-5 1.22 0.01 13.75 Average: dev22-64 0.00 0.00 0.00 Average: dev22-65 0.00 0.00 0.00 Average: dev22-0 0.00 0.00 0.00 Average: dev2-0 0.00 0.00 0.00 Average: dev9-0 0.00 0.00 0.00 Average: kbmemfree kbmemused %memused kbbuffers kbcached kbswpfree kbswpused %swpused kbswpcad Average: 696608 1889792 73.07 193202 1220938 4192956 0 0.00 0 Average: dentunusd file-sz inode-sz super-sz %super-sz dquot-sz %dquot-sz rtsig-sz %rtsig-sz Average: 216815 3285 185128 0 0.00 0 0.00 0 0.00 Average: totsck tcpsck udpsck rawsck ip-frag Average: 343 56 8 0 0 Average: runq-sz plist-sz ldavg-1 ldavg-5 ldavg-15 Average: 14 186 3.35 3.61 3.59 The machine entered this state at about 4:45pm yesterday afternoon. It is now 12:00 noon the next day. The "date" command says that the system thinks that the time is 7:26PM yesterday. In the last 47 minutes the system clock has gained only 6 minutes. A rate of somewhere around 7.8. Another interesting little symptom, when this slowdown is in effect the keyboard autorepeat on keys stops working. If this was the only machine doing this I'd think it was a hardware problem. But (a) it isn't the only machine and (b) while it seems to always happen to these machines, it is only after running for at least a few hours without problems. Gary -- redhat-list mailing list unsubscribe mailto:redhat-list-request@xxxxxxxxxx?subject=unsubscribe https://www.redhat.com/mailman/listinfo/redhat-list -- redhat-list mailing list unsubscribe mailto:redhat-list-request@xxxxxxxxxx?subject=unsubscribe https://www.redhat.com/mailman/listinfo/redhat-list -- redhat-list mailing list unsubscribe mailto:redhat-list-request@xxxxxxxxxx?subject=unsubscribe https://www.redhat.com/mailman/listinfo/redhat-list -- redhat-list mailing list unsubscribe mailto:redhat-list-request@xxxxxxxxxx?subject=unsubscribe https://www.redhat.com/mailman/listinfo/redhat-list