On Fri, Apr 07, 2006 at 12:17:30PM +0200, George Magklaras wrote:
> Seeing init in S mode in 'top' like that:
> 1 root 16 0 1972 556 480 S 0.0 0.0 0:00.53 init
>
> is not so extraordinary if you just invoke 'top'. If it is in R or other
> process mode continuously, that would be alarming.

init stays in 'S' mode for the duration of top.

> >Another symptom that comes along with this weird non-0.00 load issue is
> >that
> >user I/O seems to "glitch" every now and then. Almost like the hard drives
> >are spinning up after being put to sleep... however, APM is disabled in my
> >kernel since I am running in SMP mode.
>
> I think that #might# be the key symptom. How exactly do you mean the
> 'glitch'? Does I/O pause for an interval to the point where you notice
> it for several seconds and then continue, or abort completely (I/O
> errors)? It could be that there is some kind of background reconstruction
> or syncing happening due to driver or hardware issues.

Yes. This is exactly the behavior I'm experiencing. Everything just pauses,
then control returns within 2-5 seconds.

> dmesg | grep -i md
>
> should give you any hiccups related to the RAID config.
> Doing also a
> 'vmstat 3'

Nothing interesting really in the dmesg output, but vmstat shows a lot of
interrupts:

On DL140G2 w/ SATA software RAID1:

[root@localhost oracle]# vmstat 3
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
 0  0      0  89340  46236 1828624    0    0     1    29  185    17  0  0 98  1
 0  0      0  89340  46236 1828624    0    0     0    11 1014    17  0  0 100  1
 0  0      0  89276  46236 1828624    0    0     0    28 1017    25  0  0 99  1
 0  0      0  89276  46236 1828624    0    0     0    11 1014    19  0  0 100  1
 0  0      0  89276  46236 1828624    0    0     0    21 1016    24  0  0 99  1
 0  0      0  89276  46236 1828624    0    0     0    11 1014    19  0  0 99  1
 0  0      0  89276  46236 1828624    0    0     0    20 1016    24  0  0 99  1
 0  0      0  89276  46236 1828624    0    0     0    11 1013    19  0  0 99  1

On DL140G1 w/ IDE software RAID1 (this box is actually in production so is
"busier" than the box above):

[root@billmax root]# vmstat 3
procs           memory              swap        io        system       cpu
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
 0  0  24604  18508 127772 507604    0    0     1     0    1     0  0  0  0  1
 0  0  24604  18508 127772 507604    0    0     0     0  113    24  0  0 100  0
 0  0  24604  18508 127772 507604    0    0     0    16  115    29  0  0 100  0
 0  0  24604  18508 127772 507604    0    0     0   111  164    52  0  0 96  3
 0  0  24604  18508 127772 507604    0    0     0     0  121    33  0  0 100  0
 0  0  24604  18508 127772 507608    0    0     1     7  116    48  0  0 100  0
 0  0  24604  18508 127772 507620    0    0     3     0  113    38  0  0 100  0
 0  0  24604  18508 127772 507620    0    0     0    51  131    26  0  0 100  0
 0  0  24604  18508 127772 507620    0    0     0     0  113    34  0  0 100  0

> /proc/interrupts, the output of 'lsmod' and your SoftRAID config files
> would help, as well as your kernel version.
>

Kernel is 2.6.9-22.ELsmp.

[root@localhost oracle]# cat /proc/interrupts
           CPU0       CPU1
  0:   33071575   33118497    IO-APIC-edge  timer
  1:         28         58    IO-APIC-edge  i8042
  8:          0          1    IO-APIC-edge  rtc
  9:          0          0   IO-APIC-level  acpi
 14:      79946      81927    IO-APIC-edge  libata
 15:      81059      80767    IO-APIC-edge  libata
169:    1037048        132   IO-APIC-level  uhci_hcd, eth0
177:          0          0   IO-APIC-level  uhci_hcd
185:          0          0   IO-APIC-level  ehci_hcd
NMI:          0          0
LOC:   66192663   66192736
ERR:          0
MIS:          0

RAID configuration: it doesn't appear that /etc/raidtab gets generated any
longer. Here is /etc/mdadm.conf:

DEVICE partitions
MAILADDR root
ARRAY /dev/md0 super-minor=0
ARRAY /dev/md1 super-minor=1

Some output from dmesg:

md: md driver 0.90.0 MAX_MD_DEVS=256, MD_SB_DISKS=27
ata1: SATA max UDMA/133 cmd 0x1F0 ctl 0x3F6 bmdma 0x1470 irq 14
ata2: SATA max UDMA/133 cmd 0x170 ctl 0x376 bmdma 0x1478 irq 15
md: raid1 personality registered as nr 3
md: Autodetecting RAID arrays.
md: autorun ...
md: considering sdb3 ...
md:  adding sdb3 ...
md: sdb1 has different UUID to sdb3
md:  adding sda3 ...
md: sda1 has different UUID to sdb3
md: created md0
md: bind<sda3>
md: bind<sdb3>
md: running: <sdb3><sda3>
raid1: raid set md0 active with 2 out of 2 mirrors
md: considering sdb1 ...
md:  adding sdb1 ...
md:  adding sda1 ...
md: created md1
md: bind<sda1>
md: bind<sdb1>
md: running: <sdb1><sda1>
raid1: raid set md1 active with 2 out of 2 mirrors
md: ... autorun DONE.
md: Autodetecting RAID arrays.
md: autorun ...
md: ... autorun DONE.
md: Autodetecting RAID arrays.
md: autorun ...
md: ... autorun DONE.
EXT3 FS on md0, internal journal
EXT3 FS on md1, internal journal

[root@localhost oracle]# cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sdb1[1] sda1[0]
      104320 blocks [2/2] [UU]

md0 : active raid1 sdb3[1] sda3[0]
      76991424 blocks [2/2] [UU]

unused devices: <none>

Here are also some sar statistics:

[root@localhost oracle]# sar
12:00:01 AM       CPU     %user     %nice   %system   %iowait     %idle
08:00:01 AM       all      0.00      0.00      0.01      0.92     99.06
08:10:01 AM       all      0.15      0.00      0.02      0.98     98.85
08:20:01 AM       all      0.01      0.00      0.01      0.95     99.03
08:30:01 AM       all      0.01      0.00      0.01      0.95     99.03
08:40:01 AM       all      0.00      0.00      0.01      1.05     98.94
08:50:01 AM       all      0.01      0.00      0.01      0.95     99.03
09:00:01 AM       all      0.02      0.00      0.02      0.95     99.00
09:10:01 AM       all      0.16      0.00      0.03      0.98     98.83
09:20:01 AM       all      0.01      0.00      0.01      0.96     99.02
Average:          all      0.04      0.01      0.03      1.01     98.91

iowait seems noticeably higher than on my DL140G1.

[root@localhost oracle]# sar -B
Linux 2.6.9-22.ELsmp (localhost.localdomain)    04/07/2006

12:00:01 AM  pgpgin/s pgpgout/s   fault/s  majflt/s
12:10:01 AM      0.07     19.70     45.95      0.00
12:20:01 AM      0.00     17.59     10.47      0.00
12:30:01 AM      0.00     17.10      9.02      0.00
12:40:01 AM      0.00     21.03     15.56      0.00
12:50:01 AM      0.00     17.34     15.80      0.00
01:00:01 AM      0.00     17.20      8.97      0.00
01:10:01 AM      0.00     19.50     45.04      0.00
01:20:01 AM      0.00     17.49      9.28      0.00
01:30:01 AM      0.00     17.22      8.94      0.00
01:40:01 AM      0.00     20.27     15.61      0.00
01:50:01 AM      0.00     17.08      9.10      0.00

Not sure if the number of page faults there is unusual or not.

The most unusual thing seems to be the number of interrupts going on. I
can't seem to call sar -I with an IRQ value of 0, but a
watch -n 1 "cat /proc/interrupts" seems to show about 1000 interrupts per
second to the IO-APIC-edge timer on the DL140G2 system. On the DL140G1
system, I am only seeing about 100 interrupts per second to the
IO-APIC-edge timer.

Anyways, I am going to keep playing around with sar and see if anything
else stands out. Any suggestions?

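To put a number on the per-IRQ rate instead of eyeballing the watch output,
the counters in /proc/interrupts can be sampled twice and differenced. The
following is only a minimal sketch, assuming a Python interpreter is
available on the box. The function names parse_interrupts, rates and
sample_rates are made up for illustration and are not part of any existing
tool.

```python
import time


def parse_interrupts(text):
    """Return {irq_label: count summed over all CPUs} from /proc/interrupts text."""
    lines = text.splitlines()
    if not lines:
        return {}
    ncpus = len(lines[0].split())  # header row looks like "CPU0 CPU1 ..."
    totals = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep:
            continue  # not an "IRQ: counts..." row
        counts = []
        for field in rest.split()[:ncpus]:
            if not field.isdigit():
                break  # reached the controller/device name columns
            counts.append(int(field))
        if counts:
            totals[label.strip()] = sum(counts)
    return totals


def rates(before, after, seconds):
    """Interrupts per second for every IRQ present in both snapshots."""
    return {
        irq: (after[irq] - before[irq]) / float(seconds)
        for irq in after
        if irq in before
    }


def sample_rates(interval=1.0, path="/proc/interrupts"):
    """Take two snapshots `interval` seconds apart and return the rates."""
    with open(path) as f:
        before = parse_interrupts(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = parse_interrupts(f.read())
    return rates(before, after, interval)
```

Calling sample_rates(3.0) and looking at the entry for "0" gives the timer
rate summed over both CPUs. If the watch observation above is right, that
should come out near 1000/s on the DL140G2 and near 100/s on the DL140G1.
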
Ray