On Fri, Jan 13 2006, Matt Darcy wrote:
> Matt Darcy wrote:
> >
> >> I can now provide further updates for this, although these are not
> >> really super useful.
> >>
> >> I've copied the linux-raid list in as well, as after a little more
> >> testing on my part I'd appreciate some input from the raid guys also.
> >>
> >> First of all, please ignore the comments above; there was a problem
> >> with grub and it actually "failed back" and booted into the older git
> >> release, so my initial test was actually done running the wrong
> >> kernel, which I didn't notice. Apologies to all for this.
> >>
> >> Last night's tests were done using the correct kernel (I fixed the
> >> grub typo): 2.6.15-g5367f2d6
> >>
> >> The details I have are as follows.
> >>
> >> I can run the machine accessing the 7 Maxtor SATA disks as individual
> >> disks for around 12 hours now, without any hangs, errors or any
> >> real problems. I've not hit them very hard, but initial performance
> >> seems fine and more than usable.
> >>
> >> The actual problems occur when including these disks in a raid group.
> >>
> >> root@berger:~# fdisk -l /dev/sdc
> >>
> >> Disk /dev/sdc: 251.0 GB, 251000193024 bytes
> >> 255 heads, 63 sectors/track, 30515 cylinders
> >> Units = cylinders of 16065 * 512 = 8225280 bytes
> >>
> >>    Device Boot      Start         End      Blocks   Id  System
> >> /dev/sdc1               1       30515   245111706   fd  Linux raid autodetect
> >>
> >> root@berger:~# fdisk -l /dev/sde
> >>
> >> Disk /dev/sde: 251.0 GB, 251000193024 bytes
> >> 255 heads, 63 sectors/track, 30515 cylinders
> >> Units = cylinders of 16065 * 512 = 8225280 bytes
> >>
> >>    Device Boot      Start         End      Blocks   Id  System
> >> /dev/sde1               1       30515   245111706   fd  Linux raid autodetect
> >>
> >> As you can see from my two random disk examples, they are partitioned
> >> and marked as raid autodetect.
> >>
> >> I issue the mdadm command to build the raid 5 array:
> >>
> >> mdadm -C /dev/md6 -l5 -n6 -x1 /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1
> >> /dev/sdg1 /dev/sdh1 /dev/sdi1
> >>
> >> and the array starts to build...
> >>
> >> md6 : active raid5 sdh1[7] sdi1[6](S) sdg1[4] sdf1[3] sde1[2] sdd1[1] sdc1[0]
> >>       1225558080 blocks level 5, 64k chunk, algorithm 2 [6/5] [UUUUU_]
> >>       [>....................]  recovery =  0.1% (374272/245111616) finish=337.8min speed=12073K/sec
> >>
> >> However, at around 25% - 40% completion the box will simply hang -
> >> I'm getting no on-screen messages and syslog is not reporting
> >> anything.
> >>
> >> SysRq is unusable.
> >>
> >> I'm open to options on how to resolve this and move the driver
> >> forward (assuming it is the driver's interaction with the raid
> >> subsystem), or on how to get some meaningful debug out to report
> >> back to the appropriate development groups.
> >>
> >> thanks.
> >>
> >> Matt.
> >
> > Further information:
> >
> > The speed at which the raid array is being built appears to drop as
> > the array is created:
> >
> > [=====>...............]  recovery = 29.2% (71633360/245111616) finish=235.1min speed=12296K/sec
> > [=====>...............]  recovery = 29.3% (71874512/245111616) finish=235.2min speed=12269K/sec
> > [=====>...............]  recovery = 29.4% (72115872/245111616) finish=236.0min speed=12209K/sec
> > [=====>...............]  recovery = 29.7% (72839648/245111616) finish=237.4min speed=12091K/sec
> > [=====>...............]  recovery = 29.8% (73078560/245111616) finish=238.6min speed=12010K/sec
> > [=====>...............]  recovery = 29.8% (73139424/245111616) finish=350.5min speed=8176K/sec
> > [=====>...............]  recovery = 29.8% (73139424/245111616) finish=499.6min speed=5735K/sec
> > [=====>...............]  recovery = 29.8% (73139776/245111616) finish=691.0min speed=4147K/sec
> >
> > Now the box is hung.
> >
> > I didn't notice this until about 20% through the creation of the
> > array; then I started paying attention to it. These snapshots are
> > taken every 30 seconds.
> >
> > So the problem appears to sap bandwidth on the card to the point
> > where the box hangs.
> >
> > This may have some relevance, or it may not, but it is worth
> > mentioning at least.
> >
> > Matt
>
> First - a quick response to John Stoffel's comments.
>
> Both the disks and the controller are on the latest BIOS/firmware
> versions (thanks for making me point this out).
>
> I created a much smaller array (3 disks, 1 spare) today, and again at
> around 35% through the creation of the array the whole machine hung -
> no warning, no errors, no logging.
> The speed parameter from /proc/mdstat stayed constant up to around 30%
> (which explains why I perhaps didn't notice this earlier) and, like the
> creation of the large raid 5 array, took a massive nosedive over about
> 180 seconds to the point where the box hung.
>
> It's almost as if there is an "IO leak", which is the only way I can
> think of to describe it. The card/system performs quite well with
> individual disks, but as soon as they are put into a raid 5
> configuration using any number of disks, the creation of the array
> appears to be fine until around 20% - 30% through the assembly, at
> which point the speed of the array's creation plummets and the machine
> hangs.
>
> I'm not too sure how to take this further, as I get no warnings (other
> than the array's creation time slowing) - I can't use any tools like
> netdump or SysRq.
>
> I'll try some additional raid tests (such as raid0 or raid1 across
> more disks) to see how that works. But as it stands I'm not sure how
> to get additional information.

You could try to monitor /proc/meminfo and /proc/slabinfo as the system
begins to slow to a crawl; any leaks of io structures should be visible
there as well.

--
Jens Axboe
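A minimal sketch of the kind of monitoring loop suggested above - assuming
the box stays responsive long enough to keep a root shell open, and using
/root/raid-debug.log purely as an illustrative log location on a disk
outside the array - could look like this:

  #!/bin/sh
  # Snapshot mdstat, meminfo and slabinfo every 30 seconds so the last
  # few samples before the hang survive on disk for inspection afterwards.
  LOG=/root/raid-debug.log   # hypothetical path - any disk outside the array will do

  while true; do
      echo "=== $(date) ===" >> "$LOG"
      cat /proc/mdstat       >> "$LOG"
      cat /proc/meminfo      >> "$LOG"
      cat /proc/slabinfo     >> "$LOG"
      sync                   # push the samples to disk before a possible hang
      sleep 30
  done

Started in the background (sh raid-debug.sh &) before the array build, the
tail of the log after a reboot should show whether any slab caches or
memory counters were growing steadily as the resync speed fell, which is
roughly what a leak of io structures would look like.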