Sebastian Kuzminsky wrote:
Matt Darcy <kernel-lists@xxxxxxxxxxxxxxxxx> wrote:
It's almost as if there is an "IO leak", which is the only way I can think
of to describe it. The card / system performs quite well as individual
disks, but as soon as it's entered into a raid 5 configuration using
any number of disks, the creation of the array appears to be fine until
around 20%-30% through the assembly, when the speed of the array creation
plummets and the machine hangs.
You have 7x250G disks in Raid-5, so that's 6x250G or 1.5T total space.
In the beginning of raid recovery, when the system is good, you're
getting 12M/s. It slows then dies after 25% to 40% of completion.
6x250G is 1536000M, at 12M/s that's about 35 hours. You tested the
disks individually (without Raid) for ~12 hours, which is about 34%
of 35 hours. So it's possible you'd see the same slowdown & hang
if you tested the individual disks longer.
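For reference, a quick back-of-the-envelope check of those numbers (a
sketch, not from the original mail; only the 250G disk size and the 12M/s
figure come from this thread):

# Rough check of the resync-time estimate discussed above.
usable_mb = 6 * 250 * 1024        # 6 data disks x 250G = 1536000M
speed_mb_per_s = 12               # resync speed observed while the system is still healthy
resync_hours = usable_mb / float(speed_mb_per_s) / 3600
print("full resync at 12M/s: %.1f hours" % resync_hours)                      # ~35.6 hours
print("hang window (25%%-40%% done): %.1f-%.1f hours"
      % (0.25 * resync_hours, 0.40 * resync_hours))                           # ~8.9-14.2 hours
print("12h of individual-disk testing covers %.0f%% of a full resync"
      % (12 / resync_hours * 100))                                            # ~34%

So the 12-hour individual-disk tests stop right around the point in time
where the raid resync starts to fall over.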
You're having these problems on a Marvell controller with 2.6.15 and the
in-kernel sata_mv driver, right? I've got a very similar system with
unexplained hard hangs too. On my system the individual disks seem to
work fine, Raid-6 of the disks seems to work fine, LVM of the disks seems
to work fine, but LVM of a Raid-6 of the disks hangs.
One weird thing I've discovered is that if I enable all the kernel
debugging options, the system is perfectly stable, and all the debug
tests report no warnings or errors to the logs. It seems like a race
condition somewhere; I suspect it's in the interaction of Raid-6 and
LVM, but it could be anywhere, I suppose. I've attached the .config of
the production (non-debug) kernel that hangs, and the diff to the debug
kernel that works.
Just to clarify a few things:
using the 2.6.15 kernel I can assemble and use the raid 5 array without
a problem; however, using lvm2 on top of it causes it to hang exactly as
you have mentioned before.
When I first started working this problem through, I started using some
of the -mm patches with the 2.6.15-rc's, which made a good difference, in
that I could build and use the array, even with lvm2, for a period of
time. However, there were a few quirky bugs with it, in that it couldn't
maintain the array's stability: on certain occasions, if I rebooted the
box, most of the disks would be marked as unusable and the array would
refuse to start until it was rebuilt. To progress this further I started
using the libata git branch, which again made things a "little" better,
until the last 2 git versions, where I have this problem with the raid
array not being able to build.
From the results I have, I have a gut feeling that this is a driver
issue, simply due to the different results I get with the different
kernels.
I've been given some good thoughts today (the last mail in from Mark Haln
has some good suggestions), so all I can do is run the tests Mark
suggested and report back the results to try to move this forward.
Although Mark's tests seem to point to hardware issues, such as heat,
vibration, etc., I still believe this lies at the software driver level, but
it's worth running the tests to see what additional data I can get (a small
logging sketch follows), and to prove/disprove Mark's suggestions.
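One way to gather that data is something along these lines (a sketch, not
from the original mail; it only assumes the usual /proc/mdstat status line
with a "speed=...K/sec" field), which logs the resync progress and speed
once a minute so the point where throughput collapses shows up in a plain
log:

#!/usr/bin/env python
# Sketch only: sample /proc/mdstat once a minute and log resync/recovery
# progress and speed. Assumes the usual md status line, e.g.
#   recovery =  9.5% (23542400/245117312) finish=123.4min speed=29862K/sec
import re
import time

PATTERN = re.compile(r"(resync|recovery)\s*=\s*([0-9.]+)%.*?speed=(\d+)K/sec")

def sample():
    # Return (phase, percent_done, speed_kb_per_sec), or None if no
    # resync/recovery is currently running.
    with open("/proc/mdstat") as f:
        m = PATTERN.search(f.read())
    return (m.group(1), float(m.group(2)), int(m.group(3))) if m else None

while True:
    s = sample()
    if s:
        print("%s  %s %5.1f%%  %d K/sec"
              % (time.strftime("%H:%M:%S"), s[0], s[1], s[2]))
    time.sleep(60)

Correlating that log with temperatures or SMART data should help separate
a heat/vibration problem from a driver one.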
I shall report back later
thanks,
Matt