Frank van Maarseveen wrote:
On Mon, Jan 10, 2005 at 12:16:58AM +0100, maarten wrote:
You cut out my entire idea about leaving the 'failed' disk around to
eventually being able to compensate a further block error on another media.
Why ? It would _solve_ your problem, wouldn't it ?
I did not intend to cut it out but simplified the situation a bit: if
you have all the RAID5 disks even with a bunch of errors spread out over
all of them then yes, you basically still have the data. Nothing is
lost provided there's no double fault and disks are not dead yet. But
there are not many technical people I would trust for recovering from
this situation. And I wouldn't trust myself without a significant
coffee intake either :)
I think read errors should be handled very differently from disk
failures. In particular, the affected disk should not be kicked out
hastily. If you do that, you immediately throw away the real power of
RAID5! As long as any other part of the disk can still be read, that
data must be preserved by all means. As long as only parts of a disk
(even of different disks) can't be read, it is not a fatal problem,
provided the data can still be reconstructed from the other disks of
the array. There is no reason to kill any disk in advance.
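
To illustrate the point about scattered read errors, here is a minimal
sketch in Python. It is not md code; the names xor_blocks and read_stripe
are made up for this example, and it assumes plain XOR parity with one
chunk per disk in each stripe. A stripe stays readable as long as at most
one of its chunks returns a read error, no matter which disk that chunk
lives on.

```python
from functools import reduce
from typing import List, Optional


def xor_blocks(blocks: List[bytes]) -> bytes:
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def read_stripe(stripe: List[Optional[bytes]]) -> List[bytes]:
    """Return the full stripe, rebuilding at most one unreadable chunk.

    `stripe` holds one chunk per disk (data chunks plus the parity chunk).
    None marks a chunk that returned a read error (an assumption of this
    sketch, not how the kernel reports errors).
    """
    bad = [i for i, chunk in enumerate(stripe) if chunk is None]
    if len(bad) > 1:
        raise IOError("double fault in one stripe: data not recoverable")
    repaired = list(stripe)
    if bad:
        good = [chunk for chunk in stripe if chunk is not None]
        repaired[bad[0]] = xor_blocks(good)
    return repaired


# Two stripes on three disks, with read errors on two different disks.
d0, d1 = b"\x01\x02", b"\x10\x20"
parity = xor_blocks([d0, d1])
stripe_a = read_stripe([None, d1, parity])   # error on disk 0
stripe_b = read_stripe([d0, d1, None])       # error on disk 2
```

Both stripes are recovered even though two different disks have bad
sectors. Kicking out disk 0 after the first error would turn the second
error into a double fault.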
What I'm missing is a better concept for replacing a disk:
Kicking out a disk first and only then starting to resync to a spare
disk is a very dangerous approach. Instead, some kind of "presync"
should be possible: once the decision to replace a disk has been made,
the new (spare) disk should be prepared in advance, while all other disks
are still running. After the spare disk has been successfully prepared,
the disk to be replaced may be disabled.
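
The "presync" idea can be sketched roughly like this in Python. This is
only an illustration under simplifying assumptions: disks are modelled as
lists of chunks, None stands for an unreadable chunk, parity is plain XOR,
and the function name presync is hypothetical. The spare is filled from
the old disk directly, and only chunks the old disk cannot deliver are
rebuilt from the other disks.

```python
from functools import reduce
from typing import List, Optional

Disk = List[Optional[bytes]]  # one chunk per stripe; None = read error


def xor_blocks(blocks: List[bytes]) -> bytes:
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def presync(old: Disk, others: List[Disk]) -> Disk:
    """Build the spare's contents while the old disk is still in the array."""
    spare: Disk = []
    for n, chunk in enumerate(old):
        if chunk is None:
            # Old disk cannot deliver this chunk: rebuild it from the peers.
            peers = [disk[n] for disk in others]
            if any(p is None for p in peers):
                raise IOError(f"stripe {n}: double fault, cannot rebuild")
            chunk = xor_blocks(peers)
        spare.append(chunk)
    return spare


# Old disk has a bad chunk in stripe 1; another disk has one in stripe 0.
old = [b"\x01", None]
others = [[None, b"\x04"], [b"\x03", b"\x0c"]]
spare = presync(old, others)
```

Note that stripe 0 is copied straight from the old disk, so the read error
on the other disk does no harm. Had the old disk been kicked out first,
that same error would have made stripe 0 unrecoverable.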
This sounds a bit like RAID6, but it is much simpler. The complicated part
may be the phase where I have one additional disk. A simple solution would
be to perform the resync offline, while no writes take place. This could
even be done by a userland utility. If I want to perform the "presync"
online, I have to carry out every write to both disks simultaneously while
the presync is running.
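
For the online case, a rough single-threaded model in Python might look
like the following. The class OnlinePresync and its methods are invented
for this sketch, and it ignores locking and read errors entirely. It only
shows why mirroring every write to both disks keeps the spare consistent
while the copy is still in progress.

```python
from typing import List, Optional


class OnlinePresync:
    """Copy an old disk to a spare while writes keep arriving."""

    def __init__(self, old: List[bytes]) -> None:
        self.old = old
        self.spare: List[Optional[bytes]] = [None] * len(old)
        self.cursor = 0  # chunks below the cursor have already been copied

    def step(self) -> bool:
        """Copy one chunk; return False once the spare is complete."""
        if self.cursor < len(self.old):
            self.spare[self.cursor] = self.old[self.cursor]
            self.cursor += 1
        return self.cursor < len(self.old)

    def write(self, n: int, data: bytes) -> None:
        """Mirror every write to both disks so the spare never goes stale."""
        self.old[n] = data
        self.spare[n] = data


sync = OnlinePresync([b"a", b"b", b"c"])
sync.step()              # chunk 0 copied
sync.write(0, b"A")      # write behind the cursor: must reach the spare
sync.write(2, b"C")      # write ahead of the cursor: copied again later
while sync.step():
    pass
```

After the loop the spare holds the same data as the old disk, including
both writes that arrived during the copy, so the old disk could then be
disabled.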
Dieter.
--
Dieter Stüken, con terra GmbH, Münster
stueken@xxxxxxxxxxx
http://www.conterra.de/
(0)251-7474-501