Dear Neil,
The point I'm trying to make is that there does exist a specific
case in which recovery is possible, and that implementing recovery for
that case will not hurt in any way.
Assuming that is true (maybe hpa got it wrong), what specific
conditions would lead to one drive having corrupt data, and would
correcting it on an occasional 'repair' pass be an appropriate
response?
The use case for the proposed 'repair' would be occasional,
low-frequency corruption, for which many sources can be imagined:
Any piece of hardware has a certain failure rate, which may depend on
things like age, temperature, stability of operating voltage, cosmic
rays, etc., but also on variations in the production process. Therefore,
hardware may suffer from infrequent glitches that are rare enough to be
impossible to trace back to a particular piece of equipment. It
would be nice to recover gracefully from that.
Kernel bugs or just plain administrator mistakes are another source.
The case of power loss during writing that you mentioned could also
benefit from that 'repair': With heterogeneous hardware, blocks
may be written in unpredictable order, so 'repair' would allow graceful
recovery in more cases than just recalculating parity would.
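To make the recoverable case concrete, here is a rough sketch in Python
(not md code). It assumes a RAID-6 style layout with P and Q syndromes
over GF(2^8), using polynomial 0x11d and generator 2, and it assumes that
at most one block per stripe is corrupt. All function names are
hypothetical and chosen for illustration only.

```python
# Sketch only: locate and repair a single corrupt block in one stripe.
# Assumptions: P/Q syndromes over GF(2^8), polynomial 0x11d, generator 2.

GF_EXP = [0] * 510   # antilog table, doubled so products need no modulo
GF_LOG = [0] * 256   # log table (GF_LOG[0] is unused)
_x = 1
for _i in range(255):
    GF_EXP[_i] = _x
    GF_LOG[_x] = _i
    _x <<= 1
    if _x & 0x100:
        _x ^= 0x11d
for _i in range(255, 510):
    GF_EXP[_i] = GF_EXP[_i - 255]


def gf_mul(a, b):
    """Multiply two field elements using the log tables."""
    if a == 0 or b == 0:
        return 0
    return GF_EXP[GF_LOG[a] + GF_LOG[b]]


def syndromes(data):
    """Return (P, Q) for a list of equal-length data blocks."""
    n = len(data[0])
    p, q = bytearray(n), bytearray(n)
    for i, block in enumerate(data):
        coeff = GF_EXP[i]              # g**i for data disk i
        for k, byte in enumerate(block):
            p[k] ^= byte
            q[k] ^= gf_mul(coeff, byte)
    return bytes(p), bytes(q)


def locate_and_repair(data, p_stored, q_stored):
    """Classify a stripe; repair `data` in place if one data block is bad.

    Returns ("ok", None), ("p", None), ("q", None), ("data", index)
    or ("unrecoverable", None).
    """
    p_calc, q_calc = syndromes(data)
    sp = bytes(a ^ b for a, b in zip(p_calc, p_stored))
    sq = bytes(a ^ b for a, b in zip(q_calc, q_stored))
    if not any(sp) and not any(sq):
        return ("ok", None)
    if not any(sp):
        return ("q", None)             # only Q disagrees: Q block is bad
    if not any(sq):
        return ("p", None)             # only P disagrees: P block is bad
    culprit = None
    for a, b in zip(sp, sq):
        if a == 0 and b == 0:
            continue                   # this byte position is consistent
        if a == 0 or b == 0:
            return ("unrecoverable", None)
        z = (GF_LOG[b] - GF_LOG[a]) % 255   # sq = g**z * sp
        if z >= len(data) or (culprit is not None and z != culprit):
            return ("unrecoverable", None)
        culprit = z
    for k, a in enumerate(sp):
        data[culprit][k] ^= a          # the P difference is the error itself
    return ("data", culprit)
```

If the per-byte results do not all point to the same disk, the sketch
gives up instead of guessing, which is the conservative choice for an
occasional 'repair' pass.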
Does the value justify the cost of extra code complexity?
In the case of protecting data integrity, I'd say 'yes'.
Everything costs extra. Code uses bytes of memory, requires
maintenance, and possibly introduces new bugs.
Of course, you are right. However, in my other email, I tried to sketch
a piece of code that is very lean because it makes use of functions that I
assume to exist. (Sorry, I haven't looked at the md code yet, so please
correct me if I'm wrong.) Therefore I assume the costs in memory,
maintenance and bugs to be rather low.
Kind regards,
Thiemo