In a previous post I suggested at least making the admins aware of
the situation:
I think it's also bad for the image of the whole Linux server
community: try explaining to a customer that his robust RAID system,
with 6 disks plus 2 hot spares, just died because of read errors
that were well known to the system, and that all his valuable data
is now lost!!! That customer may say "What a server...!!!", kill
you, then get a Windows server for sure!!
Oh, please, stop trolling.
OK, maybe I'm a bit on edge because of the data loss... touché.
But the problem exists, and it's not only mine: I just saw another post
sent today about a similar problem. So it's worth discussing, IMHO,
because it may affect many installations.
Suppose you have a single disk: if it gives a read error, you lose some
data, and then what? Do you keep the disk or replace it as soon as
possible? I guess the latter. So I would adopt the same policy when the
drive is part of a RAID array, all the more so because one expects
maximum safety from an array. Kicking the disk out of the array at the
first read error is not a good choice either, I agree, since the array
can still run, BUT replacing the disk is just as urgent as replacing a
failed one, because the array may not survive another disk failure!
This should be made clearly visible to the admin.
I already posted a little patch for /proc/mdstat.
I'll try to write a little daemon to track /sys/block/mdXX/md/rdYY/errors.
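
A minimal sketch of what such a daemon could look like, in Python. It
is only an illustration, not a finished tool: it assumes the per-device
read error counters are exposed as /sys/block/<array>/md/rd<N>/errors,
and the function names (read_error_counts, new_errors, watch), the
60-second polling interval and the warning format are all made up for
this example. It takes a baseline at startup, so errors that were
already counted before the daemon started are not reported.

```python
import glob
import os
import time


def read_error_counts(sysfs_root="/sys/block"):
    """Return {(array, slot): count} for every md member exposing 'errors'.

    Assumed layout: <sysfs_root>/<array>/md/rd<N>/errors
    """
    counts = {}
    pattern = os.path.join(sysfs_root, "md*", "md", "rd*", "errors")
    for path in glob.glob(pattern):
        parts = path.split(os.sep)
        array, slot = parts[-4], parts[-2]
        try:
            with open(path) as fh:
                counts[(array, slot)] = int(fh.read().strip())
        except (OSError, ValueError):
            # The member may have vanished or the file may be unreadable;
            # skip it and try again on the next poll.
            continue
    return counts


def new_errors(previous, current):
    """Return {(array, slot): increase} for members whose count grew."""
    grown = {}
    for key, count in current.items():
        delta = count - previous.get(key, 0)
        if delta > 0:
            grown[key] = delta
    return grown


def watch(interval=60, sysfs_root="/sys/block"):
    """Poll forever and print a warning whenever a counter increases."""
    previous = read_error_counts(sysfs_root)
    while True:
        time.sleep(interval)
        current = read_error_counts(sysfs_root)
        for key, delta in sorted(new_errors(previous, current).items()):
            array, slot = key
            print(f"WARNING: {array}/{slot}: {delta} new read error(s), "
                  f"total {current[key]}")
        previous = current
```

Calling watch() starts the polling loop. In a real setup the print
would presumably be replaced by a mail or syslog message, so that the
admin actually gets told the disk needs replacing.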
Giovanni
--
Cordiali saluti.
Yours faithfully.
Giovanni Tessore