Neil Brown wrote:
> If a device is generating lots of read errors, we really should do something
> proactive about that.
> If there is a hot spare, then building onto that while keeping the original
> active (yes, still on the todo list) would be a good thing to do.
> v1.x metadata allows the number of corrected errors to be recorded across
> restarts so a real long-term value can be used as a trigger.
> So there certainly are useful improvements that could be made here.

That is exactly my opinion.
Using a hot spare, if one is available, seems to me a very good idea.

Regarding the metadata version, I was quite disappointed to see that the
default when creating an array is still 0.90 (correct me if newer distros
behave differently), which does not persist information about corrected
read errors across restarts.

In a previous post I suggested at least making admins aware of the
situation:
- it seems that the maximum number of read errors allowed is set
statically in raid5.c by "conf->max_nr_stripes = NR_STRIPES;" to 256;
perhaps it could be made configurable through an entry under
/sys/block/mdXX (see the sketch after this list)
- let /proc/mdstat report clearly how many read errors occurred per
device, if any
- let mdadm be configurable in monitor mode to trigger an alert when the
number of read errors for a device changes or exceeds a threshold n
- explain clearly in the howto and other user documentation how md
behaves on read errors; after a quick survey among my colleagues, I
noticed that nobody was aware of this, and all of them were sure that
RAID treated read errors the same way as write errors!
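
To illustrate the first point: the limit I refer to is the comparison in
raid5.c between a device's corrected-read-error count and the stripe
cache size. Below is a minimal sketch of what a dedicated, tunable limit
might look like; the field name conf->max_read_errors and its sysfs
wiring are purely my assumption (they do not exist today), and this is
only a fragment of the read-error path, not a complete patch.

/*
 * Sketch only, not actual md code: raid5.c currently reuses
 * conf->max_nr_stripes (NR_STRIPES == 256) as the limit on corrected
 * read errors.  The idea is a dedicated field, e.g.
 * conf->max_read_errors, initialised to 256 and exported read/write
 * through a sysfs attribute under /sys/block/mdXX/md/, much like the
 * existing per-device "errors" attribute.
 */
if (atomic_read(&rdev->read_errors) > conf->max_read_errors) {
        char b[BDEVNAME_SIZE];

        printk(KERN_WARNING
               "raid5:%s: too many corrected read errors on %s, "
               "failing device.\n",
               mdname(conf->mddev), bdevname(rdev->bdev, b));
        md_error(conf->mddev, rdev);
} else {
        /* still under the limit: keep retrying as md does today */
        retry = 1;
}
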
I wrote a little patch (just 2 lines of code) for drivers/md/md.c to let
/proc/mdstat report whether a device has had read errors, and how many.
So my /proc/mdstat now shows something like:
$ cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4] [linear] [multipath] [raid0] [raid1] [raid10]
md0 : active raid5 sda1[0] sdb1[1](R:36) sdc1[2]
      4192768 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU]
where /dev/sdb1 has 36 corrected read errors.
This lets me know at a glance the real health status of my array.
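
For the curious, the change is essentially the following (a sketch of the
idea, not the verbatim diff), placed in the per-device loop of
md_seq_show() in drivers/md/md.c, next to where "(F)" is printed for
faulty devices:

/* show corrected read errors after the device name as "(R:<n>)" */
if (atomic_read(&rdev->read_errors))
        seq_printf(seq, "(R:%d)",
                   atomic_read(&rdev->read_errors));
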
As all the needed information is available through
/sys/block/mdXX/md/rdXX/errors, I think it would not be difficult to
implement such a monitor, either as a standalone application or inside
mdadm.
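
As a proof of concept, here is a minimal standalone sketch of such a
monitor. The file name, the threshold, and the idea of running it from
cron are just my assumptions to show that everything needed is already in
sysfs; something equivalent inside mdadm's monitor mode would of course
be nicer, as it could reuse the existing mail/program alert mechanisms.

/*
 * mdreaderr.c -- sketch of a standalone read-error monitor (build with
 * "cc -o mdreaderr mdreaderr.c").  It walks /sys/block/md*/md/rd*/errors
 * and warns when a member device reports more than THRESHOLD corrected
 * read errors.
 */
#include <glob.h>
#include <stdio.h>

#define THRESHOLD 0     /* warn on any corrected read error */

int main(void)
{
        glob_t g;
        size_t i;

        if (glob("/sys/block/md*/md/rd*/errors", 0, NULL, &g) != 0) {
                fprintf(stderr, "no md member devices found (is sysfs mounted?)\n");
                return 1;
        }

        for (i = 0; i < g.gl_pathc; i++) {
                FILE *f = fopen(g.gl_pathv[i], "r");
                long errors;

                if (!f)
                        continue;
                if (fscanf(f, "%ld", &errors) == 1 && errors > THRESHOLD)
                        printf("WARNING: %s reports %ld corrected read errors\n",
                               g.gl_pathv[i], errors);
                fclose(f);
        }

        globfree(&g);
        return 0;
}
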
One thing is clear to me, now that I have faced a disaster on a 6-disk
RAID5 array: it is a *big* *hazard* to keep devices that have produced
read errors in an array without md at least signaling the situation
(through /proc/mdstat, mdadm, or anything else). If another disk fails,
the resync is likely to hit those bad sectors and fail.
I think it is also bad for the image of the whole Linux server community:
try to explain to a customer that his robust RAID system, with 6 disks
plus 2 hot spares, just died because of read errors that were well known
to the system, and that all his valuable data are now lost! That customer
may say "What a server...!", kill you, and then surely go and buy a
Windows server!
Someone may argue that the health status of the disks should be monitored
by SMART tools... but I disagree: imho the md driver must not rely on
external tools; it already has the information about read errors and
should use it by itself to reduce the risk as much as possible. SMART
monitoring is surely useful... if it is installed, supported by the
hardware, and properly configured... but md should not assume that.
Thanks for your interest.
--
Yours faithfully.
Giovanni Tessore