Re: Read errors on raid5 ignored, array still clean .. then disaster !!

Luca Berra <bluca@xxxxxxxxxx> · Sat, 30 Jan 2010 08:54:37 +0100

On Fri, Jan 29, 2010 at 09:48:52PM +1100, Neil Brown wrote:
> On Wed, 27 Jan 2010 08:41:38 +0100
> Luca Berra <bluca@xxxxxxxxxx> wrote:
>
>> On Tue, Jan 26, 2010 at 11:28:03PM +0100, Giovanni Tessore wrote:
>>> Is this some kind of bug?
>> No
>
> I'm not sure I agree.
> If a device is generating lots of read errors, we really should do something
> proactive about that.
> If there is a hot spare, then building onto that while keeping the original
> active (yes, still on the todo list) would be a good thing to do.
>
> v1.x metadata allows the number of corrected errors to be recorded across
> restarts so a real long-term value can be used as a trigger.
Hmm, should we use an absolute value here, or should we consider the
rate of read errors over time? Or both?
A high absolute count would indicate a disk that is degrading slowly over
time; a high rate might be a symptom of a disk that will die very soon.
We also need to control the threshold on a per-device basis via sysfs
(e.g. mdX/md/dev-FOO/maximum_tolerated_read_errors).
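
To make the absolute-count versus rate question concrete, here is a minimal
Python sketch of a check that applies both thresholds. It is not md code and
does not reflect any existing kernel interface: ReadErrorPolicy, evaluate and
both threshold fields are hypothetical names. It assumes that something else
periodically samples a cumulative count of corrected read errors for one
device.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ReadErrorPolicy:
    # Hypothetical per-device thresholds, named for illustration only.
    max_total_errors: int      # absolute limit on corrected read errors
    max_errors_per_day: float  # limit on the error rate over the window


def evaluate(samples: List[Tuple[float, int]],
             policy: ReadErrorPolicy) -> List[str]:
    """Return the reasons ("total", "rate") a device exceeds the policy.

    samples holds (unix_time_seconds, cumulative_corrected_errors) pairs,
    oldest first. An empty list means no threshold was exceeded.
    """
    if not samples:
        return []

    reasons = []
    first_time, first_count = samples[0]
    last_time, last_count = samples[-1]

    # Absolute value: catches a disk that degrades slowly over a long time.
    if last_count > policy.max_total_errors:
        reasons.append("total")

    # Rate over the sampled window: catches a disk that is failing quickly.
    elapsed_days = (last_time - first_time) / 86400.0
    if elapsed_days > 0:
        rate = (last_count - first_count) / elapsed_days
        if rate > policy.max_errors_per_day:
            reasons.append("rate")

    return reasons
```

With a limit of 100 total errors and 5 errors per day, a disk that went from
10 to 40 errors within one day would be flagged for "rate" only, while one
that accumulated 120 errors over 100 days would be flagged for "total" only.
Using both checks is just one possible answer to the "or both?" question.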

> So there certainly are useful improvements that could be made here.
I don't deny that, but I would not call features that are not yet
designed or implemented bugs.

L.

--
Luca Berra -- bluca@xxxxxxxxxx
        Communication Media & Services S.r.l.
 /"\
 \ /     ASCII RIBBON CAMPAIGN
  X        AGAINST HTML MAIL
 / \