Re: Read errors on raid5 ignored, array still clean .. then disaster !!

Giovanni Tessore <giotex@xxxxxxxxxx> · Mon, 01 Feb 2010 16:51:51 +0100

Modern drives _have_ correctable read errors; that is a fact.
So if md kicked drives on a read error, it would also be possible to
lose all data on multiple failures (read errors on more than one drive,
or read errors while rebuilding onto a spare) that could otherwise have
been recovered.

But if we assume that modern drives behave like this, we should also
assume that raid 5, 4, 10, and raid 1 with < 3 devices, are intrinsically
vulnerable, and in some way 'deprecated', because a read error during
reconstruction after a disk failure is likely to occur.
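
To put a rough number on "likely", here is a minimal sketch. It assumes
a rate of one unrecoverable read error per 1e14 bits read (a figure
commonly quoted on consumer drive datasheets, not something measured on
my arrays) and assumes errors are independent. The function name and
parameters are made up for illustration.

```python
import math

def p_read_error_during_rebuild(surviving_disks, disk_bytes, ber=1e-14):
    """Probability of at least one unrecoverable read error while
    reading every surviving disk in full during a rebuild.

    ber is an ASSUMED error rate per bit read; errors are ASSUMED
    to be independent of each other.
    """
    bits_read = surviving_disks * disk_bytes * 8
    # 1 - (1 - ber)**bits_read, computed in a numerically stable way
    return -math.expm1(bits_read * math.log1p(-ber))

# Example: 3-disk raid-5 of 1 TB disks, one disk failed, so the
# rebuild has to read the 2 surviving disks completely.
p = p_read_error_during_rebuild(surviving_disks=2, disk_bytes=1e12)
print(f"{p:.1%}")
```

Under those assumptions the example comes out at roughly 15%, and it
grows with disk size and disk count.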

Personally I just reshaped the failed array as a 6-disk raid-6.
I'll also reshape another machine, which has 3 disks, into 2 arrays: a
raid-1 with 3 devices and a raid-5, the first to be used for the most
valuable data.

The new one must at least clearly alert the user that a drive is
getting read errors on raid 1, 4, 5, 10.
Agreed, now let's define 'clearly alert', besides syslog.

I would use the same event mechanism that mdadm uses now, defining a new
CorrectedReadError event. For raid-6 it can be info (or warning when
errors become too many; configurable); for the other raid levels (the
'vulnerable' ones) the severity should be warning or critical.
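
A minimal sketch of the severity rule I have in mind, in Python only for
brevity. Nothing here exists in mdadm today: the function names, the
level strings, the default threshold, and the escalation from warning
to critical once the threshold is passed are all my assumptions.

```python
def is_vulnerable(level, devices):
    """True for the levels listed above as unable to tolerate a read
    error during reconstruction after a single disk failure."""
    if level in ("raid4", "raid5", "raid10"):
        return True
    # raid-1 only counts as vulnerable with fewer than 3 devices
    return level == "raid1" and devices < 3

def corrected_read_error_severity(level, devices, error_count,
                                  threshold=20):
    """Severity for a hypothetical CorrectedReadError event.

    threshold is the configurable "too many errors" limit; its
    default value here is arbitrary.
    """
    too_many = error_count > threshold
    if is_vulnerable(level, devices):
        return "critical" if too_many else "warning"
    return "warning" if too_many else "info"

print(corrected_read_error_severity("raid6", 6, 1))
print(corrected_read_error_severity("raid5", 3, 1))
```

The point of keeping the threshold configurable is that the admin, not
md, decides how many corrected errors are too many for a given drive.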

--
Cordiali saluti.
Yours faithfully.

Giovanni Tessore
