Modern drives _do_ have correctable read errors; that is a fact.
So if md kicked drives out on a read error, multiple failures (read
errors on more than one drive, or read errors while rebuilding onto a
spare) could lose all the data on an array that could otherwise have
been recovered.
But if we assume that modern drives behave like this, we should also
assume that raid 5, 4, 10 and 1 with < 3 devices are intrinsically
vulnerable, and in some way 'deprecated', because a read error during
reconstruction after a disk failure is likely to occur.
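To put a rough number on "likely", here is a back-of-the-envelope
sketch in Python. The figures are my assumptions, not measurements: an
unrecoverable read error rate of 1 in 1e14 bits (a value often quoted
on consumer drive datasheets), hypothetical 1 TB disks, and errors
treated as independent. The function name is made up for illustration.

```python
import math

def rebuild_read_error_probability(surviving_disks, disk_bytes, ber=1e-14):
    """Chance of at least one unrecoverable read error while reading
    every surviving disk in full during a rebuild.

    Assumes each bit fails independently with probability `ber`.
    """
    bits_read = surviving_disks * disk_bytes * 8
    # 1 - (1 - ber)**bits_read, computed via log1p/expm1 to avoid
    # losing precision with such a tiny per-bit probability.
    return -math.expm1(bits_read * math.log1p(-ber))

# Hypothetical 6-disk raid-5 of 1 TB drives with one disk failed:
# the rebuild has to read the 5 surviving disks completely.
p = rebuild_read_error_probability(surviving_disks=5, disk_bytes=10**12)
print(f"chance of a read error during rebuild: {p:.1%}")
```

Under those assumptions the chance comes out at roughly one in three,
which is why a second level of redundancy during the rebuild matters.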
Personally, I have just reshaped the failed array as a 6-disk raid-6.
I'll also reshape another machine, which has 3 disks, into 2 arrays: a
raid-1 with 3 devices and a raid-5, the first to be used for the most
valuable data.
The new one must at least clearly alert the user that a drive is
getting read errors on raid 1,4,5,10.
Agreed; now let's define 'clearly alert', besides syslog.
I would use the same event mechanism that mdadm uses now, defining a new
CorrectedReadError event ... for raid-6 it can be info (or warning when
errors become too many, with a configurable threshold); for the other
raid levels (the 'vulnerable' ones) the severity should be warning or
critical.
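As a minimal sketch of the severity rule I have in mind (Python used
only for illustration; none of these names exist in mdadm, and the
default threshold of 10 is an arbitrary placeholder). I am also
assuming that raid-10 keeps only two copies of each block, and that
'critical' on a vulnerable level is reserved for counts above the
threshold:

```python
# Hypothetical sketch: these names and values are NOT part of mdadm.

# Levels that survive only a single device failure (assuming raid10
# keeps two copies of each block).
SINGLE_REDUNDANCY_LEVELS = {"raid4", "raid5", "raid10"}

def is_vulnerable(level, devices):
    """True if a read error during a rebuild could lose data."""
    if level == "raid1":
        # A raid-1 with 3 or more devices still has a spare copy
        # while one device is being rebuilt.
        return devices < 3
    return level in SINGLE_REDUNDANCY_LEVELS

def corrected_read_error_severity(level, devices, error_count, threshold=10):
    """Map a proposed CorrectedReadError event to a severity string."""
    too_many = error_count > threshold
    if is_vulnerable(level, devices):
        return "critical" if too_many else "warning"
    return "warning" if too_many else "info"

print(corrected_read_error_severity("raid6", 6, 1))
print(corrected_read_error_severity("raid5", 3, 1))
```

The threshold would come from the configuration, so that an admin can
decide how many corrected errors are too many for their drives.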
--
Cordiali saluti.
Yours faithfully.
Giovanni Tessore