Re: emergency call for help: raid5 fallen apart

John Robinson wrote:
On 25/02/2010 08:05, Giovanni Tessore wrote:
[...]
I see this is the 4th time in a month that people have reported problems on raid5 due to read errors during reconstruction; it looks like the 'corrected read errors' policy is quite a real concern.

If you mean md's policy of reconstructing from the other discs and rewriting when there's a read error from one disc of an array, rather than immediately kicking the disc that had a read error, I think you're wrong - I think md is saving lots of users from hitting problems, by keeping their arrays up and running, and giving their discs a chance to remap bad sectors, instead of forcing the user to do full-disc reconstructions more often which will make them more likely to hit read errors during recovery.

I do think we urgently need the hot reconstruction/recovery feature, so failing drives can be recovered to fresh drives with two sources of data, i.e. both the failing drive and the remaining drives in the array, giving us two chances of recovering every sector.

Ideally, there would be a way to avoid kicking any failing drive, or even trying to rewrite the unreadable sector. Some md utility which would clone a drive using logic similar to this:
- start with array assembled but not started
- read a sector from the source drive
  reconstruct it if the source read fails
  report errors and keep going
- write any recovered sector to the destination
- optionally read it back to be sure it worked; rewrite and note any errors.
  To be useful, this check must flush to the platter and reread. Yes, it will be slow.
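
The steps above could be sketched roughly as follows. This is only an illustration in Python, not anything mdadm provides: `clone`, `flush_to_media` and the `reconstruct` callback (standing in for "rebuild this sector from the other array members") are hypothetical names, and the 512-byte sector size is an assumption.

```python
import os

SECTOR = 512  # assumed sector size; many drives use 4096


def flush_to_media(f):
    """Flush Python and OS buffers; tolerate objects with no file descriptor."""
    f.flush()
    try:
        os.fsync(f.fileno())
    except (OSError, ValueError, AttributeError):
        pass  # e.g. an in-memory buffer used for testing


def clone(src, dst, total_sectors, reconstruct, verify=False, sector=SECTOR):
    """Copy sectors from src to dst; return a list of (sector_no, problem)."""
    problems = []
    for n in range(total_sectors):
        offset = n * sector
        try:
            src.seek(offset)
            data = src.read(sector)
            if len(data) != sector:
                raise OSError("short read")
        except OSError as exc:
            # Report the error and keep going, trying the other members.
            problems.append((n, "source read failed: %s" % exc))
            data = reconstruct(n)  # hypothetical: returns bytes or None
            if data is None:
                problems.append((n, "not recoverable, skipped"))
                continue
        dst.seek(offset)
        dst.write(data)
        if verify:
            # Optional read-back check; rewrite once and note the error.
            flush_to_media(dst)
            dst.seek(offset)
            if dst.read(sector) != data:
                dst.seek(offset)
                dst.write(data)
                flush_to_media(dst)
                problems.append((n, "verify mismatch, rewritten once"))
    return problems
```

Note that `os.fsync` only pushes the data out; a reread through the page cache may never touch the platter, so a real tool would presumably need direct I/O for the verify step to mean anything.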

Don't try to be smart, try to make a usable copy of a drive!

I think in case a sector can't be recovered a fixed pattern should be written to the destination, for ease of identification if nothing else.
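
As a rough illustration of the fixed-pattern idea, here is a small Python sketch. The marker string, the 512-byte sector size and both function names are made up for the example; any pattern unlikely to occur in real data would do.

```python
SECTOR = 512                  # assumed sector size
MARKER = b"MD-UNRECOVERABLE"  # hypothetical 16-byte tag, repeated to fill a sector


def fill_pattern(sector=SECTOR):
    """Return one sector's worth of the fixed marker pattern."""
    reps = sector // len(MARKER) + 1
    return (MARKER * reps)[:sector]


def find_unrecovered(image, sector=SECTOR):
    """List the sector numbers in image that hold exactly the marker pattern."""
    pattern = fill_pattern(sector)
    return [
        n
        for n in range(len(image) // sector)
        if image[n * sector:(n + 1) * sector] == pattern
    ]
```

A later pass over the copy can then list the lost sectors without needing the log from the original run.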

I think being able to specify the MBR or a single partition would be useful; that would let critical things be saved faster and with less work. This would also open up possibilities for migration of several kinds.

This really should be a command in mdadm! Why? Because it is vital that changes in how mdadm does things are tracked in this tool. And because when you are down to trying this, you don't want to be looking for matching versions, etc.

--
Bill Davidsen <davidsen@xxxxxxx>
 "We can't solve today's problems by using the same thinking we
  used in creating them." - Einstein
