On Monday May 21, brugolsky@xxxxxxxxxxxxxxxxxxxxxxxxx wrote:
> Neil,
>
> What seems desirable to me is a way to take a new (larger) spare drive and
> add it to a RAID1 for a particular RAID 4/5/6 component, and then when
> it's sync'd, replace the now redundant small drive with another larger
> drive.  Wash, rinse, repeat.  This way the array is never degraded.
> Though I imagine that this particular arrangement doesn't have the
> benefit of the stripe rewrite when encountering a latent error on the
> drive that is being migrated.  [Presumably the failing addresses could
> be cycled through the check from userland though, by doing a read above
> the stacked RAID.]
>
> One could start a RAID 4/5/6 array over a degraded RAID1 for each
> component, (i.e., a degraded RAID1).
>
> I haven't been following the metadata changes closely.  Is it possible
> to do this with external MD metadata?  It can also be done with
> device-mapper, but dm-mirror is very immature compared to MD RAID1.
>
> Comments?

This doesn't really have anything to do with the metadata used - it is
primarily an implementation issue (though you would need to be careful
picking up the pieces after a crash).

If we could freeze an array (so that all writes block), then we could
do much of what you suggest:
 - freeze the array
 - remove the target device
 - create a raid1 of the target and the new device
 - re-add the raid1
 - unfreeze the array.

Dealing with read errors on the target device is much more awkward.
The approach that seems right to me is:
 - create a raid1 variant which does a passive resync:  When the
   next-needed block is read or written, write it to the second device
   and advance the "next-needed" pointer.
 - Get this raid1 to simply return read errors (which might be OK
   already) so that a read-error won't be fatal.  But a read request
   that is behind the "next-needed" pointer gets served from the second
   device if the first does fail.
 - Implement a 'check-one-disk' operation on raid5 (and others) so that
   instead of reading all devices, it just reads all the way through
   one.  If this one is really a raid1-variant, doing that read will
   effect a resync on the raid1, and any read error will be handled
   correctly.

So it is all quite possible, and I agree that it could be valuable.  It
just needs someone to do it, and work out all the fine details.

Anyone want to try some coding ????

NeilBrown
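
To make the passive resync and 'check-one-disk' ideas above a little
more concrete, here is a toy model in Python.  It is only a sketch
under stated assumptions, not md code: devices are in-memory lists,
None stands for a block that cannot be read, and the names
PassiveResyncMirror, ReadError, check_one_disk and the reconstruct
callback are all made up for illustration.  reconstruct stands in for
the raid5 layer rebuilding a block from the other devices.

```python
class ReadError(Exception):
    """Raised when the old device cannot return a block (hypothetical)."""


class PassiveResyncMirror:
    """Toy raid1 variant: blocks below next_needed are already copied."""

    def __init__(self, old_blocks):
        self.old = list(old_blocks)          # None marks an unreadable block
        self.new = [None] * len(self.old)    # replacement device, empty at first
        self.next_needed = 0                 # first block not yet copied

    def _copy_forward(self, block, data):
        # Passive resync: only I/O that hits the pointer moves it on.
        if block == self.next_needed:
            self.new[block] = data
            self.next_needed += 1

    def read(self, block):
        data = self.old[block]
        if data is None:
            if block < self.next_needed:
                return self.new[block]       # behind the pointer: use the copy
            raise ReadError(block)           # not fatal, just reported upwards
        self._copy_forward(block, data)
        return data

    def write(self, block, data):
        self.old[block] = data
        if block < self.next_needed:
            self.new[block] = data           # keep the synced region in sync
        else:
            self._copy_forward(block, data)

    @property
    def synced(self):
        return self.next_needed == len(self.old)


def check_one_disk(mirror, reconstruct):
    """Read straight through one component, repairing read errors."""
    for block in range(len(mirror.old)):
        try:
            mirror.read(block)
        except ReadError:
            # Assumption: the layer above rebuilds the block and rewrites it.
            mirror.write(block, reconstruct(block))
```

Reading sequentially through the component drives the pointer to the
end, so the sweep doubles as the resync.  A read error at the pointer
goes up to the caller, and the rewrite that follows lands on both
devices, which is the "handled correctly" case in this model.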