Michael Tokarev <mjt@xxxxxxxxxx> wrote:
> > mdadm -Gb internal --bitmap-chunk=1024 /dev/md4
> > mdadm /dev/md4 -r /dev/sdh1
> > mdadm /dev/md4 -f /dev/sde1 -r /dev/sde1
> > mdadm --build /dev/md5 -ayes --level=1 --raid-devices=2 /dev/sde1 missing
> > mdadm /dev/md4 --re-add /dev/md5
> > mdadm /dev/md5 -a /dev/sdh1
> >
> > ... wait a few hours for md5 resync...
>
> And here's the problem.  While the new disk, sdh1, is resynced from
> the old, probably failing disk sde1, chances are high that there will
> be an unreadable block on sde1.

So we need a way to feed the redundancy of the raid5 back into the
raid1.  Here is a short 5-minute brainstorm I did to check whether
it's possible to manage this, and I think it is:

Requirements:

Any RAID with parity of any kind needs to provide so-called "virtual
block devices", which carry the same data as the underlying block
devices the array is composed of.  If the underlying block device
can't read a block, that block will be calculated from the other raid
disks and hence is still readable through the virtual block device.

E.g. having the disks sda1 .. sde1 in a raid5 means the raid provides
not one new block device (/dev/md4 as in the example above), but six:
the one just mentioned plus one per member disk.  Maybe we call them
/dev/vsda1 .. /dev/vsde1, or /dev/mapper/vsda1 .. /dev/mapper/vsde1,
or even /dev/mapper/virtual/sda1 .. /dev/mapper/virtual/sde1.  For
ease, I'll just call them vsdx1 here.  Reading any block from vsda1
will, at any time, yield the same data as reading from sda1 (except
when reading from sda1 fails; then vsda1 will still carry that data).

Now, construct the following nested raid structure:

  sda1 + vsda1 + missing           = /dev/md10  RAID1 w/o super block
  sdb1 + vsdb1 + missing           = /dev/md11  RAID1 w/o super block
  sdc1 + vsdc1 + missing           = /dev/md12  RAID1 w/o super block
  sdd1 + vsdd1 + missing           = /dev/md13  RAID1 w/o super block
  sde1 + vsde1 + missing           = /dev/md14  RAID1 w/o super block
  md10 + md11 + md12 + md13 + md14 = /dev/md4   RAID5, optionally with sb

Problem:

As long as md4 is not active, the vsdx1 devices are not available.
So the md1x arrays need to be created with one disk out of three.
After md4 has been assembled, the vsdx1 devices need to be added.
Now we get another problem: there must be no sync between sdx1 and
vsdx1 (they are more or less the same device).  So there should be an
mdadm option like --assume-sync for hot-add.

What we get:

As soon as we decide to replace a disk (like sde1 above), we just
hot-add sdh1 to the raid1 array containing sde1.  That array will
start resyncing.  If a block now can't be read from sde1, it's simply
taken from vsde1 (where it will be reconstructed from the raid5).
After syncing to sdh1 has completed, sde1 may be removed from the
array.  We would lose redundancy at no time - the only lost
redundancy is that of the already failed sde1, which we can't work
around anyway (except by using raid6 etc.).

This is only a brainstorm, and I don't know what internal effects
could cause problems.  For example: the resyncing process of the
raid1 array reads a bad block from sde1 and triggers a reconstruction
via vsde1, while in parallel the raid5 itself detects (e.g. caused by
a user-space read) that sde1 has failed and tries to write that block
back to sde1 - while in the raid1 the same rewrite is already pending
in the raid1 ...  problems over problems, but the devil is in the
details, as ever ...
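To make the assembly order concrete, here is roughly how the command
sequence could look.  Note this is purely hypothetical: neither the
vsdx1 devices nor the --assume-sync option exist in today's md/mdadm;
they are exactly what this proposal asks for.  I'm also assuming
--build accepts "missing" members, as in the quoted example above:

  # 1. Build the raid1 layers degraded (one disk out of three,
  #    no super block):
  mdadm --build /dev/md10 --level=1 --raid-devices=3 /dev/sda1 missing missing
  mdadm --build /dev/md11 --level=1 --raid-devices=3 /dev/sdb1 missing missing
  mdadm --build /dev/md12 --level=1 --raid-devices=3 /dev/sdc1 missing missing
  mdadm --build /dev/md13 --level=1 --raid-devices=3 /dev/sdd1 missing missing
  mdadm --build /dev/md14 --level=1 --raid-devices=3 /dev/sde1 missing missing

  # 2. Assemble the raid5 on top of them; only now would the
  #    hypothetical vsdx1 devices appear:
  mdadm --assemble /dev/md4 /dev/md10 /dev/md11 /dev/md12 /dev/md13 /dev/md14

  # 3. Hot-add each virtual device to its raid1 with the proposed
  #    --assume-sync, so no resync between sdx1 and vsdx1 starts:
  mdadm /dev/md10 --assume-sync -a /dev/vsda1
  mdadm /dev/md11 --assume-sync -a /dev/vsdb1
  mdadm /dev/md12 --assume-sync -a /dev/vsdc1
  mdadm /dev/md13 --assume-sync -a /dev/vsdd1
  mdadm /dev/md14 --assume-sync -a /dev/vsde1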
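The disk replacement itself would then stay within the existing mdadm
vocabulary (again just a sketch, relying on the hypothetical vsde1
being an active member of md14):

  # Hot-add the new disk; resync reads come from sde1 or, on read
  # errors, from vsde1 (i.e. from raid5 reconstruction):
  mdadm /dev/md14 -a /dev/sdh1

  # ... wait for md14 resync ...

  # Then fail and remove the old disk:
  mdadm /dev/md14 -f /dev/sde1 -r /dev/sde1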
Regards, Bodo