raid1, questions and/or ideas for improvement

Markus Hochholdinger <Markus@xxxxxxxxxxxxxxxxx> · Sat, 26 Jan 2008 15:18:56 +0100

Hi,

I'm just trying to optimize my raid1 setups, and I found out that a lot of
the things I had in mind are already implemented inside md raid1.

So there is the "--bitmap=" option, which really improves raid1 resyncs.
Great. But my problem is this: I've built a raid1 out of logical volumes and
I want to grow the whole thing. md doesn't notice that a block device has
been resized while it is in use. So growing the LVs and then growing the
raid1 while the raid1 is active and clean doesn't work (you also _destroy_
your raid1 superblock, or rather raid1 can't find the superblock after the
LVs have grown). So I have to remove one block device, resize it with LVM
and reinsert it. But then the superblock can't be found, so a full resync
happens. I tried this with superblock format versions 0.90, 1.1 and 1.2.
None of them worked.
So I'm wondering if there is some option to "preserve" the superblock and
copy it back after the resize of the logical volume? (OK, I could dd the
superblock, but is this safe?)
Another solution would be to tell md raid1 to reread the block device
sizes, so that md can move the superblock to the correct position by itself.
Is there a way to tell md raid1 to reread the sizes of the underlying block
devices and move the superblocks accordingly?
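
A minimal sketch of why a grown device can appear to have lost its
superblock, at least for format 0.90. It assumes the 0.90 layout in which
the superblock sits in the last 64 KiB block of the device, aligned down to
64 KiB. The function name is made up for illustration. Formats 1.1 and 1.2
keep the superblock near the start of the device, so this sketch does not
explain the failure seen with those.

```python
# Sketch only: expected position of a version-0.90 md superblock.
# Assumption: 64 KiB reserved at the end, aligned down to a 64 KiB boundary.

MD_RESERVED_BYTES = 64 * 1024
GiB = 1024 ** 3


def sb_offset_090(device_size_bytes: int) -> int:
    """Byte offset at which a 0.90 superblock is expected (hypothetical helper)."""
    aligned = device_size_bytes & ~(MD_RESERVED_BYTES - 1)
    return aligned - MD_RESERVED_BYTES


old = sb_offset_090(10 * GiB)   # position before growing the LV
new = sb_offset_090(12 * GiB)   # position md would look at afterwards
print("before:", old)
print("after: ", new)
print("moved by", new - old, "bytes")
```

After the logical volume grows, the old superblock is still at the old
offset, but a lookup based on the new size reads a different location.
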
Another feature would be handy for resyncing almost identical disks. If md
raid1 could be told, when adding a "new" disk, that the new disk is almost
identical, it could do a comparing resync: read both the good and the new
disk, compare, and only write to the new disk where blocks differ. The
benefit of this kind of resync is that block devices can normally be read
faster than they can be written. Is this kind of resync already possible
with md raid1?
Another crazy benefit of this feature: you could make sparse block devices
(with dm zero and dm snapshots) and do a full resync with this method
without allocating every byte of the sparse block device being synced.
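
The compare-then-write resync described above can be sketched in user space
as follows. This is only an illustration of the idea on file-like objects,
not how md works internally; the function name and the 64 KiB chunk size are
assumptions made for the example.

```python
import io


def compare_resync(good, new, size, chunk=64 * 1024):
    """Copy only differing chunks from `good` to `new`; return bytes written."""
    written = 0
    for offset in range(0, size, chunk):
        length = min(chunk, size - offset)
        good.seek(offset)
        want = good.read(length)
        new.seek(offset)
        have = new.read(length)
        if have != want:
            # Only now is the target touched, so identical regions
            # (for example unallocated zeros) are never written.
            new.seek(offset)
            new.write(want)
            written += len(want)
    return written


# Demo on in-memory "devices": 256 KiB with a single differing byte.
good = io.BytesIO(bytes(range(256)) * 1024)
stale = bytearray(good.getvalue())
stale[70000] ^= 0xFF
new = io.BytesIO(bytes(stale))

written = compare_resync(good, new, len(stale))
print("bytes written:", written)
```

On a sparse target, regions that read back as zeros and are also zero on
the good disk are skipped, so no space is allocated for them.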

There are also "--write-mostly" and "--write-behind=", but I only managed to
set these at creation time. Is it possible to set "--write-mostly" and
"--write-behind=" while an md raid1 is online and active?
With this feature I could take a dm snapshot of one device of the raid1, set
that device to "--write-mostly" and "--write-behind=" in the raid1 and make
a backup from the snapshot. While the backup runs, the performance of the
raid1 would be better, because dm snapshots are not that fast. And if the
block devices of the raid1 are on different SANs, I can alternate the
backups between SAN1 and SAN2, each time first setting "--write-mostly" and
"--write-behind=" on the device that will be backed up.

Speaking of SANs, a configurable timeout in md raid1 would be great ;-) , but
from what I see on this mailing list, there doesn't seem to be much effort
in this direction. So I only want to say: me too.

It would be nice if someone could answer my questions and/or comment on my
ideas. Perhaps everything is already there and I didn't find it, or there
are obstacles I didn't recognize.

PS: For those who want to know the setup that led me to these ideas: I've
built Xen clusters out of standard hardware. Two (or more) storage servers
each put all their disks into a (striping) logical volume group and export
logical volumes over separate networks (SAN1 and SAN2) to the Xen hosts.
Inside a Xen guest I do the raid1, so even the connection of one block
device between Xen host and Xen guest can break without causing me
problems. I grow the filesystem (hard disks and raid1) online inside the Xen
guest. Currently I have to remove one disk from the raid1, grow it with LVM
and reinsert it, which results in a full resync. Then the same with the
second disk. After that I can do a --grow on the raid1 and then run
resize2fs. This works perfectly, but it costs a lot of resources (two full
resyncs), which I would like to reduce.
Another way to handle growing would be to use sparse block devices (with
device mapper you make a zero block device of large size and then snapshot
it with real hard disk space). With this solution I can make a big
filesystem on the snapshot device, and hard disk blocks are only allocated
as they are written. Of course I have to watch and grow the snapshot size,
but this can be automated. I also have a problem if the real disk space
runs out, and this has to be prevented. But I like this solution because I
don't have to grow the block devices and filesystems manually. And this
solution would break completely if a full resync were done on a sparse
block device, you see? So a "read-mostly resync" would be great for this.
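
For readers unfamiliar with the sparse device trick, here is a small sketch
that builds the two device-mapper tables involved: a "zero" target of the
desired virtual size, and a "snapshot" target on top of it that stores
written chunks on a real volume. The device paths, the 1 TiB size and the
16-sector chunk size are hypothetical values chosen for the example, and
the helper names are not part of any tool.

```python
SECTOR = 512  # device-mapper tables count in 512-byte sectors


def zero_table(sectors: int) -> str:
    """Table line for a dm 'zero' target covering `sectors` sectors."""
    return f"0 {sectors} zero"


def snapshot_table(sectors, origin, cow, persistent=True, chunk_sectors=16):
    """Table line for a dm 'snapshot' of `origin`, storing changes on `cow`."""
    flag = "P" if persistent else "N"
    return f"0 {sectors} snapshot {origin} {cow} {flag} {chunk_sectors}"


sectors = (1024 ** 4) // SECTOR  # 1 TiB virtual size (example value)
zero_line = zero_table(sectors)
# Hypothetical paths: the zero device as origin, an LV as the COW store.
snap_line = snapshot_table(sectors, "/dev/mapper/zero1t", "/dev/vg0/cow1")
print(zero_line)
print(snap_line)
```

Each line would be passed as the table when creating the mapped device with
dmsetup. Only chunks that are actually written consume space on the COW
volume, which is why a resync that rewrites every block defeats the scheme.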

-- 
greetings

eMHa