Hello,

I have the following problem. I have one server connected via two FC cards to two different storage systems. The storage systems are intelligent: they handle RAID internally and allow one or more internal RAID sets to be mapped to different LUNs seen by the host. I use md to mirror between the two storage systems for disaster recovery purposes. The problem is that after an unclean shutdown (we had a big power outage this weekend) all LUNs are resynced in parallel, which brings the server and both storage systems to their knees.
Having a look at match_mddev_units() in md.c (kernel 2.4), it seems to me that the RAID code uses the device major/minor numbers to determine whether two md devices are on the same underlying physical device.
From dev_unit():

    mask = ~((1 << hd->minor_shift) - 1);
    return MKDEV(MAJOR(dev), MINOR(dev) & mask);
In my case the logical drives are seen as different SCSI devices by the sd layer, so all devices appear to be on different disks, hence the parallel resync.
I can lower /proc/sys/dev/raid/speed_limit_max to make the server suffer less, but this won't stop the head thrashing on the storage.
Is there any way to have the RAID code use a different method for deciding which devices are on the same physical device, e.g. by checking which SCSI channel they appear on?
If I run out of options, I can change match_mddev_units() to use a different match_dev_unit(), which in turn uses a different dev_unit() that only checks the major number. But I would have to hardcode a lot of stuff, because sd uses several different major numbers (and I am thinking only of the sd driver). Alternatively, I could add a tunable, set through a kernel or module parameter, that changes the behaviour of md_do_sync().
Something like:
  recheck:
      serialize = 0;
      ITERATE_MDDEV(mddev2,tmp) {
          if (mddev2 == mddev)
              continue;
+         if (force_serialize) {
+             if (mddev2->curr_resync) {
+                 printk(KERN_INFO "md: delaying resync of md%d until md%d "
+                        "has finished resync (force_serialize=1)\n",
+                        mdidx(mddev), mdidx(mddev2));
+                 serialize = 1;
+                 break;
+             }
+         } else if (mddev2->curr_resync && match_mddev_units(mddev,mddev2)) {
              printk(KERN_INFO "md: delaying resync of md%d until md%d "
                     "has finished resync (they share one or more physical units)\n",
                     mdidx(mddev), mdidx(mddev2));
              serialize = 1;
              break;
          }
      }
Another idea would be to store a container indicator in the md superblock, which could be initialized by mdadm.
Comments?
L.
-- 
Luca Berra -- bluca@comedia.it
        Communication Media & Services S.r.l.
 /"\
 \ /     ASCII RIBBON CAMPAIGN
  X        AGAINST HTML MAIL
 / \