I'll compress this down to an even more abstract summary ...

"Peter T. Breuer wrote:"
> "Peter T. Breuer wrote:"
> > Have a look at the patch in the .tgz. I tried to make it as clean as I
> > could. Every change I made in the md.c code is commented. There are 4
> > "hunks" of changes to md.c, to allow hotadd after setfaulty, and about
> > ten significant hunks of changes to raid1.c, inserting the extra
> > technology. There is some extra debugging code in that, which I can
>
> In fact - I'll publish and go through the patch here. Here we go.

1) Change the hotadd function in md.c to permit hotadd after setfaulty
   ("hotrepair"). The hotadd should preserve a bitmap which has
   previously been added to the disk metadata in the main array
   (during setfaulty).

2) Change the write code in raid1.c to mark the bitmap of every mirror
   component disk which is marked not operational, if it has a bitmap.

3) Change the mark_disk_bad code in raid1.c to add a bitmap to the disk
   metadata in the full raid array. This is called by setfaulty, and
   also on error from below, I think.

4) At the point where a spare disk is marked active in the diskop
   function in raid1.c (state SPARE_ACTIVE), remove any bitmap
   associated with the disk metadata in the full raid array. This is
   called after a successful resync, somehow, and possibly on other
   occasions.

5) In the resync function in raid1.c, for each resync block or blocks,
   find all the spare mirror components which are marked nonoperational
   but writable ("write_only"). If such a component has a bitmap and it
   is clean for the blocks we are interested in, then cheat for that
   device: report and account for having written to it when in fact we
   have not. This means calling md_sync_acct and sync_request_done and
   md_done_sync and possibly signalling on the wait_ready wait queue
   for the raid device.
   If we don't cheat, then fall through and do the normal thing, which
   is to launch a write request for some blocks, do a bit of accounting
   and leave the done functions and signalling for its end_io.

I would be deeply obliged if somebody could indicate to me where to make
some further changes. What I want to do is allow an underlying block
device to notify the raid code when the block device has "fixed itself".
My plan is to

a) get the raid code to tell the underlying block device during a
   hotadd, presumably at the end, the major and minor of the raid
   device it has become part of. This will be via an extra ioctl which
   I will declare for all block devices. Possibly it would be nice to
   actually pass the file system inode for the special device node of
   md0 or whatever, if we have it.

b) when the block device feels well again, have it signal the raid
   code, via the inode or more directly via the block_ops array and a
   new ioctl, that it has come back up. The raid code will then do a
   hotadd.

I would like pointers as to where to insert this in the current raid
code.

Peter
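
To illustrate points 2) to 5) of the summary above, here is a minimal
user-space sketch in C. It is not taken from the patch: every name in it
(struct mirror, model_write, model_resync_block and so on) is
hypothetical, and it assumes one bit per resync block. The real code
works on the md and raid1 kernel structures and calls md_sync_acct,
sync_request_done and md_done_sync where the sketch merely bumps a
counter.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical user-space model; no name here comes from md.c or raid1.c. */
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

struct mirror_bitmap {
    unsigned long *bits;    /* one bit per resync block; set means dirty */
    size_t nblocks;
};

struct mirror {
    int operational;                /* 0 after setfaulty or an I/O error */
    int write_only;                 /* 1 while the disk is being resynced */
    struct mirror_bitmap *bitmap;   /* NULL when no bitmap is attached */
    unsigned long blocks_written;   /* real resync writes issued */
    unsigned long blocks_skipped;   /* writes accounted for but not issued */
};

static struct mirror_bitmap *bitmap_create(size_t nblocks)
{
    size_t nwords = (nblocks + BITS_PER_WORD - 1) / BITS_PER_WORD;
    struct mirror_bitmap *bm = malloc(sizeof *bm);

    if (!bm)
        return NULL;
    bm->bits = calloc(nwords ? nwords : 1, sizeof *bm->bits);
    if (!bm->bits) {
        free(bm);
        return NULL;
    }
    bm->nblocks = nblocks;
    return bm;
}

static void bitmap_mark(struct mirror_bitmap *bm, size_t block)
{
    assert(block < bm->nblocks);
    bm->bits[block / BITS_PER_WORD] |= 1UL << (block % BITS_PER_WORD);
}

static int bitmap_test(const struct mirror_bitmap *bm, size_t block)
{
    assert(block < bm->nblocks);
    return (int)((bm->bits[block / BITS_PER_WORD] >> (block % BITS_PER_WORD)) & 1UL);
}

/* Point 3: marking a disk bad attaches a (clean) bitmap to it. */
static int model_mark_disk_bad(struct mirror *m, size_t nblocks)
{
    m->operational = 0;
    if (!m->bitmap)
        m->bitmap = bitmap_create(nblocks);
    return m->bitmap ? 0 : -1;
}

/* Point 2: a write dirties the bitmap of every non-operational mirror. */
static void model_write(struct mirror *mirrors, size_t n, size_t block)
{
    for (size_t i = 0; i < n; i++)
        if (!mirrors[i].operational && mirrors[i].bitmap)
            bitmap_mark(mirrors[i].bitmap, block);
}

/* Point 5: returns 1 if a real write is needed, 0 if we "cheat". */
static int model_resync_block(struct mirror *m, size_t block)
{
    if (!m->operational && m->write_only && m->bitmap &&
        !bitmap_test(m->bitmap, block)) {
        m->blocks_skipped++;    /* account for the write without doing it */
        return 0;
    }
    m->blocks_written++;        /* normal path: launch the write request */
    return 1;
}

/* Point 4: once the spare becomes active, its bitmap is dropped. */
static void model_spare_active(struct mirror *m)
{
    if (m->bitmap) {
        free(m->bitmap->bits);
        free(m->bitmap);
        m->bitmap = NULL;
    }
    m->operational = 1;
    m->write_only = 0;
}
```

In this model, a disk that is marked bad, misses a single write and is
then hot-added again needs only that one block rewritten during resync;
every other block is accounted for without any I/O.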