On Thursday January 29, Paul.Clements@SteelEye.com wrote: > Description > =========== > This patch provides the md driver with the ability to track > resync/rebuild progress with a bitmap. It also gives the raid1 driver > the ability to perform asynchronous writes (i.e., writes are > acknowledged before they actually reach the secondary disk). The bitmap > and asynchronous write capabilities are primarily useful when raid1 is > employed in data replication (e.g., with a remote disk served over nbd > as the secondary device). However, the bitmap is also useful for > reducing resync/rebuild times with ordinary (local) raid1, raid5, and > raid6 arrays. > Thanks for this. There is obviously some good functionality here, but I do have a few issues with the patch and the code that I would like to see addressed first. - The patch seems to add several separate, though related, pieces of functionality. It would be a lot easier to review if they were separate. There is (at least): - "intent-loging" in the bitmaps. - optimising resync using the intents. - async writes to non-primary devices - 'hotrepair' - bitmap storage. I gather these are stored on a separate device (in a separate filesystem). It would be nice to be able to store them on the raid array directly so the optimised-resync works on a root-filesytem device as well. There is nearly 64k after the superblock that can be used. I think there should be an option to store the bitmap on all the (primary) devices after the superblock. (This could be added later - a separate patch) - A question - can the bitmap be "rescaled" online, or only offline? (not that I particularly want on-line rescaling) How is it achieved? - ioctl - By changing the size of mdu_array_info_t you have changed the GET_ARRAY_INFO ioctl, so you lose backwards compatibility with user-space tools. If you want to return more info with GET_ARRAY_INFO (which is reasonable) we need to handle both the old and new mdu_array_info_t sizes. Ditto for SET_ARRAY_INFO - superblock. If you have an array with a bitmap attached, boot an old kernel that doesn't understand bitmaps, use the array for a while (after a full sync) have an unclean shutdown, and then boot back into the new kernel, the out-of-date bitmap will be used to optimise resync - incorrectly. Is this possibility handled at all? - I don't see the point of the new "array_size" personality method. Surely the size of the bitmap depends on the device size (mddev->size), rather than the array_size - after all, that is what md_do_sync iterates over. - I don't think it is right for "stop" to return EBUSY if there are outstanding async writes. It should either wait for the writes to complete, or abort them (or both, with a timeout). - Write requests tend to come in bursts. So when you get a burst of write requests, it would be most efficient to set all the on-disk bits for all requests in the burst, flush all blocks which have changed and then release the burst of write requests. If I read the code correctly, you do one bitmap-block-write for every bit that gets set. - returning -ENOMEM in the make_request function is a no-no. You need to either cope with a kmalloc failure, or use a mempool or similar to guarantee some memory either now or soon. - The bitmap_page structure is amusing. On memory failure, you highjack the "*map" pointer to form 2 16bit counters, and spend a whole 32bit "unsigned int" to flag if this is the case or not. Would it be better to use one bit of "counter" as the highjack flag, and thereby save a substantial fraction of the size of 'bitmap_page'? - What is the relationship between the 'bitmap_fd' field in mdu_array_info_s, and the SET_BITMAP_FILE ioctl? They both seem to set the bitmap file. Finally, do you have any measurements of the performance impact of the intent logging on various workloads? That'll do for now. NeilBrown - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html