The forthcoming patches are the second iteration (third when including Neil's original drop) of the DM -> MD translation module, "dm-raid". There have been some minor changes to some of the 9 patches I posted last time, so I'm just including all of them along with the new patches I have.

MD patch reversals and fixes (these can go upstream now):
    md-backout-dm-dirty-log.patch
    md-minor-updates.patch
    md-fix-null-pointer-deref.patch

dm-raid module (some reworking of Neil's original patches):
    dm-raid-seed-module.patch
    dm-target-callbacks-and-congestion-fn.patch
    dm-unplug-callback.patch
    dm-raid-iterate_devices-and-io_hints.patch
    dm-raid-suspend-and-resume-fns.patch
    dm-raid-message-fn.patch

New patches for support of separate metadata devices:
    md-new-param-to-calc_dev_sboffset.patch
    md-new-param-to_sync_page_io.patch
    md-separate-meta-and-data-devs.patch
    dm-raid-allow-metadata-devices.patch
    md-new-superblock-type.patch
    md-add-bitmap-support.patch

So far, the metadata stuff (superblock and bitmaps) is partially working. The part I'm struggling with right now is the bitmap work. The bitmap is being created and updated, but I don't know why it is not being consulted when the array is activated. For example, if I kill a machine in the middle of write operations, I would expect the bitmap to show that there is some recovery work to be done, but it does not. I have yet to figure this out.

I've included below a document that describes some of the dm-raid design and some of the additional work that remains. I've also attached a script that can be used to create RAID devices through device-mapper once a kernel with these patches has been built.

If you're going to start testing, the non-persistent metadata cases should be pretty solid. The persistent metadata cases should work except for full bitmap support.

 brassow

** preliminary design/descriptive doc **

The dm-raid.c code provides a way to access the functionality of MD through device-mapper.
This allows us to create RAID4/5/6 (and possibly MD's RAID1) through device-mapper. Some of the difficult things to get straight are translating device-mapper's CTR arguments into the proper MD settings and making sure we are able to access and configure MD's various options (recovery speed, write_back settings, etc).

The current proposed dm-raid CTR arguments are:

The standard first three device-mapper table arguments, where the target_type field is "raid":
    <start> <len> raid \

This is followed by the parameters that specify the RAID type and that RAID type's required and optional arguments:
    <raid_type> <#raid_params> <raid_params> \

The required arguments for each RAID type may be different. Currently, they are ('*' indicates currently unsupported RAID):
    *raid1    <#parms> <chunk_size>
    raid4     <#parms> <chunk_size> <rebuild_A>
    raid5_la  <#parms> <chunk_size> <rebuild_A>
    raid5_ra  <#parms> <chunk_size> <rebuild_A>
    raid5_ls  <#parms> <chunk_size> <rebuild_A>
    raid5_rs  <#parms> <chunk_size> <rebuild_A>
    raid6_zr  <#parms> <chunk_size> <rebuild_A> <rebuild_B>
    raid6_nr  <#parms> <chunk_size> <rebuild_A> <rebuild_B>
    raid6_nc  <#parms> <chunk_size> <rebuild_A> <rebuild_B>

Chunk size is in sectors. The 'rebuild' arguments specify that a new device has been added to the array and must be rebuilt by parity calculations (or by copying, for RAID1). Each 'rebuild' argument is given as the index of the affected array element.
**FIXME: I'd like to remove the 'rebuild' arguments as required arguments and make them optional - specified as 'rebuild=<dev index>'.

Optional arguments include ('*' indicates not implemented):
    [[no]sync]            Force/Prevent RAID initialization
    *[write_back=<int>]
    *[daemon_sleep=<int>]
    *[stripecache=<int>]
    *[minspeed=<int>]
    *[maxspeed=<int>]
    **[rebuild=<idx>]     **if moved from required args

Finally, we have the devices that compose the RAID array. Each array element is given as a metadata device and data device pair.
If there is no metadata device, a '-' is given for the metadata device argument. If a device is known to have failed, a '- -' pair can be specified, indicating that there is no data or metadata device available for that position in the array. #raid_devs refers to the number of pairings.
    <#raid_devs> { <meta_dev1> <dev1> .. <meta_devN> <devN> }

When translating the device-mapper CTR arguments to MD settings, there are three MD settings that /must/ be set by device-mapper (dm-raid.c) at CTR time. They are:
* mddev->recovery_cp: Determines the initialization state of the array. The value determines how far the array has processed the initial recovery. (Initial recovery can be parity calculation for RAID4/5/6 or copying drives for RAID1.)
* rdev->flags/In_sync: Determines the state of an individual device. If !In_sync, then the device needs to be rebuilt - until then, it is not a useful member of the array.
* rdev->recovery_offset: Like mddev->recovery_cp, only for a single device.

Note that even if the array has not yet been initialized, the rdev->flags/In_sync bit is still set if the drives are healthy. If the array has not been initialized, you would not want to have a device that is not 'In_sync'. No trustworthy recovery could occur for such a device, because the array has not yet reached a coherent state.

From dm-raid.c, the CTR arguments that control the above are '[no]sync' and the rebuild parameters. There is also a slight difference in behavior depending on whether metadata devices are specified or not. When there are no metadata devices specified, we won't be able to tell if the array was shut down cleanly, so we must assume recovery_cp = 0. If there is metadata, we will be able to find out if the array was shut down cleanly, so we can set 'recovery_cp = MaxSector' and let the settings change if the metadata requires it.
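
To make the table format described above concrete, here is a minimal sketch (Python, not part of the patches) of how a userspace tool might assemble one table line. The function name, the device paths, and the parameter values are hypothetical. It also assumes that <#raid_params> is simply the count of all RAID parameters that follow, which the format above suggests but does not spell out.

```python
def build_raid_table(start, length, raid_type, raid_params, devices):
    """Assemble one dm table line for the proposed "raid" target.

    raid_params is passed through uninterpreted (chunk size, rebuild
    indexes, optional arguments).  devices is a list of
    (meta_dev, data_dev) pairs; None stands for a missing device and
    is written as '-'.
    """
    # <start> <len> raid <raid_type> <#raid_params> <raid_params>
    fields = [str(start), str(length), "raid", raid_type,
              str(len(raid_params))]  # assumption: count of all params
    fields += [str(p) for p in raid_params]

    # <#raid_devs> { <meta_dev1> <dev1> .. <meta_devN> <devN> }
    fields.append(str(len(devices)))
    for meta_dev, data_dev in devices:
        fields.append(meta_dev if meta_dev is not None else "-")
        fields.append(data_dev if data_dev is not None else "-")
    return " ".join(fields)


# Hypothetical 3-device raid5_ls array with metadata devices:
# chunk size 128 sectors, device index 0 to be rebuilt.
line = build_raid_table(
    0, 2097152, "raid5_ls", [128, 0],
    [("/dev/sda1", "/dev/sda2"),
     ("/dev/sdb1", "/dev/sdb2"),
     ("/dev/sdc1", "/dev/sdc2")],
)
print(line)
```

A position given as (None, data_dev) produces the '-' metadata placeholder, and (None, None) produces the '- -' pair for a failed device.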

Translations when metadata devices are not specified:
                        |               [ per device settings ]
nosync  sync  rebuild   | recovery_cp   In_sync   recovery_offset
------------------------|----------------------------------------
0       0     0         | 0             1         MaxSector
0       0     1         | 0             0         0  (INVALID)
0       1     0         | 0             1         MaxSector
0       1     1         | 0             0         0  (INVALID)
1       0     0         | MaxSector     1         MaxSector
1       0     1         | MaxSector     0         0

Translations when metadata devices are specified:
                        |               [ per device settings ]
nosync  sync  rebuild   | recovery_cp   In_sync   recovery_offset
------------------------|----------------------------------------
0       0     0         | MaxSector     1         MaxSector
0       0     1         | MaxSector     0         0
0       1     0         | 0             1         MaxSector
0       1     1         | 0             0         0  (INVALID)
1       0     0         | MaxSector     1         MaxSector
1       0     1         | MaxSector     0         0
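
The two tables above can be condensed into a few rules. The following is only an illustrative sketch in Python, not the dm-raid.c code. The function name and the MAX_SECTOR stand-in are assumptions, and rejecting 'sync' together with 'nosync' is my reading of the tables, which list no such rows.

```python
MAX_SECTOR = 2**64 - 1  # stand-in for the kernel's MaxSector (assumption)


def translate(nosync, sync, rebuild, has_metadata):
    """Map CTR flags to (recovery_cp, In_sync, recovery_offset).

    recovery_cp is an array-wide value; In_sync and recovery_offset
    apply to the single device that 'rebuild' refers to.
    """
    if nosync and sync:
        # Not listed in either table; treated as an error here.
        raise ValueError("'sync' and 'nosync' are mutually exclusive")

    if sync:
        recovery_cp = 0           # force initialization
    elif nosync or has_metadata:
        recovery_cp = MAX_SECTOR  # skip it, or let the metadata decide
    else:
        recovery_cp = 0           # no metadata: cannot trust the array

    if rebuild and recovery_cp == 0:
        # The rows marked (INVALID): an uninitialized array cannot
        # rebuild a device in a trustworthy way.
        raise ValueError("rebuild requested on an uninitialized array")

    in_sync = not rebuild
    recovery_offset = 0 if rebuild else MAX_SECTOR
    return recovery_cp, in_sync, recovery_offset


# Metadata present, no flags, one device to rebuild:
print(translate(False, False, True, True))
```

Written this way, the only difference between the two tables is the default for recovery_cp when neither 'sync' nor 'nosync' is given.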
Attachment:
gime_raid.pl
Description: Perl program
--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/dm-devel