On Fri, 25 Feb 2011 15:55:01 +0000 "Kwolek, Adam" <adam.kwolek@xxxxxxxxx> wrote:

> > -----Original Message-----
> > From: Kwolek, Adam
> > Sent: Wednesday, February 23, 2011 10:02 AM
> > To: 'NeilBrown'
> > Cc: linux-raid@xxxxxxxxxxxxxxx; Williams, Dan J; Ciechanowski, Ed;
> > Neubauer, Wojciech
> > Subject: RE: [PATCH 0/3] Continue expansion after reboot
> >
> > > -----Original Message-----
> > > From: NeilBrown [mailto:neilb@xxxxxxx]
> > > Sent: Wednesday, February 23, 2011 4:38 AM
> > > To: Kwolek, Adam
> > > Cc: linux-raid@xxxxxxxxxxxxxxx; Williams, Dan J; Ciechanowski, Ed;
> > > Neubauer, Wojciech
> > > Subject: Re: [PATCH 0/3] Continue expansion after reboot
> > >
> > > On Tue, 22 Feb 2011 15:13:15 +0100 Adam Kwolek <adam.kwolek@xxxxxxxxx>
> > > wrote:
> > >
> > > > Currently a reshaped/expanded array is assembled but it stays in an
> > > > inactive state.
> > > > These patches allow for array assembly while the array is under
> > > > expansion. An array with reshape/expansion information in its metadata
> > > > is assembled and the reshape process continues automatically.
> > > >
> > > > Next step:
> > > > The problem is how to address a container operation during assembly.
> > > > 1. After the first array is reshaped, the assembly process checks
> > > >    whether mdmon has set a migration for another array in the
> > > >    container. If yes, it continues work for the next array.
> > > >
> > > > 2. The assembly process performs the reshape of the currently reshaped
> > > >    array only. Mdmon sets the next array for reshape and the user
> > > >    manually triggers mdadm to finish the container operation with just
> > > >    the same parameters set.
> > > >
> > > > A reshape finish can also be executed for a container operation by
> > > > container re-assembly (this works in the current code).
> > >
> > > Yes, this is an awkward problem.
> > >
> > > Just to be sure we are thinking about the same thing:
> > > When restarting an array in which a migration is already underway, mdadm
> > > simply forks and continues monitoring that migration.
> > > However if it is a container-wide migration, then when the migration of
> > > the first array completes, mdmon will update the metadata on the second
> > > array, but it isn't clear how mdadm can be told to start monitoring that
> > > array.
> > >
> > > How about this:
> > > the imsm metadata handler should report that an array is undergoing a
> > > migration if it is, or if an earlier array in the container is
> > > undergoing a migration which will cause 'this' array to subsequently be
> > > migrated too.
> > >
> > > So if the first array is in the middle of a 4-drive to 5-drive
> > > conversion and the second array is simply at '4 drives', then imsm
> > > would report (to container_content) that the second array is actually
> > > undergoing a migration from 4 to 5 drives, and is at the very beginning.
> > >
> > > When mdadm assembles that second array it will fork a child to monitor
> > > it. It will need to somehow wait for mdmon to really update the metadata
> > > before it starts. This can probably be handled in the ->manage_reshape
> > > function.
> > >
> > > Something along those lines would be the right way to go, I think. It
> > > avoids any races between arrays being assembled at different times.
> >
> > This looks fine to me.
> >
> > > > Adam Kwolek (3):
> > > >   FIX: Assemble device in reshape state with new disks number
> > >
> > > I don't think this patch is correct. We need to configure the array
> > > with the 'old' number of devices first, then 'reshape_array' will also
> > > set the 'new' number of devices.
> > > What exactly was the problem you were trying to fix?
> >
> > When an array is being assembled with the old raid disk number, assembly
> > cannot set the read-only array state (error on the sysfs state write).
> > The array stays in an inactive state, so nothing (no reshape) happens later.
> >
> > I think the array cannot be assembled with the old disk number (the newly
> > added disks are present as spares) because the beginning of the array
> > already uses the new disks. This means we would be assembling the array
> > with an incomplete disk set. Stripes at the beginning could be corrupted
> > (not all disks present in the array). At this point the inactive array
> > state is OK, to keep the user's data safe.
> >
> > I'll test whether setting the old disk number, and later changing the
> > disk number and array state, resolves the problem.
> > I'll let you know the results.
>
> I've made some investigations. I've tried the assemble algorithm (as you
> suggested):
> Conditions:
>   a reshape of a 3-disk raid5 array to a 4-disk raid5 array is interrupted.
>   Restart is invoked by the command 'mdadm -As'.
>
> 1. Assemble() builds the container with the new disk number
> 2. Assemble() builds the container content (array with the /old/ 3 disks)
> 3. the array is set to frozen to block the monitor
> 4. sync_max in sysfs is set to 0, to block md until reshape monitoring
>    takes care of the reshape process
> 5. Continue_reshape() starts the reshape process
> 6. Continue_reshape() continues the reshape process
>
> Problems I've met:
> 1. not all disks in Assembly() are added to the array (old disk number
>    limitation)

I want to fix this by getting sysfs_set_array to set up the new raid_disks
number. It currently doesn't, because the number of disks that md is to
expect could be different to the number of disks recorded in the metadata,
and "analyse_change" might be needed to resolve the difference. A particular
example is that the metadata might think a RAID0 is changing from 4 devices
to 5 devices, but md needs to be told that a RAID4 is changing from 5
devices to 6 devices.
However in this case, we really need to do the 'analyse_change' before
calling sysfs_set_array anyway. So get sysfs_set_array to set up the array
fully, and find somewhere appropriate to put a call to analyse_change ...
possibly modifying analyse_change a bit ...

> 2. setting reshape_position automatically invokes a reshape start in md
>    when the array is run

That shouldn't be a problem. We start the array read-only and the reshape
will not start while that is set.

So:
  set 'old' shape of array
  set reshape_position
  set 'new' shape of array
  start array 'readonly'
  set sync_max to 0
  enable read/write
  allow reshape to continue while monitoring it with mdadm.

Does this work, or is there something I have missed?

> 3. setting reshape_position clears delta_disks in md (and other
>    parameters, for now not important)

That shouldn't matter ... where do we set reshape_position such that it
causes a problem?

> 4. Assembly() closes the handle to the array (it has to stay open to be
>    used in the reshape continuation)

I'm not sure what you are getting at ... reshape continuation is handled by
Grow_continue, which is passed the handle to the array. It should fork and
monitor the array in the background, so it has its own copy of the handle ???

> 5. reshape continuation can require a backup file. It depends where it was
>    interrupted during expansion.
>    Other reshapes always require a backup file.

Yes ... why is this a problem?

> 6. to run a reshape, 'reshape' has to be written to sync_action.
>    Raid5_start_reshape() is not prepared for a reshape restart (i.e. the
>    reshape position can be 0 or the maximum array value - it depends on
>    the operation, grow/shrink)

Yes ... raid5_start_reshape isn't used for restarting a reshape.
run() will start the reshape thread, which will not run because the array is
read-only. Once you switch the array to read-write, the sync_thread should
get woken up and will continue the reshape.

I think the remainder of your email is also addressed by what I have said
above, so I won't try to address specific things.
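The sysfs sequence outlined above can be sketched as a few writes in order.
This is a minimal illustration, assuming a POSIX shell; the function name,
argument order, and the idea of driving the files by hand (rather than via
mdadm's sysfs_set_array in C) are mine, not mdadm's actual code:

```shell
# restart_reshape: replay the suggested restart sequence for an
# interrupted reshape via the array's md sysfs directory.
#   $1 = md sysfs directory (e.g. /sys/block/md0/md)
#   $2 = old raid_disks, $3 = new raid_disks
#   $4 = reshape_position in sectors (where the reshape stopped)
restart_reshape() {
  md=$1; old=$2; new=$3; pos=$4
  echo "$old" > "$md/raid_disks"        # 'old' shape first
  echo "$pos" > "$md/reshape_position"  # resume point
  echo "$new" > "$md/raid_disks"        # 'new' shape; md derives delta_disks
  echo readonly > "$md/array_state"     # start read-only: reshape stays parked
  echo 0 > "$md/sync_max"               # hold md until the monitor is ready
  echo active > "$md/array_state"       # go read/write; sync thread resumes
}
```

Writing raid_disks twice (old, then new) matches the old-then-new ordering
discussed elsewhere in the thread as a way to let md compute delta_disks
without a dedicated sysfs entry.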
Please let me know if you see any problem with what I have outlined.

Thanks!
NeilBrown

> 7. After array start the flag MD_RECOVERY_NEEDED is set, so the reshape
>    cannot be started from mdadm.
>    As the array is started with not all disks (the old raid disks), we
>    cannot allow such a check (???).
>    I've made a workaround (setting the reshape position clears this flag
>    for external metadata).
>
> I've started the reshape again with the /all new/ disk number, but it
> still starts from the beginning of the array. This is a matter of
> searching for where the checkpoint is lost.
>
> I've tested my first idea also:
> to do as much as we can as for native metadata (the reshape is started by
> the array run).
> Some problems are similar to before (p.4, p.5).
> The only serious problem that I've got with this is how to let md know
> about delta_disks.
> I've resolved it by adding a special case in raid_disks_store(),
> similar to native metadata, where the old disk number is guessed.
> For external metadata, I store the old and then the new disk numbers, and
> md calculates delta_disks from this sequence of raid_disks numbers
> (as I remember, you do not want to expose delta_disks in sysfs).
>
> Another issue that I'm observing with both methods is the behaviour of the
> sync_action sysfs entry. It reports reshape->idle->reshape...
> This 'idle', even for a very short time, causes migration cancellation.
> I've made a workaround in mdmon for now.
>
> Both methods are not fully workable yet, but I think this will change on
> Monday.
>
> Considering the above, I still prefer the method where we construct the
> array with the new disk number.
> The beginning of the array /already reshaped/ has all disks present. This
> is the same way md works for native arrays.
>
> I'm waiting for your comments/questions/ideas.
>
> BR
> Adam
>
> > BR
> > Adam
> >
> > > >   imsm: FIX: Report correct array size during reshape
> > > >   imsm: FIX: initialize reshape progress as it is stored in metadata
> > >
> > > These both look good - I have applied them. Thanks.
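For context on why sync_max is first parked at 0: the forked mdadm child
(cf. Grow_continue) is then expected to open the sync_max window step by
step, so md only reshapes ranges whose critical section has already been
backed up. A conceptual sketch only, with an assumed helper name and with
the backup write and the wait on sync_completed reduced to comments:

```shell
# advance_reshape: release sync_max in steps so the reshape proceeds
# one window at a time.
#   $1 = md sysfs directory, $2 = step in sectors, $3 = end position
advance_reshape() {
  md=$1; step=$2; end=$3
  max=$(cat "$md/sync_max")
  while [ "$max" -lt "$end" ]; do
    # real mdadm first writes a backup of the stripes in [max, max+step)
    max=$((max + step))
    if [ "$max" -gt "$end" ]; then max=$end; fi
    echo "$max" > "$md/sync_max"
    # real mdadm then waits for sync_completed to catch up before looping
  done
}
```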
> > >
> > > NeilBrown
> > > --
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html