> -----Original Message-----
> From: NeilBrown [mailto:neilb@xxxxxxx]
> Sent: Sunday, February 27, 2011 7:51 AM
> To: Kwolek, Adam
> Cc: linux-raid@xxxxxxxxxxxxxxx; Williams, Dan J; Ciechanowski, Ed;
> Neubauer, Wojciech
> Subject: Re: [PATCH 0/3] Continue expansion after reboot
>
> On Fri, 25 Feb 2011 15:55:01 +0000 "Kwolek, Adam"
> <adam.kwolek@xxxxxxxxx> wrote:
>
> > > -----Original Message-----
> > > From: Kwolek, Adam
> > > Sent: Wednesday, February 23, 2011 10:02 AM
> > > To: 'NeilBrown'
> > > Cc: linux-raid@xxxxxxxxxxxxxxx; Williams, Dan J; Ciechanowski, Ed;
> > > Neubauer, Wojciech
> > > Subject: RE: [PATCH 0/3] Continue expansion after reboot
> > >
> > > > -----Original Message-----
> > > > From: NeilBrown [mailto:neilb@xxxxxxx]
> > > > Sent: Wednesday, February 23, 2011 4:38 AM
> > > > To: Kwolek, Adam
> > > > Cc: linux-raid@xxxxxxxxxxxxxxx; Williams, Dan J;
> > > > Ciechanowski, Ed; Neubauer, Wojciech
> > > > Subject: Re: [PATCH 0/3] Continue expansion after reboot
> > > >
> > > > On Tue, 22 Feb 2011 15:13:15 +0100 Adam Kwolek
> > > > <adam.kwolek@xxxxxxxxx> wrote:
> > > >
> > > > > Currently a reshaped/expanded array is assembled but it stays
> > > > > in the inactive state.
> > > > > These patches allow for array assembly when the array is under
> > > > > expansion. An array with reshape/expansion information in the
> > > > > metadata is assembled and the reshape process continues
> > > > > automatically.
> > > > >
> > > > > Next step:
> > > > > The problem is how to address a container operation during
> > > > > assembly.
> > > > > 1. After the first array is reshaped, the assembly process
> > > > >    checks whether mdmon has set a migration for another array
> > > > >    in the container. If yes, it continues work on the next
> > > > >    array.
> > > > > 2. The assembly process performs the reshape of the currently
> > > > >    reshaped array only. Mdmon sets the next array for reshape
> > > > >    and the user manually triggers mdadm to finish the
> > > > >    container operation with the same parameter set.
> > > > > Reshape finish can also be executed for a container operation
> > > > > by container re-assembly (this works in the current code).
> > > >
> > > > Yes, this is an awkward problem.
> > > >
> > > > Just to be sure we are thinking about the same thing:
> > > > When restarting an array in which a migration is already
> > > > underway, mdadm simply forks and continues monitoring that
> > > > migration.
> > > > However, if it is a container-wide migration, then when the
> > > > migration of the first array completes, mdmon will update the
> > > > metadata on the second array, but it isn't clear how mdadm can
> > > > be told to start monitoring that array.
> > > >
> > > > How about this:
> > > > the imsm metadata handler should report that an array is
> > > > undergoing migration if it is, or if an earlier array in the
> > > > container is undergoing a migration which will cause 'this'
> > > > array to subsequently be migrated too.
> > > >
> > > > So if the first array is in the middle of a 4-drive -> 5-drive
> > > > conversion and the second array is simply at '4 drives', then
> > > > imsm would report (to container_content) that the second array
> > > > is actually undergoing a migration from 4 to 5 drives, and is
> > > > at the very beginning.
> > > >
> > > > When mdadm assembles that second array it will fork a child to
> > > > monitor it. It will need to somehow wait for mdmon to really
> > > > update the metadata before it starts. This can probably be
> > > > handled in the ->manage_reshape function.
> > > >
> > > > Something along those lines would be the right way to go, I
> > > > think. It avoids any races between arrays being assembled at
> > > > different times.
> > >
> > > This looks fine to me.
> > > > > Adam Kwolek (3):
> > > > >   FIX: Assemble device in reshape state with new disks number
> > > >
> > > > I don't think this patch is correct. We need to configure the
> > > > array with the 'old' number of devices first, then
> > > > 'reshape_array' will also set the 'new' number of devices.
> > > > What exactly was the problem you were trying to fix?
> > >
> > > When the array is being assembled with the old raid disk number,
> > > assembly cannot set the readOnly array state (error on sysfs
> > > state writing). The array stays in the inactive state, so nothing
> > > (no reshape) happens later.
> > >
> > > I think the array cannot be assembled with the old disk number
> > > (the newly added disks are present as spares) because the
> > > beginning of the array already uses the new disks. This means we
> > > are assembling the array with an incomplete disk set. Stripes at
> > > the beginning can be corrupted (not all disks present in the
> > > array). At this point the inactive array state is OK to keep user
> > > data safe.
> > >
> > > I'll test whether setting the old disk number and later changing
> > > the disk number and array state resolves the problem.
> > > I'll let you know the results.
> >
> > I've made some investigations. I've tried the assemble algorithm
> > (as you suggested):
> > Conditions:
> >   A reshape of a 3-disk raid5 array to a 4-disk raid5 array is
> >   interrupted. Restart is invoked by the command 'mdadm -As'.
> >
> > 1. Assemble() builds the container with the new disk number
> > 2. Assemble() builds the container content (array with /old/
> >    3 disks)
> > 3. the array is set to frozen to block the monitor
> > 4. sync_max in sysfs is set to 0, to block md until reshape
> >    monitoring takes care of the reshape process
> > 5. Continue_reshape() starts the reshape process
> > 6. Continue_reshape() continues the reshape process
> >
> > Problems I've met:
> > 1.
> >    not all disks in Assembly() are added to the array (old disk
> >    number limitation)
>
> I want to fix this by getting sysfs_set_array to set up the new
> raid_disks number.
> It currently doesn't, because the number of disks that md is to
> expect could be different from the number of disks recorded in the
> metadata, and "analyse_change" might be needed to resolve the
> difference.
> A particular example is that the metadata might think a RAID0 is
> changing from 4 devices to 5 devices, but md needs to be told that a
> RAID4 is changing from 5 devices to 6 devices.
> However, in this case we really need to do the 'analyse_change'
> before calling sysfs_set_array anyway.
>
> So get sysfs_set_array to set up the array fully, and find somewhere
> appropriate to put a call to analyse_change ... possibly modifying
> analyse_change a bit ...

In the next patches, which I hope bring me closer to the final code, I
will not use analyse_change. I want to get the basic thing (restore
from checkpoint) working first.

> > 2. setting reshape_position automatically invokes a reshape start
> >    in md on array run
>
> That shouldn't be a problem. We start the array read-only and the
> reshape will not start while that is set.
> So:
>   set 'old' shape of array
>   set reshape_position
>   set 'new' shape of array
>   start array 'readonly'
>   set sync_max to 0
>   enable read/write
>   allow reshape to continue while monitoring it with mdadm
>
> Does this work, or is there something I have missed?

Everything in mdadm seems to be OK. One small problem is in md
(raid5.c:5052): for the grow case there is a check for the checkpoint.
In my code chunk_sectors and new_chunk_sectors are the same, so the
array is not started. If I ignore the '==' case the array can be
assembled.

> > 3. setting reshape_position clears delta_disks in md (and other
> >    parameters, for now not important)
>
> That shouldn't matter ... where do we set reshape_position such that
> it causes a problem?

Yes, I've found it during my tests.
> > 4. Assembly() closes the handle to the array (it has to be kept
> >    open and used in the reshape continuation)
>
> I'm not sure what you are getting at ... reshape continuation is
> handled by Grow_continue, which is passed the handle to the array.
> It should fork and monitor the array in the background, so it has
> its own copy of the handle ???

Assemble_container_content() closes the array handle. I don't fork()
inside this function, but probably that would be better.

> > 5. reshape continuation can require a backup file. It depends on
> >    where the expansion was interrupted.
> >    Other reshapes can always require a backup file.
>
> Yes ... Why is this a problem?

... not a problem, rather TBD ;) The assemble operation has no backup
file specified, so a backup file name has to be generated.

> > 6. to run the reshape, 'reshape' has to be written to sync_action.
> >    Raid5_start_reshape() is not prepared for a reshape restart
> >    (i.e. the reshape position can be 0 or the max array value,
> >    depending on the operation: grow/shrink)
>
> Yes ... raid5_start_reshape isn't used for restarting a reshape.
> run() will start the reshape thread, which will not run because the
> array is read-only.
> Once you switch the array to read-write the sync_thread should get
> woken up and will continue the reshape.

This is what I wanted to hear :).

> I think the remainder of your email is also addressed by what I have
> said above, so I won't try to address specific things.
>
> Please let me know if you see any problem with what I have outlined.
>
> Thanks!
>
> NeilBrown

Everything is more or less clear. I'll prepare a few patches that
allow for reshape continuation for expansion, to get your feedback.

BR
Adam

> > 7. After array start the MD_RECOVERY_NEEDED flag is set, so the
> >    reshape cannot be started from mdadm.
> >    As the array is started without all disks (old raid disks), we
> >    cannot allow such a check (???)
> >    I've made a workaround (setting the reshape position clears
> >    this flag for external metadata).
> >
> > I've started the reshape again with the /all/ new disk number, but
> > it still starts from the beginning of the array. This is a matter
> > of searching for where the checkpoint is lost.
> >
> > I've tested my first idea also:
> > to do as much as we can, as for native metadata (reshape is
> > started by array run).
> > Some problems are similar to before (p.4, p.5).
> > The only serious problem that I've got with this is how to let md
> > know about delta_disks.
> > I've resolved it by adding a special case in raid_disks_store(),
> > similar to native metadata, where the old disk number is guessed.
> > For external metadata I store the old and then the new disk
> > number; md calculates delta_disks from this sequence of raid disk
> > numbers.
> > (As I remember, you do not want to expose delta_disks in sysfs.)
> >
> > Another issue that I'm observing with both methods is the
> > behaviour of the sync_action sysfs entry. It reports
> > reshape->idle->reshape...
> > This 'idle', even for a very short time, causes migration
> > cancellation. I've made a workaround in mdmon for now.
> >
> > Both methods are not fully workable yet, but I think this will
> > change on Monday.
> >
> > Considering the above, I still prefer the method where we
> > construct the array with the new disk number.
> > The beginning of the array (already reshaped) has all disks
> > present, the same way md works for native arrays.
> >
> > I'm waiting for your comments/questions/ideas.
> >
> > BR
> > Adam
> >
> > > BR
> > > Adam
> > >
> > > > >   imsm: FIX: Report correct array size during reshape
> > > > >   imsm: FIX: initalize reshape progress as it is stored in
> > > > >     metatdata
> > > >
> > > > These both look good - I have applied them.  Thanks.
> > > >
> > > > NeilBrown
--
To unsubscribe from this list: send the line "unsubscribe linux-raid"
in the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html