On Fri, Feb 19 2016, Jes Sorensen wrote: > NeilBrown <neilb@xxxxxxxx> writes: >> A recent commit removed a call to abort_reshape() when IMSM reshape >> completed. An unanticipated result of this is that the suspended >> region is not cleared as it should be. >> So after a reshape, a region of the array will cause all IO to block. >> >> Re-instate the required updates to suspend_{lo,hi} coped from >> abort_reshape(). >> >> This is caught (sometimes) by the test suite. >> >> Also fix a couple of typos found while exploring the code. >> >> Reported-by: Ken Moffat <zarniwhoop@xxxxxxxxxxxx> >> Cc: Artur Paszkiewicz <artur.paszkiewicz@xxxxxxxxx> >> Fixes: 2139b03c2080 ("imsm: don't call abort_reshape() in imsm_manage_reshape()") >> Signed-off-by: NeilBrown <neilb@xxxxxxxx> >> --- >> super-intel.c | 7 +++++-- >> 1 file changed, 5 insertions(+), 2 deletions(-) >> >> diff --git a/super-intel.c b/super-intel.c >> index 90b7b6dee5d0..80b48d0fdd47 100644 >> --- a/super-intel.c >> +++ b/super-intel.c >> @@ -10465,7 +10465,7 @@ int check_degradation_change(struct mdinfo *info, >> * Function: imsm_manage_reshape >> * Description: Function finds array under reshape and it manages reshape >> * process. It creates stripes backups (if required) and sets >> - * checheckpoits. >> + * checkpoints. >> * Parameters: >> * afd : Backup handle (nattive) - not used >> * sra : general array info >> @@ -10595,7 +10595,7 @@ static int imsm_manage_reshape( >> >> start = current_position * 512; >> >> - /* allign reading start to old geometry */ >> + /* align reading start to old geometry */ >> start_buf_shift = start % old_data_stripe_length; >> start_src = start - start_buf_shift; >> >> @@ -10700,6 +10700,9 @@ static int imsm_manage_reshape( >> ret_val = 1; >> abort: >> free(buf); >> + sysfs_set_num(sra, NULL, "suspend_lo", 0x7FFFFFFFFFFFFFFFULL); >> + sysfs_set_num(sra, NULL, "suspend_hi", 0); >> + sysfs_set_num(sra, NULL, "suspend_lo", 0); >> >> return ret_val; >> } > > This does indeed match the behavior of abort_reshape(), however looking > through git history, I cannot find any explanation as to why the code > sets suspend_lo twice. > > Any chance you can enlighten me why this is necessary? This is what I never got when I was maintainer - people insisting (rightly) that I explain the details... Thanks! Prior to Commit: 23ddff3792f6 ("md: allow suspend_lo and suspend_hi to decrease as well as increase.") you could only increase suspend_{lo,hi} unless the region they covered was empty. So to reset to 0, you need to push suspend_lo up past suspend_hi first. So to maximize the chance of mdadm working on all kernels, we want to keep doing that. Maybe you could add that to the change comment? Or should there be a comment in abort_reshape()?? Thanks, NeilBrown
Attachment:
signature.asc
Description: PGP signature