Re: mdadm reshape stop, resume with alternate/moved backup file?

"NeilBrown" <neilb@xxxxxxx> · Tue, 3 Nov 2009 11:59:18 +1100

On Tue, November 3, 2009 11:44 am, Michael Evans wrote:
> In the past I'd only worked with software raid5; it used the temporary
> file for a brief period at the beginning and then it was all
> disk-bound.
>
> I recently started a raid-6 takeover of one of my larger raid-5
> arrays; it is running at around 1/10th to 1/20th of the speed I expect:
>
>       2909829120 blocks super 1.1 level 6, 128k chunk, algorithm 18 [8/7] [UUUUUUUU]
>       [=>...................]  reshape =  7.2% (35359488/484971520) finish=11057.6min speed=677K/sec
>
> I suspect this is because I put the temporary file on another array
> that shares the same devices, and further that it might be waiting
> for complete hardware syncs before proceeding.  If that's the case, I
> expect that using a small, currently unused area on unrelated block
> devices would speed the operation up by at least 5x.
>
> Can I safely pause the current restripe process
> (1060 pts/...    SL   21:28 mdadm -G /dev/md52 -l6 --backup-file=/md52)
> with something like kill 1060 and then re-invoke it with the backup
> file in another location?  Or would it be this incredibly slow anyway?
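The mdstat figures quoted above are enough for a back-of-envelope check of how long the reshape will take. This is a minimal sketch in POSIX sh, assuming (as md reports them) that block counts are in 1 KiB units and the speed is in K/sec; `eta_minutes` is a hypothetical helper, not part of mdadm.

```shell
#!/bin/sh
# eta_minutes DONE_KB TOTAL_KB SPEED_KBPS
# Hypothetical helper: rough reshape ETA in whole minutes (integer
# arithmetic, truncated). Assumes md's 1K-block counts and K/sec speed.
eta_minutes() {
    done_kb=$1
    total_kb=$2
    speed_kbps=$3
    remaining_kb=$((total_kb - done_kb))
    # KiB / (KiB/s) = seconds; divide by 60 for minutes.
    echo $((remaining_kb / speed_kbps / 60))
}

eta_minutes 35359488 484971520 677    # current speed: prints 11068
eta_minutes 35359488 484971520 3385   # hypothetical 5x (677 * 5): prints 2213
```

At 677K/sec that comes to roughly 11068 minutes (about 7.7 days), close to the 11057.6min mdstat shows; the small difference is presumably because md rounds the displayed speed. The hoped-for 5x speedup would bring it down to about 2213 minutes, or roughly 37 hours.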

1/ It is safe to stop the array, move the backup file, then reassemble
  the array giving it the moved backup file.
2/ This will actually be significantly faster even if the backup file is
  on the same device, as there is a bug which causes the size of the
  data being backed up to be very small (which makes the reshape very
  slow) when the reshape is first started.  When the reshape is restarted
  this bug does not apply and the backup is performed in larger chunks.
3/ There is another bug whereby, if one of the devices in the array dies
  during the reshape, the backup process stops working correctly, with the
  result that the reshape goes much faster but the backup is completely
  useless.  If the system crashes during the reshape after a device has
  failed, you will probably lose data.  If you try to stop and restart the
  array after one device has failed, the restart will fail.  However,
  stopping is still the safest thing to do.  I will try to put out some
  updates to mdadm so that you can reassemble the array safely in this
  case (and, of course, fix the problem so that the backup is maintained
  throughout the entire run).
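Point 1 amounts to three commands. This is a minimal sketch assuming a POSIX shell; `move_backup_and_resume` and the `RUN` dry-run switch are hypothetical conveniences, and the paths and member devices are placeholders. By default it only prints the commands; set `RUN` to an empty string to execute them.

```shell
#!/bin/sh
# Dry run by default: with RUN unset, each command is only echoed.
# Set RUN to an empty string (RUN=) to actually execute the commands.
RUN=${RUN-echo}

# move_backup_and_resume ARRAY OLD_BACKUP NEW_BACKUP MEMBER...
# Hypothetical helper for point 1: stop the array, move the backup file,
# then reassemble the array, pointing mdadm at the new backup location.
move_backup_and_resume() {
    array=$1
    old_backup=$2
    new_backup=$3
    shift 3    # the remaining arguments are the member devices

    $RUN mdadm --stop "$array" || return 1
    $RUN mv "$old_backup" "$new_backup" || return 1
    $RUN mdadm --assemble "$array" --backup-file="$new_backup" "$@"
}
```

For the array in the question this would be `move_backup_and_resume /dev/md52 /md52 /path/on/unrelated/disk/md52.backup` followed by all eight member devices. Per point 3, this does not apply once a member has failed during the reshape.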

So yes, go ahead and move the file.  But if you get a device failure,
stop the reshape and ask me for help - I'll get something to you within
24 hours - probably less, but I have to allow for time zones...
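Because the backup silently stops being useful once a member drops out (point 3), it helps to notice a failure promptly. This is a minimal polling sketch, assuming the usual /proc/mdstat layout where the line after `mdNN : ...` ends with a status field such as `[UUUUUUU_]`, one character per member with `_` marking a missing one. `count_up` and `watch_members` are hypothetical helpers that only report the drop and do not stop anything themselves. For ongoing alerts, `mdadm --monitor` is the standard tool.

```shell
#!/bin/sh
# count_up ARRAY [MDSTAT_FILE]
# Print how many members of ARRAY (e.g. md52) show as "U" in its status field.
count_up() {
    awk -v md="$1" '
        $1 == md { found = 1; next }          # header line: "md52 : active raid6 ..."
        found && match($0, /\[[U_]+\]/) {     # following line holding e.g. [UUUUUUU_]
            s = substr($0, RSTART + 1, RLENGTH - 2)
            n = gsub(/U/, "", s)              # gsub returns the number of U characters
            print n
            exit
        }' "${2:-/proc/mdstat}"
}

# watch_members ARRAY [INTERVAL_SECONDS]
# Compare against the count at start (rather than assuming every member is
# up), poll, and return 1 with a warning as soon as the count drops.
watch_members() {
    md=$1
    interval=${2:-60}
    baseline=$(count_up "$md")
    [ -n "$baseline" ] || { echo "$md not found in /proc/mdstat" >&2; return 2; }
    while :; do
        sleep "$interval"
        now=$(count_up "$md")
        if [ "${now:-0}" -lt "$baseline" ]; then
            echo "$md: in-sync members dropped from $baseline to ${now:-0}" >&2
            return 1
        fi
    done
}
```

Something like `watch_members md52 || mdadm --stop /dev/md52` would follow the advice above, stopping the array first and then asking for help.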

NeilBrown
