Re[2]: resync duration ?

Rainer Fuegenstein <rfu@xxxxxxxxxxxxxxxxxxxxxxxx> · Thu, 14 May 2009 19:03:40 +0200

guys,

It's amazing - it really finished resyncing after about 160 minutes.
CPU load went down to 0.1 (from about 2.70) and there was nearly no
disk activity. Looks like it really was that fast and not just
displaying a false estimate. (And yes, the xfs file system on top of
it seems to be intact, at least for now.)
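
As a rough sanity check on those numbers, here is a minimal sketch of
the arithmetic. It assumes the resync only has to cover the 750 GB
member size (the array has not been grown yet) and that 1 GB = 10^9
bytes; the function name and constants are mine, not anything from md.

```python
# Sanity check: what sustained per-disk throughput does a resync time imply?
# Assumptions (not from the thread): 1 GB = 10**9 bytes, and the resync
# covers 750 GB per member disk.

def implied_throughput_mb_s(size_bytes: int, minutes: float) -> float:
    """Average throughput in MB/s needed to cover size_bytes in `minutes`."""
    return size_bytes / (minutes * 60) / 1e6

MEMBER_SIZE = 750 * 10**9  # bytes per member disk (assumed)

if __name__ == "__main__":
    for label, minutes in [("160 min", 160), ("10 h", 600), ("12 h", 720)]:
        rate = implied_throughput_mb_s(MEMBER_SIZE, minutes)
        print(f"{label}: {rate:.1f} MB/s")
```

Under those assumptions 160 minutes works out to roughly 78 MB/s per
disk, which looks plausible for sequential I/O, while 10 to 12 hours
corresponds to only about 17 to 21 MB/s.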

RW> Sounds like it could be wrapping around as suggested. Since you are
RW> swapping out all the disks, have you thought about stopping the array
RW> and using dd to copy the old disk to the new disk? Then there is no
RW> resync period or degraded array.

I don't feel comfortable dd'ing from/to disks of different
sizes/layouts, and I'd like to avoid long downtimes of the server.
Originally I planned to add two 2-port SATA controllers and install the
new disks in parallel (as suggested by Chris), but that didn't
work out so well.

Anyway, overnight I'll back up the most important data to the disk I
just pulled out (and will do so with the other disks as I pull them
out). I will replace the second disk tomorrow.

I'll let you know of success or failure as the story progresses.

RW> Ryan

RW> On Thu, May 14, 2009 at 9:58 AM, Bryan Mesich <bryan.mesich@xxxxxxxx> wrote:
>> On Thu, May 14, 2009 at 03:06:47PM +0200, Rainer Fuegenstein wrote:
>>> Hi guys,
>>>
>>> I'm a bit confused about the duration of a raid5 resync:
>>>
>>> - occasionally, my server with 4*750 gb sata raid5 crashed (because of
>>> problems with the power supply); after rebooting it took about 10 to 12
>>> hours to resync the raid5 (guess it just re-created some parity
>>> information or however it works internally, but didn't have to copy
>>> any data)
>>>
>>> - right now I replaced one 750GB disk with a 1.5TB disk, but now
>>> resyncing (according to /proc/mdstat) it is supposed to take only 160
>>> minutes ?! although it needs to copy data to a blank disk ?
>>
>> Just a guess...but your problem might be an artifact of a bug
>> that has been recently fixed.  Neil sent out mail on 2009-05-04
>> with the fix I'm thinking about.  Here is an excerpt from his
>> mail:
>>
>> Subject: [md PATCH 4/7] md: tidy up status_resync to handle
>> large arrays.
>>
>> Two problems in status_resync.
>> 1. It still used Kilobytes as the basic block unit, while most
>>   code now uses sectors uniformly.
>> 2. It doesn't allow for the possibility that max_sectors exceeds
>>   the range of "unsigned long".
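
To illustrate what "exceeds the range of unsigned long" can mean in
practice, here is a small sketch. It is an illustration only: it assumes
a 32-bit unsigned long and 512-byte sectors, the 3 TB size is a made-up
example, and the helper names are hypothetical. It does not reproduce
the md code, and the thread does not establish that this is what
happened on the machine in question.

```python
# Illustration only: how a sector count can exceed a 32-bit "unsigned long".
# Assumptions (not from the thread): unsigned long is 32 bits wide,
# sectors are 512 bytes, and the 3 TB size is a made-up example.

SECTOR_SIZE = 512
U32_MAX = 2**32 - 1

def as_u32(value: int) -> int:
    """Truncate value the way storing it in a 32-bit unsigned int would."""
    return value & U32_MAX

def sectors(size_bytes: int) -> int:
    """Number of whole 512-byte sectors in size_bytes."""
    return size_bytes // SECTOR_SIZE

if __name__ == "__main__":
    true_count = sectors(3 * 10**12)
    # The truncated value is silently smaller than the real sector count.
    print(true_count, as_u32(true_count), true_count > U32_MAX)
```

Any estimate computed from the truncated value would be based on a much
smaller amount of remaining work than there really is.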
>>
>> So
>>  - change "max_blocks" to "max_sectors", and store sector numbers
>>   in there and in 'resync'
>>  - Make 'rt' a 'sector_t' so it can temporarily hold the number
>>   of remaining sectors.
>>  - use sector_div rather than normal division.
>>  - change the magic '100' used to preserve precision to '32'.
>>   + making it a power of 2 makes division easier
>>   + it doesn't need to be as large as it was chosen when we
>>   averaged speed over the entire run.  Now we average speed over the
>>   last 30 seconds or so.
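
The role of the scale factor can be sketched with integer-only
arithmetic. This is a simplified model under my own assumptions, not the
kernel's status_resync code: the function name, parameters and sample
numbers are hypothetical. The remaining sector count is divided by the
recent progress expressed in units of `scale` sectors, then multiplied
by the length of the measurement window and scaled back down.

```python
# Simplified model of an integer-only "time remaining" estimate.
# Assumptions (not from the thread): names, parameters and sample numbers
# are hypothetical; this is not the kernel's implementation.

def eta_seconds(remaining_sectors: int, sectors_done: int,
                window_seconds: int, scale: int = 32) -> int:
    """Estimate seconds left from progress made in the last window_seconds."""
    if window_seconds <= 0:
        window_seconds = 1  # avoid a zero-length window
    # Progress in units of `scale` sectors; +1 avoids division by zero.
    divisor = sectors_done // scale + 1
    # Divide first to keep intermediates small, then undo the scaling.
    return (remaining_sectors // divisor) * window_seconds // scale

if __name__ == "__main__":
    # 4,800,000 sectors in the last 30 s, 1,000,000,000 sectors left.
    print(eta_seconds(1_000_000_000, 4_800_000, 30))
```

With a power-of-two scale such as 32, the final division can be done as
a right shift by 5, which is what "makes division easier" refers to.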
>>
>>> is this normal or should I be worried, especially before I pull out
>>> the next 750GB disk and replace it with the next 1.5TB disk ?
>>
>> If your sync only takes 160 minutes...then I'd start to poke
>> around.  If its finishing time is reasonable considering the
>> array size, then I'd continue with the migration.
>>
>>> tnx in advance.
>>
>> Bryan
>>
RW> --
RW> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
RW> the body of a message to majordomo@xxxxxxxxxxxxxxx
RW> More majordomo info at  http://vger.kernel.org/majordomo-info.html

------------------------------------------------------------------------------
Unix gives you just enough rope to hang yourself -- and then a couple of more 
feet, just to be sure.
(Eric Allman)
------------------------------------------------------------------------------
