Re: Hot-replace for RAID5

Hi Neil,

I decided to move the backup file to another device. When I stopped the
array, mdadm did stop it but printed "mdadm: failed to unfreeze array".
What exactly does that mean? I don't want to proceed until I am sure it
does not signal an error.

I quickly checked the sources and it seems to be related to some sysfs
resources, but I am not sure. The array did disappear from /sys/block/,
though.
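For reference, the stop / copy / re-assemble procedure discussed later in
this thread can be sketched as follows. This is only a sketch: the device
names, partition list, and paths are placeholders, not the actual ones
from this server.

```shell
# Stop the array; the reshape position is recorded in the metadata
# and backup file, so a stopped array can safely sit through a reboot
mdadm --stop /dev/md0

# The array should now be gone from sysfs
ls /sys/block | grep md0

# Copy the backup file to the new, faster device and verify the copy
cp /mnt/olddisk/reshape-backup /mnt/newdisk/reshape-backup
cmp /mnt/olddisk/reshape-backup /mnt/newdisk/reshape-backup

# Re-assemble, pointing mdadm at the backup file's new location
mdadm --assemble /dev/md0 --backup-file=/mnt/newdisk/reshape-backup \
      /dev/sd[a-h]1
```

These commands need root and the real member devices, so they cannot be
exercised outside the affected machine.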

Thanks.

Patrik

On Sun, May 13, 2012 at 9:43 AM, Patrik Horník <patrik@xxxxxx> wrote:
> Hi Neil,
>
> On Sun, May 13, 2012 at 1:19 AM, NeilBrown <neilb@xxxxxxx> wrote:
>> On Sat, 12 May 2012 17:56:04 +0200 Patrik Horník <patrik@xxxxxx> wrote:
>>
>>> Neil,
>>
>> Hi Patrik,
>>  sorry about the "--layout=preserve" confusion.  I was a bit hasty.
>>  "--layout=left-symmetric-6" would probably have done what was wanted, but it
>>  is a bit late for that :-(
>
> --layout=preserve is also mentioned in the md/mdadm documentation...
> so is it not the right one?
>
>>>
>>> so I further analyzed the behaviour and I found following:
>>>
>>> - The bottleneck of about 1.7 MB/s is probably caused by the backup file
>>> sitting on one of the drives: that drive is utilized almost 80% according
>>> to iostat -x, and its average queue length is almost 4 while await stays
>>> under 50 ms.
>>>
>>> - The variable and sometimes very low speeds, down to 100 KB/s, are caused
>>> by problems on the drive I already suspected was failing. Its service time
>>> sometimes goes above 1 second. The total average speed is about 0.8 MB/s.
>>> (I tested its read speed by running a check of the array and it managed
>>> 30 MB/s; since preserve should only read from it, I did not specifically
>>> test its write speed.)
>>>
>>> So my questions are:
>>>
>>> - Is there a way I can move the backup_file to another drive 100% safely?
>>> To add another non-network drive I need to restart the server; I can then
>>> boot into some live distribution, for example, to completely prevent
>>> automatic assembly. I think the speed should be several times higher.
>>
>> Yes.
>> If you stop the array, then copy the backup file, then re-assemble the
>> array giving it the backup file in the new location, all should be well.
>> A reboot while the array is stopped is not a problem.
>
> Should or will? :) I have 0.90 (now 0.91) metadata; is everything
> needed stored there? Should mdadm 3.2.2-1~bpo60+2 from
> squeeze-backports work well, or should I compile mdadm 3.2.4?
>
> In case there is some risk involved, I will need to choose between this
> and waiting, while risking a power outage sometime in the following week
> (we have something like a storm season here)...
>
> Do you recommend some live Linux distro, installable on USB, that is
> good for this? (One that has the newest versions and doesn't try to
> assemble arrays.)
>
> Or will automatic assembly fail and cause no problem at all, for sure?
> (According to the md/mdadm documentation this should be the case.) In
> that case, can I use the distribution already on the server, Debian
> stable plus some packages from squeeze, for that? Possibly with
> raid=noautodetect added? I have LVM on top of the RAID arrays and I
> don't want to cause a mess. The OS is not on LVM or RAID.
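For context, auto-assembly at boot can usually be suppressed in two
places. The following is only a sketch; the exact file locations depend
on the distribution and its initramfs setup:

```shell
# 1. Kernel command line (e.g. in the GRUB entry): disable the kernel's
#    own autodetection of 0.90-metadata arrays at boot
#
#        linux /vmlinuz ... raid=noautodetect
#
# 2. /etc/mdadm/mdadm.conf: tell mdadm itself not to assemble anything
#    automatically (supported by mdadm 3.x)
#
#        AUTO -all
```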
>
>>>
>>> - Is it safe to fail and remove the problematic drive? The array would be
>>> down to 6 of 8 drives in the part that is not yet reshaped. It should
>>> double the speed.
>>
>> As safe as it ever is to fail a device in a non-degraded array.
>> i.e. it would not cause a problem directly but of course if you get an error
>> on another device, that would be awkward.
>
> I actually ran "check" on this RAID array a couple of times a few days
> ago and the data on the other drives were OK. The problematic drive
> reported a couple of read errors, always corrected with data from the
> other drives and by rewriting.
>
> About that: should this reshaping work OK if it encounters read errors
> on the problematic drive? Will it use data from the other drives to
> correct them in this reshaping mode as well?
>
> Thanks.
>
> Patrik
>
>>>
>>> - Why did mdadm ignore layout=preserve? I have other arrays in this
>>> server in which I need to replace a drive.
>>
>> I'm not 100% sure - what version of mdadm are you using?
>> If it is 3.2.4, then maybe commit 0073a6e189c41c broke something.
>> I'll add a test for this to the test suite to make sure it doesn't break again.
>> But you are using 3.2.2 .... Not sure. I'd have to look more closely.
>>
>> Using --layout=left-symmetric-6 should work, though testing on some
>> /dev/loop devices first is always a good idea.
>>
>> NeilBrown
>>
>>
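Neil's suggestion to rehearse on loop devices first can be sketched like
this; the file names, sizes, and device numbers below are arbitrary
examples, not values from this server:

```shell
# Create small backing files and attach them as loop devices
for i in 0 1 2 3; do
  truncate -s 100M /tmp/raid$i.img
  losetup /dev/loop$i /tmp/raid$i.img
done

# Build a throwaway 4-disk RAID5 array with 0.90 metadata, matching the
# metadata version on the real server
mdadm --create /dev/md9 --metadata=0.90 --level=5 --raid-devices=4 \
      /dev/loop[0-3]

# Rehearse the intended change here before touching real disks, e.g.:
#   mdadm --grow /dev/md9 --level=6 --layout=left-symmetric-6 \
#         --backup-file=/tmp/reshape-backup

# Tear everything down when done
mdadm --stop /dev/md9
for i in 0 1 2 3; do losetup -d /dev/loop$i; done
```

Running this requires root and free loop devices, so it should be tried
on a scratch machine rather than the production server.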
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

