Re: Hot-replace for RAID5

Well,

I used raid=noautodetect and the other arrays still started automatically.
I am not sure what started them, maybe the initscripts... But the one
which is reshaping thankfully did not start.

Unfortunately the speed is not much better. The top speed is up by
about a third, to maybe 2.3 MB/s, which seems like a small gain, and I am
unable to quickly pinpoint the exact reason. Do you have any idea what
it could be and how to improve the speed?

In addition, the performance problem with the bad drive now kicks in
sooner, so the average speed is almost the same, around 0.8 to
0.9 MB/s. I am thinking about failing the problematic drive, except
that I will end up without redundancy for the not-yet-reshaped part.
Should failing it work as expected even in the state the array is in
now? (raid6 with 8 drives, 7 active devices in the not-yet-reshaped
part, stopped and restarted with the backup-file a couple of times.)
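
If I do fail it, I suppose it would go roughly like this (the md device
and partition names below are just placeholders, not my real ones):

  mdadm /dev/md0 --fail /dev/sdX1
  mdadm /dev/md0 --remove /dev/sdX1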

Thanks.

Patrik

On Mon, May 14, 2012 at 12:15 AM, NeilBrown <neilb@xxxxxxx> wrote:
> On Sun, 13 May 2012 23:41:35 +0200 Patrik Horník <patrik@xxxxxx> wrote:
>
>> Hi Neil,
>>
>> I decided to move the backup file to another device. I stopped the array;
>> mdadm stopped it but wrote "mdadm: failed to unfreeze array". What does
>> that mean exactly? I don't want to proceed until I am sure it does not
>> signal an error.
>
> That would appear to be a minor bug in mdadm - I've made a note.
>
> When reshaping an array like this, the 'mdadm' which started the reshape
> forks and continues in the background managing the backup file.
> When it exits, having completed, it makes sure that the array is 'unfrozen'
> just to be safe.
> However if it exits because the array was stopped, there is no array to
> unfreeze and it gets a little confused.
> So it is a bug but it does not affect the data on the devices or indicate
> that anything serious went wrong when stopping the array.
>
>>
>> I quickly checked the sources and it seems to be related to some sysfs
>> resources, but I am not sure. In any case, the array disappeared from
>> /sys/block/.
>
> Exactly.  And as the array disappeared, it really has stopped.
>
>
>>
>> Thanks.
>>
>> Patrik
>>
>> On Sun, May 13, 2012 at 9:43 AM, Patrik Horník <patrik@xxxxxx> wrote:
>> > Hi Neil,
>> >
>> > On Sun, May 13, 2012 at 1:19 AM, NeilBrown <neilb@xxxxxxx> wrote:
>> >> On Sat, 12 May 2012 17:56:04 +0200 Patrik Horník <patrik@xxxxxx> wrote:
>> >>
>> >>> Neil,
>> >>
>> >> Hi Patrik,
>> >>  sorry about the "--layout=preserve" confusion.  I was a bit hasty.
>> >>  "--layout=left-symmetric-6" would probably have done what was wanted, but it
>> >>  is a bit late for that :-(
>> >
>> > --layout=preserve is also mentioned in the md or mdadm
>> > documentation... So is it not the right one?
>
> It should be ... I think.  But it definitely seems not to work.  I only have
> a vague memory of how it was meant to work so I'll have to review the code
> and add some proper self-tests.
>
>> >
>> >>>
>> >>> So I further analyzed the behaviour and found the following:
>> >>>
>> >>> - The bottleneck of circa 1.7 MB/s is probably caused by the backup file
>> >>> being on one of the drives; that drive is utilized almost 80% according to
>> >>> iostat -x and its average queue length is almost 4, while its await stays
>> >>> under 50 ms.
>> >>>
>> >>> - The variable speed and the low speeds, down to 100 KB/s, are caused by
>> >>> problems on the drive I suspected as problematic. Its service time
>> >>> sometimes goes above 1 second. The total average speed is about 0.8 MB/s.
>> >>> (I tested the read speed on it by running a check of the array and it
>> >>> managed 30 MB/s. And because preserve should only read from it, I did not
>> >>> specifically test its write speed.)
>> >>>
>> >>> So my questions are:
>> >>>
>> >>> - Is there a way I can move the backup_file to another drive 100% safely?
>> >>> To add another non-network drive I need to restart the server. I can then
>> >>> boot it into some live distribution, for example, to completely prevent
>> >>> automatic assembly. I think the speed should be a couple of times higher.
>> >>
>> >> Yes.
>> >> If you stop the array, then copy the backup file, then re-assemble the
>> >> array giving it the backup file in the new location, all should be well.
>> >> A reboot while the array is stopped is not a problem.
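>> >>
>> >> (Roughly, with placeholder device and path names, something like:)
>> >>
>> >>   mdadm --stop /dev/md0
>> >>   cp /old/disk/backup-file /new/disk/backup-file
>> >>   mdadm --assemble /dev/md0 --backup-file=/new/disk/backup-file /dev/sd[a-h]1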
>> >
>> > Should or will? :) I have 0.90 (now 0.91) metadata; is everything
>> > needed stored there? Should mdadm 3.2.2-1~bpo60+2 from
>> > squeeze-backports work well, or should I compile mdadm 3.2.4?
>
> "Will" requires clairvoyance :-)
> 0.91 is the same as 0.90, except that the array is in the middle of a reshape.
> This makes sure that old kernels which don't know about reshape never try to
> start the array.
> Yes - everything you need is stored in the 0.91 metadata and the backup file.
> After a clean shutdown, you could manage without the backup file if you had
> to, but as you have it, that isn't an issue.
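>
> (For the record, checking the metadata version on a member device would be
> something like this - the partition name is just a placeholder:)
>
>   mdadm --examine /dev/sdX1 | grep -i version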
>
>> >
>> > In case there is some risk involved, I will need to choose between
>> > waiting and risking a power outage sometime in the following
>> > week (we have something like storm season here), and risking this...
>
> There is always risk.
> I think you made a wise choice in choosing to move the backup file.
>
>> >
>> > Do you recommend some live Linux distro installable on USB which is
>> > good for this? (One that has the newest versions and doesn't try to
>> > assemble arrays.)
>
> No.  Best to use whatever you are familiar with.
>
>
>> >
>> > Or will automatic assembly fail and cause no problem at all
>> > for sure? (According to the md or mdadm docs this should be the case.) In
>> > that case, can I use the distribution on the server, Debian stable plus
>> > some packages from squeeze, for that? Possibly with
>> > raid=noautodetect added? I have LVM on top of the raid arrays and I don't
>> > want to cause a mess. The OS is not on LVM or raid.
>> >
>
> raid=noautodetect is certainly a good idea. I'm not sure if the in-kernel
> autodetect will try to start a reshaping raid - I hope not.
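>
> (A sketch of adding it, assuming GRUB 2 on Debian - the exact file and
> command depend on your bootloader:)
>
>   # in /etc/default/grub:
>   GRUB_CMDLINE_LINUX="raid=noautodetect"
>   # then regenerate the boot config:
>   update-grub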
>
>> >>>
>> >>> - Is it safe to fail and remove the problematic drive? The array will be
>> >>> down to 6 from 8 drives in the part where it is not reshaped. It should
>> >>> double the speed.
>> >>
>> >> As safe as it ever is to fail a device in a non-degraded array.
>> >> i.e. it would not cause a problem directly but of course if you get an error
>> >> on another device, that would be awkward.
>> >
>> > I actually "check"-ed this raid array a couple of times a few days ago
>> > and the data on the other drives were OK. The problematic drive reported
>> > a couple of read errors, always corrected with data from the other drives
>> > and by rewriting.
>
> That is good!
>
>> >
>> > About that, should this reshaping work OK if it encounters
>> > read errors on the problematic drive? Will it use data from the other
>> > drives to correct them in this reshaping mode as well?
>
> As long as there are enough working drives to be able to read and write the
> data, the reshape will continue.
>
> NeilBrown
>
>
>> >
>> > Thanks.
>> >
>> > Patrik
>> >
>> >>>
>> >>> - Why did mdadm ignore layout=preserve? I have other arrays in that
>> >>> server in which I need to replace the drive.
>> >>
>> >> I'm not 100% sure - what version of mdadm are you using?
>> >> If it is 3.2.4, then maybe commit 0073a6e189c41c broke something.
>> >> I'll add a test for this to the test suite to make sure it doesn't break again.
>> >> But you are using 3.2.2 .... Not sure. I'd have to look more closely.
>> >>
>> >> Using --layout=left-symmetric-6 should work, though testing on some
>> >> /dev/loop devices first is always a good idea.
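>> >>
>> >> (A rough sketch of such a loop-device test - file names, sizes and the md
>> >> device are arbitrary:)
>> >>
>> >>   for i in 0 1 2 3 4; do
>> >>     dd if=/dev/zero of=/tmp/d$i bs=1M count=100
>> >>     losetup /dev/loop$i /tmp/d$i
>> >>   done
>> >>   mdadm --create /dev/md9 --level=5 --raid-devices=4 /dev/loop[0-3]
>> >>   mdadm /dev/md9 --add /dev/loop4
>> >>   mdadm --grow /dev/md9 --level=6 --raid-devices=5 --layout=left-symmetric-6
>> >>   # (add --backup-file=... if mdadm asks for one)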
>> >>
>> >> NeilBrown
>> >>
>> >>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

