Re: Hot-replace for RAID5

Neil,

so I analyzed the behaviour further and found the following:

- The bottleneck of about 1.7 MB/s is probably caused by the backup file
being on one of the drives: that drive is utilized almost 80% according to
iostat -x, and its average queue length is almost 4 while await stays under
50 ms.

- The variable speed, with lows down to 100 KB/s, is caused by problems on
the drive I suspected as problematic. Its service time sometimes goes above
1 second. The total average speed is about 0.8 MB/s. (I tested its read
speed by running a check of the array and it managed 30 MB/s. Because
preserve should only read from it, I did not specifically test its write
speed.)
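
(For reference, the numbers above are from watching the drives with something
like

  iostat -x -d 5

i.e. the %util, avgqu-sz, await and svctm columns; the interval is arbitrary.)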

So my questions are:

- Is there a way I can move the backup_file to another drive 100% safely? To
add another non-network drive I need to restart the server; I can then boot
into some live distribution, for example, to completely prevent automatic
assembly. I think the speed should be a couple of times higher.

- Is it safe to fail and remove the problematic drive? The array would be
down to 6 of 8 drives in the part that is not yet reshaped. It should double
the speed.

- Why did mdadm ignore --layout=preserve? I have other arrays in that
server in which I need to replace a drive.
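
(To be concrete about what I have in mind - /dev/sdX3, /dev/sdY3 and /dev/mdN
below are only placeholders, and this assumes the answers above are "yes":

  mdadm /dev/md6 --fail /dev/sdX3      # drop the suspect member
  mdadm /dev/md6 --remove /dev/sdX3

  mdadm /dev/mdN --add /dev/sdY3       # spare for the next array
  mdadm --grow /dev/mdN --level 6 --layout=preserve

If --layout=preserve is honoured this time, the second conversion should need
no backup file and no restriping at all.)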

Thanks.

Patrik

On Sat, May 12, 2012 at 6:40 AM, Patrik Horník <patrik@xxxxxx> wrote:
> Neil, the migration to RAID6 is unfortunately not working as expected.
>
> I added a spare and ran mdadm --grow /dev/md6 --level 6 --layout=preserve,
> but I guess it ignored --layout=preserve.
>
> It asked for a backup_file and now it is writing the same amount of data
> to all drives. I can maybe live with that, even though it is a little
> risky, because I suspect one of the drives is not OK. But the problem is
> that I thought the backup_file was only for some critical section, so I
> gave it a backup_file located on one of the drives used in the array. It
> is of course not on a partition that is part of the array, but it seems to
> be the I/O bottleneck. The reshape speed is not constant, varies between
> 100 KB/s and 1.6 MB/s, and it looks like it will take more than a week,
> maybe two.
>
> It is kernel 3.2.0 amd64 and mdadm 3.2.2 from squeeze backports; the array
> had seven drives and now has eight.
>
> What additional info do you need to diagnose the problem? I am not yet
> 100% sure the bottleneck is the backup file, but it looks like it from
> iostat -d. Is there anything I can do about that? (Like stopping the
> reshape and changing the backup file. To do that I need to restart the
> server, and I need the operation to be 100% safe.)
>
> Here is the output of mdadm --detail:
>
>  Version : 0.91
>  Creation Time : Tue Aug 18 14:51:41 2009
>     Raid Level : raid6
>     Array Size : 2933388288 (2797.50 GiB 3003.79 GB)
>  Used Dev Size : 488898048 (466.25 GiB 500.63 GB)
>   Raid Devices : 8
>  Total Devices : 8
> Preferred Minor : 6
>    Persistence : Superblock is persistent
>
>    Update Time : Sat May 12 06:37:48 2012
>          State : clean, degraded, reshaping
>  Active Devices : 7
> Working Devices : 8
>  Failed Devices : 0
>  Spare Devices : 1
>
>         Layout : left-symmetric-6
>     Chunk Size : 64K
>
>  Reshape Status : 0% complete
>     New Layout : left-symmetric
>
>           UUID : d8e679a2:5d6fa7a7:2e406ee4:439be8d3
>         Events : 0.983549
>
>    Number   Major   Minor   RaidDevice State
>       0       8      115        0      active sync   /dev/sdh3
>       1       8       67        1      active sync   /dev/sde3
>       2       8       99        2      active sync   /dev/sdg3
>       3       8       83        3      active sync   /dev/sdf3
>       4       8        3        4      active sync   /dev/sda3
>       5       8       19        5      active sync   /dev/sdb3
>       6       8       35        6      active sync   /dev/sdc3
>       7       8       51        7      spare rebuilding   /dev/sdd3
>
>
> Patrik
>
>
> On Fri, May 11, 2012 at 9:16 AM, David Brown <david.brown@xxxxxxxxxxxx> wrote:
>> Just in case you missed it earlier...
>>
>> Remember to take a backup before you start this!
>>
>> Also make notes of things like the "mdadm --detail", version numbers, the
>> exact commands executed, etc. (and store this information on another
>> computer!)  If something does go wrong, then that information can make it
>> much easier for Neil or others to advise you.
>>
>> mvh.,
>>
>> David
>>
>>
>>
>> On 11/05/2012 04:44, Patrik Horník wrote:
>>>
>>>> On Fri, May 11, 2012 at 2:50 AM, NeilBrown <neilb@xxxxxxx> wrote:
>>>>
>>>>> On Thu, 10 May 2012 19:16:59 +0200 Patrik Horník <patrik@xxxxxx> wrote:
>>>>
>>>>> Neil, can you please comment on whether the separate operations
>>>>> mentioned in this process behave and are as stable as we expect? Thanks.
>>>>
>>>>
>>>> The conversion to and from RAID6 as described should work as expected,
>>>> though it requires having an extra device and requires two 'recovery'
>>>> cycles.  Specifying the number of --raid-devices is not necessary.  When
>>>> you convert RAID5 to RAID6, mdadm assumes you are increasing the number
>>>> of devices by 1 unless you say otherwise.  Similarly, with RAID6->RAID5
>>>> the assumption is a decrease by 1.
>>>>
>>>> Doing an in-place replacement with the new 3.3 code should work, though
>>>> with a softer "should" than above.  We will only know that it is
>>>> "stable" when enough people (such as yourself) try it and report
>>>> success.  If anything does go wrong I would of course help you to put
>>>> the array back together, but I can never guarantee no data loss.  You
>>>> wouldn't be the first to test the code on live data, but you would be
>>>> the second that I have heard of.
>>>
>>>
>>> Thanks Neil, this answers my questions. I don't like being second, so
>>> RAID5 - RAID6 - RAID5 it is... :)
>>>
>>> In addition, my array has 0.9 metadata, so hot-replace would also require
>>> a conversion of the metadata, so altogether it seems much riskier.
>>>
>>>> The in-place replacement is not yet supported by mdadm but it is very
>>>> easy to manage directly.  Just
>>>>   echo replaceable > /sys/block/mdXXX/md/dev-YYY/state
>>>> and as soon as a spare is available the replacement will happen.
>>>>
>>>> NeilBrown
>>>>
>>>>
>>>>>
>>>>> On Thu, May 10, 2012 at 8:59 AM, David Brown <david.brown@xxxxxxxxxxxx> wrote:
>>>>>>
>>>>>> (I accidentally sent my first reply directly to the OP, and forgot the
>>>>>> mailing list - I'm adding it back now, because I don't want the OP to
>>>>>> follow my advice until others have confirmed or corrected it!)
>>>>>>
>>>>>>
>>>>>> On 09/05/2012 21:53, Patrik Horník wrote:
>>>>>>>
>>>>>>> Great suggestion, thanks.
>>>>>>>
>>>>>>> So I guess the steps with exact parameters should be:
>>>>>>> 1. add spare S to the RAID5 array
>>>>>>> 2. mdadm --grow /dev/mdX --level 6 --raid-devices N+1 --layout=preserve
>>>>>>> 3. remove the faulty drive and add the replacement, let it synchronize
>>>>>>> 4. possibly remove the added spare S
>>>>>>> 5. mdadm --grow /dev/mdX --level 5 --raid-devices N
>>>>>>
>>>>>>
>>>>>>
>>>>>> Yes, that's what I was thinking.  You are missing "2b - let it
>>>>>> synchronise".
>>>>>
>>>>>
>>>>> Sure :)
>>>>>
>>>>>> Of course, another possibility is that if you have the space in the
>>>>>> system for another drive, you may want to convert to a full raid6 for
>>>>>> the future.  That way you have the extra safety built in in advance.
>>>>>> But that will definitely lead to a re-shape.
>>>>>
>>>>>
>>>>> Actually I don't have free physical space; the array already has 7
>>>>> drives. For the process I will need to place the additional drive on a
>>>>> table near the PC and cool it with a fan standing by itself on the
>>>>> table... :)
>>>>>
>>>>>>>
>>>>>>> My questions:
>>>>>>> - Are you sure steps 3, 4 and 5 would not cause reshaping?
>>>>>>
>>>>>>
>>>>>> I /believe/ it will avoid a reshape, but I can't say I'm sure.  This is
>>>>>> stuff that I only know about in theory, and have not tried in practice.
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> - My array now has the left-symmetric layout, so after migration to
>>>>>>> RAID6 it should be left-symmetric-6. Does RAID6 work without problems
>>>>>>> in degraded mode with this layout, no matter which one or two drives
>>>>>>> are missing?
>>>>>>>
>>>>>>
>>>>>> The layout will not affect the redundancy or the features of the raid
>>>>>> - it will only (slightly) affect the speed of some operations.
>>>>>
>>>>>
>>>>> I know it should work, but it is probably a configuration that is not
>>>>> used much, so maybe it is not as well tested as the standard layouts.
>>>>> So the question was aimed more at practical experience and
>>>>> stability...
>>>>>
>>>>>>> - What happens in step 5 and how long does it take? (If it is without
>>>>>>> reshaping, it should only update the superblocks and that's it.)
>>>>>>
>>>>>>
>>>>>> That is my understanding.
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> - What happens if I don't remove spare S before the migration back to
>>>>>>> RAID5? Will the array be reshaped, and which drive will it turn into a
>>>>>>> spare? (If step 5 is instantaneous, there is no reason for that. But
>>>>>>> if it takes time, it is probably safer.)
>>>>>>>
>>>>>>
>>>>>> I /think/ that the extra disk will turn into a hot spare.  But I am
>>>>>> getting out of my depth here - it all depends on how the disks get
>>>>>> numbered and how that affects the layout, and I don't know the details
>>>>>> here.
>>>>>>
>>>>>>
>>>>>>> So all in all, what do you guys think is more reliable now, the new
>>>>>>> hot-replace or these steps?
>>>>>>
>>>>>>
>>>>>>
>>>>>> I too am very curious to hear opinions.  Hot-replace will certainly be
>>>>>> much simpler and faster than these sorts of re-shaping - it's exactly
>>>>>> the sort of situation the feature was designed for.  But I don't know
>>>>>> if it is considered stable and well-tested, or "bleeding edge".
>>>>>>
>>>>>> mvh.,
>>>>>>
>>>>>> David
>>>>>>
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> Thanks.
>>>>>>>
>>>>>>> Patrik
>>>>>>>
>>>>>>> On Wed, May 9, 2012 at 8:09 AM, David Brown <david.brown@xxxxxxxxxxxx> wrote:
>>>>>>>>
>>>>>>>> On 08/05/12 11:10, Patrik Horník wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Hello guys,
>>>>>>>>>
>>>>>>>>> I need to replace a drive in a big production RAID5 array and I am
>>>>>>>>> thinking about using the new hot-replace feature added in kernel 3.3.
>>>>>>>>>
>>>>>>>>> Does someone have experience with it on big RAID5 arrays? Mine is
>>>>>>>>> 7 * 1.5 TB. What do you think about its status / stability /
>>>>>>>>> reliability? Do you recommend it on production data?
>>>>>>>>>
>>>>>>>>> Thanks.
>>>>>>>>>
>>>>>>>>
>>>>>>>> If you don't want to play with the "bleeding edge" features, you
>>>>>>>> could add the disk and extend the array to RAID6, then remove the old
>>>>>>>> drive.  I think if you want to do it all without doing any re-shapes,
>>>>>>>> however, then you'd need a third drive (the extra drive could easily
>>>>>>>> be an external USB disk if needed - it will only be used for writing,
>>>>>>>> and not for reading unless there's another disk failure).  Start by
>>>>>>>> adding the extra drive as a hot spare, then re-shape your raid5 to
>>>>>>>> raid6 in raid5+extra parity layout.  Then fail and remove the old
>>>>>>>> drive.  Put the new drive into the box and add it as a hot spare.  It
>>>>>>>> should automatically take its place in the raid5, replacing the old
>>>>>>>> one.  Once it has been rebuilt, you can fail and remove the extra
>>>>>>>> drive, then re-shape back to raid5.
>>>>>>>>
>>>>>>>> If things go horribly wrong, the external drive gives you your parity
>>>>>>>> protection.
>>>>>>>>
>>>>>>>> Of course, don't follow this plan until others here have commented
>>>>>>>> on it, and either corrected or approved it.
>>>>>>>>
>>>>>>>> And make sure you have a good backup no matter what you decide to do.
>>>>>>>>
>>>>>>>> mvh.,
>>>>>>>>
>>>>>>>> David
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>
>>>
>>>
>>