Re: Hot-replace for RAID5

Neil, the migration to RAID6 is unfortunately not working as expected.

I added a spare and used the command mdadm --grow /dev/md6 --level 6
--layout=preserve, but I guess it ignored --layout=preserve.
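
For reference, this is roughly how the layout can be checked (/dev/md6 is
my array; the same information is in the --detail output pasted below):

  mdadm --detail /dev/md6 | grep -i layout   # current Layout and, during a reshape, New Layout
  cat /proc/mdstat                           # shows whether a full reshape is in progress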

It asked for a backup_file and is now writing the same amount of data
to all drives. I can maybe live with that, even though it is a little
risky because I suspect one of the drives is not OK. But the problem
is that I thought the backup_file was only needed for some critical
section, so I put the backup_file on one of the drives used in the
array. It is of course not on a partition that is part of the array,
but it seems to be the I/O bottleneck. The reshape speed is not
constant and varies between 100 KB/s and 1.6 MB/s, and it looks like
it will take more than a week, maybe two.

It is kernel 3.2.0 amd64 and mdadm 3.2.2 from squeeze backports; the
array had seven drives and is now growing to eight.

What additional info do you need to diagnose the problem? I am not yet
100% sure the bottleneck is the backup file, but it looks like it from
iostat -d. Is there anything I can do about it? (Like stopping the
reshape and changing the backup file. To do that I would need to
restart the server, and I need the operation to be 100% safe.)
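
In case it is useful, this is roughly what I am watching while it runs,
plus two sysfs knobs that are often suggested for reshape speed (the
values are only examples, and I am not sure they help when the backup
file itself is the bottleneck):

  cat /proc/mdstat                                  # reshape progress and current speed
  iostat -dx 5                                      # per-device utilization, to spot the busy drive
  echo 8192 > /sys/block/md6/md/stripe_cache_size   # larger stripe cache for the raid5/6 personality
  echo 50000 > /sys/block/md6/md/sync_speed_min     # raise the minimum resync/reshape speed (KB/s)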

Here is the output of mdadm --detail:

 Version : 0.91
  Creation Time : Tue Aug 18 14:51:41 2009
     Raid Level : raid6
     Array Size : 2933388288 (2797.50 GiB 3003.79 GB)
  Used Dev Size : 488898048 (466.25 GiB 500.63 GB)
   Raid Devices : 8
  Total Devices : 8
Preferred Minor : 6
    Persistence : Superblock is persistent

    Update Time : Sat May 12 06:37:48 2012
          State : clean, degraded, reshaping
 Active Devices : 7
Working Devices : 8
 Failed Devices : 0
  Spare Devices : 1

         Layout : left-symmetric-6
     Chunk Size : 64K

 Reshape Status : 0% complete
     New Layout : left-symmetric

           UUID : d8e679a2:5d6fa7a7:2e406ee4:439be8d3
         Events : 0.983549

    Number   Major   Minor   RaidDevice State
       0       8      115        0      active sync   /dev/sdh3
       1       8       67        1      active sync   /dev/sde3
       2       8       99        2      active sync   /dev/sdg3
       3       8       83        3      active sync   /dev/sdf3
       4       8        3        4      active sync   /dev/sda3
       5       8       19        5      active sync   /dev/sdb3
       6       8       35        6      active sync   /dev/sdc3
       7       8       51        7      spare rebuilding   /dev/sdd3


Patrik
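
P.S. For anyone skimming the thread, the sequence being discussed below,
written out as commands (only a sketch; mdX and the sdXXX names are
placeholders):

  mdadm /dev/mdX --add /dev/sdNEW                        # 1. add spare S
  mdadm --grow /dev/mdX --level 6 --layout=preserve      # 2. convert to RAID6, let it sync
  mdadm /dev/mdX --fail /dev/sdBAD --remove /dev/sdBAD   # 3. swap out the suspect drive...
  mdadm /dev/mdX --add /dev/sdREPL                       #    ...add the replacement, let it sync
  mdadm /dev/mdX --fail /dev/sdNEW --remove /dev/sdNEW   # 4. possibly remove spare S again
  mdadm --grow /dev/mdX --level 5                        # 5. convert back to RAID5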


On Fri, May 11, 2012 at 9:16 AM, David Brown <david.brown@xxxxxxxxxxxx> wrote:
> Just in case you missed it earlier...
>
> Remember to take a backup before you start this!
>
> Also make notes of things like the "mdadm --detail", version numbers, the
> exact commands executed, etc. (and store this information on another
> computer!)  If something does go wrong, then that information can make it
> much easier for Neil or others to advise you.
>
> mvh.,
>
> David
>
>
>
> On 11/05/2012 04:44, Patrik Horník wrote:
>>
>> On Fri, May 11, 2012 at 2:50 AM, NeilBrown<neilb@xxxxxxx>  wrote:
>>>
>>> On Thu, 10 May 2012 19:16:59 +0200 Patrik Horník<patrik@xxxxxx>  wrote:
>>>
>>>> Neil, can you please comment on whether the separate operations mentioned
>>>> in this process behave and are stable enough, as we expect? Thanks.
>>>
>>>
>>> The conversion to and from RAID6 as described should work as expected,
>>> though it requires having an extra device and requires two 'recovery'
>>> cycles.  Specifying the number of --raid-devices is not necessary.  When
>>> you convert RAID5 to RAID6, mdadm assumes you are increasing the number
>>> of devices by 1 unless you say otherwise.  Similarly with RAID6->RAID5
>>> the assumption is a decrease by 1.
>>>
>>> Doing an in-place reshape with the new 3.3 code should work, though with
>>> a softer "should" than above.  We will only know that it is "stable" when
>>> enough people (such as yourself) try it and report success.  If anything
>>> does go wrong I would of course help you to put the array back together
>>> but I can never guarantee no data loss.  You wouldn't be the first to
>>> test the code on live data, but you would be the second that I have
>>> heard of.
>>
>>
>> Thanks Neil, this answers my questions. I don't like being second, so
>> RAID5 - RAID6 - RAID5 it is... :)
>>
>> In addition, my array has 0.9 metadata, so hot-replace would also
>> require a conversion of the metadata; altogether it seems much riskier.
>>
>>> The in-place reshape is not yet supported by mdadm but it is very easy to
>>> manage directly.  Just
>>>   echo replaceable > /sys/block/mdXXX/md/dev-YYY/state
>>> and as soon as a spare is available the replacement will happen.
>>>
>>> NeilBrown
>>>
>>>
>>>>
>>>> On Thu, May 10, 2012 at 8:59 AM, David Brown<david.brown@xxxxxxxxxxxx>
>>>>  wrote:
>>>>>
>>>>> (I accidentally sent my first reply directly to the OP, and forgot the
>>>>> mailing list - I'm adding it back now, because I don't want the OP to
>>>>> follow my advice until others have confirmed or corrected it!)
>>>>>
>>>>>
>>>>> On 09/05/2012 21:53, Patrik Horník wrote:
>>>>>>
>>>>>> Great suggestion, thanks.
>>>>>>
>>>>>> So I guess steps with exact parameters should be:
>>>>>> 1, add spare S to RAID5 array
>>>>>> 2, mdadm --grow /dev/mdX --level 6 --raid-devices N+1
>>>>>> --layout=preserve
>>>>>> 3, remove faulty drive and add replacement, let it synchronize
>>>>>> 4, possibly remove added spare S
>>>>>> 5, mdadm --grow /dev/mdX --level 5 --raid-devices N
>>>>>
>>>>>
>>>>>
>>>>> Yes, that's what I was thinking.  You are missing "2b - let it
>>>>> synchronise".
>>>>
>>>>
>>>> Sure :)
>>>>
>>>>> Of course, another possibility is that if you have the space in the
>>>>> system for another drive, you may want to convert to a full raid6 for
>>>>> the future.  That way you have the extra safety built-in in advance.
>>>>> But that will definitely lead to a re-shape.
>>>>
>>>>
>>>> Actually I don't have free physical space; the array already has 7
>>>> drives. For the process I will need to place the additional drive on a
>>>> table near the PC and cool it with a fan standing next to it on the
>>>> table... :)
>>>>
>>>>>>
>>>>>> My questions:
>>>>>> - Are you sure steps 3, 4 and 5 would not cause reshaping?
>>>>>
>>>>>
>>>>> I /believe/ it will avoid a reshape, but I can't say I'm sure.  This is
>>>>> stuff that I only know about in theory, and have not tried in practice.
>>>>>
>>>>>
>>>>>>
>>>>>> - My array now has the left-symmetric layout, so after migration to
>>>>>> RAID6 it should be left-symmetric-6. Does RAID6 work without problems
>>>>>> in degraded mode with this layout, no matter which one or two drives
>>>>>> are missing?
>>>>>>
>>>>>
>>>>> The layout will not affect the redundancy or the features of the raid -
>>>>> it will only (slightly) affect the speed of some operations.
>>>>
>>>>
>>>> I know it should work, but it is probably a configuration that is not
>>>> used much, so maybe it is not tested as much as the standard layouts.
>>>> So the question was aimed more at practical experience and
>>>> stability...
>>>>
>>>>>> - What happens in step 5 and how long does it take? (If it is without
>>>>>> reshaping, it should only update the superblocks and that's it.)
>>>>>
>>>>>
>>>>> That is my understanding.
>>>>>
>>>>>
>>>>>>
>>>>>> - What happens if I don't remove spare S before the migration back to
>>>>>> RAID5? Will the array be reshaped, and which drive will it turn into a
>>>>>> spare? (If step 5 is instantaneous, there is no reason for that. But
>>>>>> if it takes time, it is probably safer.)
>>>>>>
>>>>>
>>>>> I /think/ that the extra disk will turn into a hot spare.  But I am
>>>>> getting out of my depth here - it all depends on how the disks get
>>>>> numbered and how that affects the layout, and I don't know the details
>>>>> here.
>>>>>
>>>>>
>>>>>> So all in all, what do you guys think is more reliable now, the new
>>>>>> hot-replace or these steps?
>>>>>
>>>>>
>>>>>
>>>>> I too am very curious to hear opinions.  Hot-replace will certainly
>>>>> be much simpler and faster than these sorts of re-shaping - it's
>>>>> exactly the sort of situation the feature was designed for.  But I
>>>>> don't know if it is considered stable and well-tested, or "bleeding
>>>>> edge".
>>>>>
>>>>> mvh.,
>>>>>
>>>>> David
>>>>>
>>>>>
>>>>>
>>>>>>
>>>>>> Thanks.
>>>>>>
>>>>>> Patrik
>>>>>>
>>>>>> On Wed, May 9, 2012 at 8:09 AM, David Brown<david.brown@xxxxxxxxxxxx>
>>>>>>  wrote:
>>>>>>>
>>>>>>> On 08/05/12 11:10, Patrik Horník wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> Hello guys,
>>>>>>>>
>>>>>>>> I need to replace a drive in a big production RAID5 array and I am
>>>>>>>> thinking about using the new hot-replace feature added in kernel 3.3.
>>>>>>>>
>>>>>>>> Does someone have experience with it on big RAID5 arrays? Mine is
>>>>>>>> 7 * 1.5 TB. What do you think about its status / stability /
>>>>>>>> reliability? Do you recommend it on production data?
>>>>>>>>
>>>>>>>> Thanks.
>>>>>>>>
>>>>>>>
>>>>>>> If you don't want to play with the "bleeding edge" features, you
>>>>>>> could add the disk and extend the array to RAID6, then remove the old
>>>>>>> drive. I think if you want to do it all without doing any re-shapes,
>>>>>>> however, then you'd need a third drive (the extra drive could easily
>>>>>>> be an external USB disk if needed - it will only be used for writing,
>>>>>>> and not for reading unless there's another disk failure).  Start by
>>>>>>> adding the extra drive as a hot spare, then re-shape your raid5 to
>>>>>>> raid6 in raid5+extra parity layout.  Then fail and remove the old
>>>>>>> drive.  Put the new drive into the box and add it as a hot spare.  It
>>>>>>> should automatically take its place in the raid5, replacing the old
>>>>>>> one.  Once it has been rebuilt, you can fail and remove the extra
>>>>>>> drive, then re-shape back to raid5.
>>>>>>>
>>>>>>> If things go horribly wrong, the external drive gives you your parity
>>>>>>> protection.
>>>>>>>
>>>>>>> Of course, don't follow this plan until others here have commented on
>>>>>>> it, and either corrected or approved it.
>>>>>>>
>>>>>>> And make sure you have a good backup no matter what you decide to do.
>>>>>>>
>>>>>>> mvh.,
>>>>>>>
>>>>>>> David
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>
>>
>>
>

