Problems with raid6 growing

Good day everybody,

My current setup is raid5 across 3 750G disks. It contains about
1.4T of data (ext3). I'm running Fedora 7.92 (2.6.23-0.214.rc8.git2.fc8,
mdadm v2.6.2). I've splurged today on the next 3 750G disks, and
I'm doing some preparatory testing. My aim is to end up with a raid6
across 6 drives and, of course, to preserve the data I have. ;-)

After some rethinking, and thanks to feedback I got from this list a few
months ago, I've come up with the following plan (a rough command sketch
follows the list). It seems pretty safe, as it should survive a single
drive failure at any point (or so I reckon).

1. create a 3 hdd + 1 missing raid6
2. copy data
3. "check" both arrays
4. degrade old raid5 by 1 drive (--zero-superblock)
5. add it to raid6 & let it sync back
6. "check" raid6 again
7. stop raid5, --zero-superblock on its drives
8. add 2 drives to raid6, --grow it, and then resize ext3
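
Roughly, the commands I have in mind for the steps above look like this.
Treat it as a sketch rather than the exact invocations: the new-disk names
(sde/sdf/sdg) and the new array name (md1) are just placeholders; the old
raid5 members (sdb1/sdc1/sdd1) are the real ones from my md0.

  # 1. create a raid6 on the 3 new disks plus one "missing" slot
  mdadm --create /dev/md1 --chunk=256 -l 6 --raid-devices=4 \
      /dev/sde /dev/sdf /dev/sdg missing
  # 2. mkfs, mount, and copy the data over (details omitted)
  # 3. "check" both arrays and look at the mismatch counters
  echo check > /sys/block/md0/md/sync_action
  echo check > /sys/block/md1/md/sync_action
  cat /sys/block/md0/md/mismatch_cnt /sys/block/md1/md/mismatch_cnt
  # 4./5. pull one drive out of the old raid5 and move it into the raid6
  mdadm /dev/md0 --fail /dev/sdd1 --remove /dev/sdd1
  mdadm --zero-superblock /dev/sdd1
  mdadm /dev/md1 --add /dev/sdd      # let it resync; step 6: "check" again
  # 7. retire the old raid5 entirely
  mdadm --stop /dev/md0
  mdadm --zero-superblock /dev/sdb1 /dev/sdc1
  # 8. add the last two drives, grow to 6 devices, then resize ext3
  mdadm /dev/md1 --add /dev/sdb /dev/sdc
  mdadm --grow /dev/md1 --raid-devices=6
  resize2fs /dev/md1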

I'm also planning not to make one huge partition on each disk, but rather
to give the whole sd? device to md (instead of sd?1, that is).

I've tried dry-running some parts of my plan, and so far I've encountered
at least two problems.

First of all, it seems that my version of mdadm doesn't like the idea of
creating a raid6 with one missing drive. It appears to work with mdadm
2.6.7, but I haven't installed the new mdadm system-wide yet, being a bit
worried about my old array that was created with the old mdadm. Anyway,
that's within reach, if it really is the aforementioned "raid6 with one
missing drive" problem. It errored with:
Nov  7 18:17:39 kylie kernel: raid5: failed to run raid set md55
Nov  7 18:17:39 kylie kernel: md: pers->run() failed ...
Nov  7 18:17:39 kylie kernel: md: md55 stopped.
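
For clarity, the kind of invocation I mean is something like this (same
loop-device test setup as below):

  /sbin/mdadm --create --verbose /dev/md55 --chunk=256 -l 6 \
      --raid-devices=4 /dev/loop0 /dev/loop1 /dev/loop2 missing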

Second problem is way more mysterious to me. I cannot grow a raid6!

[root@kylie raid-test]# /sbin/losetup -a
/dev/loop0: [0805]:488562 (dysk1)
/dev/loop1: [0805]:488563 (dysk2)
/dev/loop2: [0805]:488565 (dysk3)
/dev/loop3: [0805]:488566 (dysk4)
/dev/loop4: [0805]:683033 (dysk5)
/dev/loop5: [0805]:683034 (dysk6)

They are about 25MB each.

[root@kylie raid-test]# /sbin/mdadm --create --verbose /dev/md55 --chunk=256 -l 6 --raid-devices=4 /dev/loop0 /dev/loop1 /dev/loop2 /dev/loop3
mdadm: layout defaults to left-symmetric
mdadm: /dev/loop0 appears to be part of a raid array:
    level=raid5 devices=3 ctime=Fri Nov  7 18:24:05 2008
mdadm: /dev/loop1 appears to be part of a raid array:
    level=raid5 devices=3 ctime=Fri Nov  7 18:24:05 2008
mdadm: /dev/loop2 appears to be part of a raid array:
    level=raid6 devices=4 ctime=Fri Nov  7 18:23:39 2008
mdadm: /dev/loop3 appears to be part of a raid array:
    level=raid6 devices=4 ctime=Fri Nov  7 18:23:39 2008
mdadm: size set to 24832K
Continue creating array? y
mdadm: array /dev/md55 started.
[root@kylie raid-test]# /sbin/mdadm --add /dev/md55 /dev/loop4 /dev/loop5
mdadm: added /dev/loop4
mdadm: added /dev/loop5
[root@kylie raid-test]# /sbin/mdadm --grow /dev/md55 --raid-devices=6
mdadm: Need to backup 1024K of critical section..

and that's where it hangs. I mean, that mdadm invocation doesn't return,
or at least hasn't returned in the last 2 hours; it clearly should have by
now. dmesg seems to say that the reshape was successful:

Nov  7 18:26:12 kylie kernel: md: bind<loop4>
Nov  7 18:26:12 kylie kernel: md: bind<loop5>
Nov  7 18:26:35 kylie kernel: RAID5 conf printout:
Nov  7 18:26:35 kylie kernel:  --- rd:6 wd:6
Nov  7 18:26:35 kylie kernel:  disk 0, o:1, dev:loop0
Nov  7 18:26:35 kylie kernel:  disk 1, o:1, dev:loop1
Nov  7 18:26:35 kylie kernel:  disk 2, o:1, dev:loop2
Nov  7 18:26:35 kylie kernel:  disk 3, o:1, dev:loop3
Nov  7 18:26:35 kylie kernel:  disk 4, o:1, dev:loop5
Nov  7 18:26:35 kylie kernel: RAID5 conf printout:
Nov  7 18:26:35 kylie kernel:  --- rd:6 wd:6
Nov  7 18:26:35 kylie kernel:  disk 0, o:1, dev:loop0
Nov  7 18:26:35 kylie kernel:  disk 1, o:1, dev:loop1
Nov  7 18:26:35 kylie kernel:  disk 2, o:1, dev:loop2
Nov  7 18:26:35 kylie kernel:  disk 3, o:1, dev:loop3
Nov  7 18:26:35 kylie kernel:  disk 4, o:1, dev:loop5
Nov  7 18:26:35 kylie kernel:  disk 5, o:1, dev:loop4
Nov  7 18:26:35 kylie kernel: md: reshape of RAID array md55
Nov  7 18:26:35 kylie kernel: md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
Nov  7 18:26:35 kylie kernel: md: using maximum available idle IO
bandwidth (but not more than 200000 KB/sec) for reshape.
Nov  7 18:26:35 kylie kernel: md: using 128k window, over a total of
24832 blocks.
Nov  7 18:26:35 kylie kernel: md: md55: reshape done.
Nov  7 18:26:35 kylie kernel: RAID5 conf printout:
Nov  7 18:26:35 kylie kernel:  --- rd:6 wd:6
Nov  7 18:26:35 kylie kernel:  disk 0, o:1, dev:loop0
Nov  7 18:26:35 kylie kernel:  disk 1, o:1, dev:loop1
Nov  7 18:26:35 kylie kernel:  disk 2, o:1, dev:loop2
Nov  7 18:26:35 kylie kernel:  disk 3, o:1, dev:loop3
Nov  7 18:26:35 kylie kernel:  disk 4, o:1, dev:loop5
Nov  7 18:26:35 kylie kernel:  disk 5, o:1, dev:loop4

and of course, I cannot stop the array.

Nov  7 18:33:17 kylie kernel: md: md55 still in use.

[root@kylie kotek]# /sbin/mdadm --detail --verbose /dev/md55
/dev/md55:
        Version : 00.90.03
  Creation Time : Fri Nov  7 18:24:27 2008
     Raid Level : raid6
     Array Size : 99328 (97.02 MiB 101.71 MB)
  Used Dev Size : 24832 (24.25 MiB 25.43 MB)
   Raid Devices : 6
  Total Devices : 6
Preferred Minor : 55
    Persistence : Superblock is persistent

    Update Time : Fri Nov  7 18:38:45 2008
          State : clean
 Active Devices : 6
Working Devices : 6
 Failed Devices : 0
  Spare Devices : 0

     Chunk Size : 256K

           UUID : e6ed36fd:117f91d8:0bcc3650:23ed078a
         Events : 0.42

    Number   Major   Minor   RaidDevice State
       0       7        0        0      active sync   /dev/loop0
       1       7        1        1      active sync   /dev/loop1
       2       7        2        2      active sync   /dev/loop2
       3       7        3        3      active sync   /dev/loop3
       4       7        5        4      active sync   /dev/loop5
       5       7        4        5      active sync   /dev/loop4

[root@kylie kotek]# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md55 : active raid6 loop0[0] loop4[5] loop5[4] loop3[3] loop2[2] loop1[1]
      99328 blocks level 6, 256k chunk, algorithm 2 [6/6] [UUUUUU]

md0 : active raid5 sdb1[0] sdd1[2] sdc1[1]
      1465143808 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU]

unused devices: <none>

I can interrupt the hung 'mdadm --grow' with ^C. After stopping the array
and reassembling it, everything also seems to be fine.

[root@kylie raid-test]# /sbin/mdadm -A --verbose  /dev/md55 /dev/loop[012345]
mdadm: looking for devices for /dev/md55
mdadm: /dev/loop0 is identified as a member of /dev/md55, slot 0.
mdadm: /dev/loop1 is identified as a member of /dev/md55, slot 1.
mdadm: /dev/loop2 is identified as a member of /dev/md55, slot 2.
mdadm: /dev/loop3 is identified as a member of /dev/md55, slot 3.
mdadm: /dev/loop4 is identified as a member of /dev/md55, slot 5.
mdadm: /dev/loop5 is identified as a member of /dev/md55, slot 4.
mdadm: added /dev/loop1 to /dev/md55 as 1
mdadm: added /dev/loop2 to /dev/md55 as 2
mdadm: added /dev/loop3 to /dev/md55 as 3
mdadm: added /dev/loop5 to /dev/md55 as 4
mdadm: added /dev/loop4 to /dev/md55 as 5
mdadm: added /dev/loop0 to /dev/md55 as 0
mdadm: /dev/md55 has been started with 6 drives.

The question that arises here is: is this dangerous? Or, at the very least,
do we know what's going on and why mdadm doesn't return? I'm a bit worried
about it, to be honest.
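
For what it's worth, before trusting the reassembled array I intend to run
a consistency check on it, roughly like this (assuming the md sysfs
interface behaves as I expect on this kernel):

  echo check > /sys/block/md55/md/sync_action
  cat /proc/mdstat                           # wait for the check to finish
  cat /sys/block/md55/md/mismatch_cnt        # should read 0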

Also, at some point during testing I got the following (in the same scenario):
[root@kylie raid-test]# /sbin/mdadm --grow /dev/md55 --raid-devices=6
mdadm: Need to backup 1024K of critical section..
mdadm: /dev/md55: failed to suspend device.
[root@kylie raid-test]# /sbin/mdadm --grow /dev/md55 --raid-devices=6
mdadm: Need to backup 1024K of critical section..

(nothing happens, I hit ^C, and the array has been grown anyway)

Additionally, I've tried this with mdadm 2.6.7; it goes as follows:

[root@kylie raid-test]# ~kotek/mdadm-2.6.7/mdadm --zero-superblock /dev/loop[012345]
[root@kylie raid-test]# /sbin/mdadm --create --verbose /dev/md55 --chunk=256 -l 6 --raid-devices=4 /dev/loop0 /dev/loop1 /dev/loop2 /dev/loop3
mdadm: layout defaults to left-symmetric
mdadm: size set to 24832K
mdadm: array /dev/md55 started.
[root@kylie raid-test]# ~kotek/mdadm-2.6.7/mdadm --add /dev/md55 /dev/loop4 /dev/loop5
mdadm: added /dev/loop4
mdadm: added /dev/loop5
[root@kylie raid-test]# ~kotek/mdadm-2.6.7/mdadm --grow /dev/md55 --raid-devices=6
mdadm: Need to backup 1024K of critical section..
mdadm: /dev/md55: failed to suspend device.
[root@kylie raid-test]# ~kotek/mdadm-2.6.7/mdadm --grow /dev/md55 --raid-devices=6
mdadm: Need to backup 1024K of critical section..
mdadm: ... critical section passed.
[root@kylie raid-test]#

On top of that, dmesg doesn't show any error corresponding to "failed to
suspend device" on any of those invocations. Yes, I'm really worried.

It's better than the old mdadm, for sure, but I'm worried about that first
error message too. I'll be very happy to hear any comforting opinion that
it's totally harmless. Pretty please.

My last and final question should be really simple for you: which
superblock version should I use? 1.1 seems the most common choice, and
I've not seen any reason not to use it. I hope I'm right on that.
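
If I read the man page correctly, picking the superblock version is just a
matter of passing --metadata (-e) at create time, e.g. (device names are
placeholders again):

  mdadm --create /dev/md1 --metadata=1.1 --chunk=256 -l 6 \
      --raid-devices=4 /dev/sde /dev/sdf /dev/sdg missing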

My contingency plan is to just add those 3 new drives to the raid5 (I do
hope --grow will work on it) and wait for raid5->raid6 live reshape support.
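
In command terms, the contingency plan would roughly be (new-disk names are
placeholders once more):

  mdadm /dev/md0 --add /dev/sde1 /dev/sdf1 /dev/sdg1
  mdadm --grow /dev/md0 --raid-devices=6
  resize2fs /dev/md0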

Your input on any of these topics will be extremely valuable to me.
Have a pleasant evening,
Mike
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html
