Re: mdadm - stuck reshape operation


 



>>>>> "Peter" == Peter Bates <peter.thebates@xxxxxxxxx> writes:


Peter> I have a 3 disk RAID 5 array that I tried to add a 4th disk to.

>> mdadm --add /dev/md6 /dev/sdb1
>> mdadm --grow --raid-devices=4 /dev/md6

Peter> This operation started successfully and proceeded until it hit 51.1%

>> cat /proc/mdstat
Peter> Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] [multipath] [faulty]
Peter> md6 : active raid5 sda1[0] sdb1[5] sdf1[3] sde1[4]
Peter>       3906764800 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/4] [UUUU]
Peter>       [==========>..........]  reshape = 51.1% (998533632/1953382400) finish=9046506.1min speed=1K/sec
Peter>       bitmap: 0/15 pages [0KB], 65536KB chunk

Peter> It has been sitting on the same 998533632 position for
Peter> days. I've tried a few reboots, but it never progresses.
Peter> Stopping the array, or trying to start the logical volume in it
Peter> hangs.  Altering the min / max speed parameters has no effect.
Peter> When I reboot and reassemble the array the speed indicated
Peter> steadily drops to almost 0.

>> mdadm --assemble /dev/md6 --verbose --uuid 90c2b5c3:3bbfa0d7:a5efaeed:726c43e2

I looked back in my email archives, and I wonder if maybe you have
SELinux enabled?  If so, please turn it off and see if that helps.
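
Something along these lines would show the current mode. This is only
a sketch: it assumes the usual SELinux userspace tools (getenforce and
setenforce) and copes with them being absent; selinux_mode is just a
name made up for the example.

```shell
# Report the SELinux mode, if the userspace tools are installed at all.
selinux_mode() {
    if command -v getenforce >/dev/null 2>&1; then
        getenforce            # prints Enforcing, Permissive or Disabled
    else
        echo "not-installed"
    fi
}
selinux_mode

# To switch to permissive mode until the next reboot (as root):
#   setenforce 0
```

If it reports Enforcing, try permissive mode and then retry the
assemble.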

What happens when you use dd on each of the drives and dump the output
to /dev/null?
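
A rough sketch of that dd check follows. read_test and COUNT_MB are
names invented for this example, not part of any tool. By default it
reads each device end to end; setting COUNT_MB limits the read, which
is only useful for a quick smoke test.

```shell
# Read each given device (or file) into /dev/null and report the result.
# COUNT_MB (hypothetical knob): if set, read only that many MiB per device.
read_test() {
    for dev in "$@"; do
        if dd if="$dev" of=/dev/null bs=1M ${COUNT_MB:+count="$COUNT_MB"}; then
            echo "$dev: read OK"
        else
            echo "$dev: READ ERROR"
        fi
    done
}

# Example (as root), using the md6 members from the --detail output:
#   read_test /dev/sda1 /dev/sde1 /dev/sdf1 /dev/sdb1
```

A drive that stalls or reports I/O errors partway through would be the
first thing to look at; watch dmesg while it runs.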

Are there any messages in the logs, or dmesg output after the stuff
you showed?  Can you maybe 'strace' the mdadm process, or even go grab
the latest version using git from:

  git clone git://neil.brown.name/mdadm

And see if compiling it yourself from the master branch might do the trick.
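
To gather the extra information in one go, a sketch like the one below
might help. md_report is a made-up helper name, and md6 is simply the
array from your report. The sysfs files it reads are the standard md
ones under /sys/block/<array>/md/. The strace and build commands are
left as comments because they need root or network access.

```shell
# Print a small diagnostic report for one md array (default: md6).
md_report() {
    md="${1:-md6}"
    echo "== /proc/mdstat =="
    cat /proc/mdstat 2>/dev/null || echo "(no /proc/mdstat)"
    echo "== sysfs state for $md =="
    for f in sync_action sync_completed reshape_position \
             sync_speed_min sync_speed_max; do
        p="/sys/block/$md/md/$f"
        if [ -r "$p" ]; then
            echo "$f: $(cat "$p")"
        else
            echo "$f: (unavailable)"
        fi
    done
    echo "== last kernel messages =="
    dmesg 2>/dev/null | tail -n 50 || true
}

# Example:
#   md_report md6 > md6-report.txt
#
# Tracing the assemble (as root):
#   strace -f -o mdadm.trace mdadm --assemble /dev/md6 --verbose \
#       --uuid 90c2b5c3:3bbfa0d7:a5efaeed:726c43e2
#
# Building mdadm from git:
#   git clone git://neil.brown.name/mdadm && cd mdadm && make
```

Posting the report and the tail of the trace back to the list would
give us something more to go on.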


Peter> I haven't tried anything more drastic than a reboot yet,
Peter> Below is as much information as I can think to provide at this stage.
Peter> Please let me know what else I can do.
Peter> I'm happy to change kernels, kernel config or anything else required to
Peter> get better info.

Peter> Kernel: 4.4.3
Peter> mdadm 3.4

>> ps aux | grep md6
Peter> root      5041 99.9  0.0      0     0 ?        R    07:10 761:58 [md6_raid5]
Peter> root      5042  0.0  0.0      0     0 ?        D    07:10   0:00 [md6_reshape]

Peter> This is consistent. 100% cpu on the raid component, but not the reshape

>> mdadm --detail --verbose /dev/md6
Peter> /dev/md6:
Peter>         Version : 1.2
Peter>   Creation Time : Fri Aug 29 21:13:52 2014
Peter>      Raid Level : raid5
Peter>      Array Size : 3906764800 (3725.78 GiB 4000.53 GB)
Peter>   Used Dev Size : 1953382400 (1862.89 GiB 2000.26 GB)
Peter>    Raid Devices : 4
Peter>   Total Devices : 4
Peter>     Persistence : Superblock is persistent

Peter>   Intent Bitmap : Internal

Peter>     Update Time : Wed Apr 27 07:10:07 2016
Peter>           State : clean, reshaping
Peter>  Active Devices : 4
Peter> Working Devices : 4
Peter>  Failed Devices : 0
Peter>   Spare Devices : 0

Peter>          Layout : left-symmetric
Peter>      Chunk Size : 512K

Peter>  Reshape Status : 51% complete
Peter>   Delta Devices : 1, (3->4)

Peter>            Name : Alpheus:6  (local to host Alpheus)
Peter>            UUID : 90c2b5c3:3bbfa0d7:a5efaeed:726c43e2
Peter>          Events : 47975

Peter>     Number   Major   Minor   RaidDevice State
Peter>        0       8        1        0      active sync   /dev/sda1
Peter>        4       8       65        1      active sync   /dev/sde1
Peter>        3       8       81        2      active sync   /dev/sdf1
Peter>        5       8       17        3      active sync   /dev/sdb1

>> iostat
Peter> Linux 4.4.3-gentoo (Alpheus)    04/27/2016      _x86_64_        (4 CPU)

Peter> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
Peter>            1.84    0.00   24.50    0.09    0.00   73.57

Peter> Looking at the individual disks I can see minor activity on the MD6
Peter> members. This activity tends to match up with the overall rate
Peter> reported by /proc/mdstat

Peter> Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
Peter> sda               0.02         2.72         1.69     128570      79957
Peter> sdb               0.01         0.03         1.69       1447      79889
Peter> sdd               3.85         2.27        56.08     106928    2646042
Peter> sde               0.02         2.73         1.69     128610      79961
Peter> sdf               0.02         2.72         1.69     128128      79961
Peter> sdc               4.08         5.44        56.08     256899    2646042
Peter> md0               2.91         7.62        55.08     359714    2598725
Peter> dm-0              0.00         0.03         0.00       1212          0
Peter> dm-1              0.00         0.05         0.00       2151          9
Peter> dm-2              2.65         6.52         3.42     307646     161296
Peter> dm-3              0.19         1.03        51.66      48377    2437420
Peter> md6               0.00         0.02         0.00       1036          0

>> dmesg
Peter> [ 1199.426995] md: bind<sde1>
Peter> [ 1199.427779] md: bind<sdf1>
Peter> [ 1199.428379] md: bind<sdb1>
Peter> [ 1199.428592] md: bind<sda1>
Peter> [ 1199.429260] md/raid:md6: reshape will continue
Peter> [ 1199.429274] md/raid:md6: device sda1 operational as raid disk 0
Peter> [ 1199.429275] md/raid:md6: device sdb1 operational as raid disk 3
Peter> [ 1199.429276] md/raid:md6: device sdf1 operational as raid disk 2
Peter> [ 1199.429277] md/raid:md6: device sde1 operational as raid disk 1
Peter> [ 1199.429498] md/raid:md6: allocated 4338kB
Peter> [ 1199.429807] md/raid:md6: raid level 5 active with 4 out of 4 devices, algorithm 2
Peter> [ 1199.429810] RAID conf printout:
Peter> [ 1199.429811]  --- level:5 rd:4 wd:4
Peter> [ 1199.429812]  disk 0, o:1, dev:sda1
Peter> [ 1199.429814]  disk 1, o:1, dev:sde1
Peter> [ 1199.429816]  disk 2, o:1, dev:sdf1
Peter> [ 1199.429817]  disk 3, o:1, dev:sdb1
Peter> [ 1199.429993] created bitmap (15 pages) for device md6
Peter> [ 1199.430297] md6: bitmap initialized from disk: read 1 pages, set 0 of 29807 bits
Peter> [ 1199.474604] md6: detected capacity change from 0 to 4000527155200
Peter> [ 1199.474611] md: reshape of RAID array md6
Peter> [ 1199.474613] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
Peter> [ 1199.474614] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for reshape.
Peter> [ 1199.474617] md: using 128k window, over a total of 1953382400k.

>> lsblk
Peter> NAME                          MAJ:MIN RM  SIZE RO TYPE  MOUNTPOINT
Peter> sda                             8:0    0  1.8T  0 disk
Peter> └─sda1                          8:1    0  1.8T  0 part
Peter>   └─md6                         9:6    0  3.7T  0 raid5
Peter> sdb                             8:16   0  1.8T  0 disk
Peter> └─sdb1                          8:17   0  1.8T  0 part
Peter>   └─md6                         9:6    0  3.7T  0 raid5
Peter> sdc                             8:32   0  2.7T  0 disk
Peter> ├─sdc1                          8:33   0   16M  0 part
Peter> └─sdc2                          8:34   0  2.7T  0 part
Peter>   └─md0                         9:0    0  2.7T  0 raid1
Peter>     ├─vg--mirror-swap         253:0    0    4G  0 lvm   [SWAP]
Peter>     ├─vg--mirror-boot         253:1    0  256M  0 lvm   /boot
Peter>     ├─vg--mirror-root         253:2    0  256G  0 lvm   /
Peter>     └─vg--mirror-data--mirror 253:3    0  2.5T  0 lvm   /data/mirror
Peter> sdd                             8:48   0  2.7T  0 disk
Peter> ├─sdd1                          8:49   0   16M  0 part
Peter> └─sdd2                          8:50   0  2.7T  0 part
Peter>   └─md0                         9:0    0  2.7T  0 raid1
Peter>     ├─vg--mirror-swap         253:0    0    4G  0 lvm   [SWAP]
Peter>     ├─vg--mirror-boot         253:1    0  256M  0 lvm   /boot
Peter>     ├─vg--mirror-root         253:2    0  256G  0 lvm   /
Peter>     └─vg--mirror-data--mirror 253:3    0  2.5T  0 lvm   /data/mirror
Peter> sde                             8:64   0  1.8T  0 disk
Peter> └─sde1                          8:65   0  1.8T  0 part
Peter>   └─md6                         9:6    0  3.7T  0 raid5
Peter> sdf                             8:80   0  1.8T  0 disk
Peter> └─sdf1                          8:81   0  1.8T  0 part
Peter>   └─md6                         9:6    0  3.7T  0 raid5

Peter> Thanks for any pointers

Peter> Peter Bates
Peter> peter.thebates@xxxxxxxxx
Peter> --
Peter> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
Peter> the body of a message to majordomo@xxxxxxxxxxxxxxx
Peter> More majordomo info at  http://vger.kernel.org/majordomo-info.html


