Re: raid6 stuck at reshape

Thanks Xiao Ni, I found the reason.

I debugged the mdadm process: progress_reshape always returns 1, so
done in child_monitor never becomes 1 and its while (!done) { } loop
spins forever. The root cause is the backup logic: in my case
need_backup > info->reshape_progress is always true.
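
For reference, this is roughly how I got the values below (the mdadm
PID, 6956, is from the ps output quoted further down; the frame number
is illustrative, take whatever bt reports):

# gdb -p 6956              <- attach to the running mdadm
(gdb) bt                   <- locate the progress_reshape frame
(gdb) frame N              <- N as reported by bt
...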
(gdb) p need_backup
$14 = 212992
(gdb) p info->reshape_progress
$15 = 81920
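
To make the hang easy to see, here is a minimal standalone model of
the loop as I understand it (simplified, NOT the literal Grow.c code;
the names are borrowed from mdadm and the constants come from the gdb
session above):

/* livelock.c: standalone model of the child_monitor hang.
 * Simplified sketch, not the real mdadm source. */
#include <stdio.h>

struct mdinfo { unsigned long long reshape_progress; };

/* Models progress_reshape(): returns 1 ("more to do") while the
 * region that still needs a backup lies ahead of the reshape,
 * but nothing here ever advances reshape_progress past it. */
static int progress_reshape(struct mdinfo *info,
                            unsigned long long need_backup)
{
	if (need_backup > info->reshape_progress)
		return 1;	/* always taken here: 212992 > 81920 */
	return 0;		/* would let the monitor finish */
}

int main(void)
{
	struct mdinfo info = { .reshape_progress = 81920 };
	unsigned long long need_backup = 212992;
	unsigned long iterations = 0;
	int done = 0;

	while (!done) {		/* child_monitor's while (!done) */
		if (progress_reshape(&info, need_backup) == 0)
			done = 1;	/* never reached */
		if (++iterations == 10000000UL) {
			printf("still looping after %lu iterations\n",
			       iterations);
			break;	/* cap so this demo actually exits */
		}
	}
	return 0;
}

The real progress_reshape is of course much more involved, but this is
the shape of the hang: nothing in the loop ever moves reshape_progress
past need_backup, so done stays 0 forever.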

# mdadm --grow /dev/md3 --raid-devices=12    (without a backup file)
The reshape is now proceeding normally:
[>....................]  reshape =  0.1% (3152844/2925383680)
finish=4832.4min speed=10078K/sec

Obviously, it's a bug in mdadm user space.

2015-11-04 14:36 GMT+07:00 Xiao Ni <xni@xxxxxxxxxx>:
>
> When you run ps auxf | grep md, can you see a process that is stuck?
> If you find one you can check the reason with the crash utility.
>
>
> ----- Original Message -----
>> From: "Иван Исаев" <1@xxxxxxxxxx>
>> To: linux-raid@xxxxxxxxxxxxxxx
>> Sent: Wednesday, November 4, 2015 2:44:10 PM
>> Subject: Fwd: raid6 stuck at reshape
>>
>> 1.  cat /sys/block/md3/md/sync_max
>> 8192
>> 2. no selinux
>> 3.
>> after recreating the array:
>> # mdadm --grow --bitmap=none /dev/md3
>> # mdadm --grow /dev/md3 --raid-devices=12 --backup-file=/home/raid/md3.backup
>> mdadm: Need to backup 106496K of critical section..
>> mdadm: Recording backup file in /run/mdadm failed: File exists
>> ...
>> # cat /proc/mdstat
>> Personalities : [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] [linear]
>> md3 : active raid6 sdn[11] sdm[10] sdl[9] sdj[8] sdg[7] sdh[6] sdi[5]
>> sdk[4] sdf[3] sde[2] sdd[1] sdc[0]
>>       26328453120 blocks super 1.2 level 6, 4096k chunk, algorithm 2
>> [12/12] [UUUUUUUUUUUU]
>>       [>....................]  reshape =  0.0% (_4096_/2925383680)
>> finish=3758695.2min speed=12K/sec
>>
>> No changes.
>>
>> 2015-11-04 13:25 GMT+07:00 Xiao Ni <xni@xxxxxxxxxx>:
>> > Hi
>> >
>> > You can check whether sync_max is 0:
>> >
>> > [root@storageqe-19 ~]# cd /sys/block/md1/md/
>> > [root@storageqe-19 md]# cat sync_max
>> > 0
>> >
>> > And check SELinux:
>> > [root@storageqe-19 ~]# systemctl status mdadm-grow-continue@md1.service
>> > ● mdadm-grow-continue@md1.service - Manage MD Reshape on /dev/md1
>> >    Loaded: loaded (/usr/lib/systemd/system/mdadm-grow-continue@.service;
>> >    static; vendor preset: disabled)
>> >    Active: failed (Result: exit-code) since Tue 2015-11-03 03:39:11 EST;
>> >    21h ago
>> >   Process: 2353 ExecStart=/usr/sbin/mdadm --grow --continue /dev/%I
>> >   (code=exited, status=2)
>> >  Main PID: 2353 (code=exited, status=2)
>> >
>> > Nov 03 03:39:10 storageqe-19.rhts.eng.bos.redhat.com systemd[1]: Started
>> > Manage MD Reshape on /dev/md1.
>> > Nov 03 03:39:10 storageqe-19.rhts.eng.bos.redhat.com systemd[1]: Starting
>> > Manage MD Reshape on /dev/md1...
>> > Nov 03 03:39:11 storageqe-19.rhts.eng.bos.redhat.com systemd[1]:
>> > mdadm-grow-continue@md1.service: main process exite...ENT
>> > Nov 03 03:39:11 storageqe-19.rhts.eng.bos.redhat.com systemd[1]: Unit
>> > mdadm-grow-continue@md1.service entered failed...te.
>> > Nov 03 03:39:11 storageqe-19.rhts.eng.bos.redhat.com systemd[1]:
>> > mdadm-grow-continue@md1.service failed.
>> > Hint: Some lines were ellipsized, use -l to show in full.
>> >
>> > I think this is a selinux-policy problem. You can also try reshaping an
>> > md without a bitmap; it can succeed without the bitmap.
>> >
>> > ----- Original Message -----
>> >> From: "Иван Исаев" <1@xxxxxxxxxx>
>> >> To: linux-raid@xxxxxxxxxxxxxxx
>> >> Sent: Wednesday, November 4, 2015 1:53:17 PM
>> >> Subject: raid6 stuck at reshape
>> >>
>> >> 1. Initial state:
>> >> md3 : active raid6 sdm[10] sdl[9] sdj[8] sdg[7] sdh[6] sdi[5] sdk[4]
>> >> sdf[3] sde[2] sdd[1] sdc[0]
>> >>       26328453120 blocks super 1.2 level 6, 4096k chunk, algorithm 2
>> >> [11/11] [UUUUUUUUUUU]
>> >>       bitmap: 0/22 pages [0KB], 65536KB chunk
>> >>
>> >> 2. mdadm /dev/md3 -a /dev/sdn
>> >> mdadm --grow /dev/md3 --raid-devices=12
>> >> --backup-file=/home/raid/md3.backup
>> >>
>> >> md3 : active raid6 sdn[11] sdm[10] sdl[9] sdj[8] sdg[7] sdh[6] sdi[5]
>> >> sdk[4] sdf[3] sde[2] sdd[1] sdc[0]
>> >>       26328453120 blocks super 1.2 level 6, 4096k chunk, algorithm 2
>> >> [12/12] [UUUUUUUUUUUU]
>> >>       [>....................]  reshape =  0.0% (0/2925383680)
>> >> finish=3047274.6min speed=0K/sec
>> >>       bitmap: 0/22 pages [0KB], 65536KB chunk
>> >>
>> >> # ps aux|grep md3
>> >> root      5232 _54.8_  0.0      0     0 ?        R    10:55  56:43
>> >> [md3_raid6]
>> >> root      6956 _98.4_  0.4  53904 49896 ?        RL   11:01  96:29
>> >> mdadm --grow /dev/md3 --raid-devices=12
>> >> --backup-file=/home/raid/md3.backup
>> >>
>> >> # cat /sys/block/md3/md/reshape_position
>> >> 81920
>> >>
>> >> What can I do about it?
>> >>
>> >> P.S. If I stop the array, it can no longer be assembled:
>> >> # mdadm -S /dev/md3
>> >> # mdadm -A /dev/md3
>> >> mdadm: :/dev/md3 has an active reshape - checking if critical section
>> >> needs to be restored
>> >> mdadm: Failed to restore critical section for reshape, sorry.
>> >>
>> >> # mdadm --assemble /dev/md3 -vv --backup-file /home/raid/md3.backup -f
>> >> mdadm: looking for devices for /dev/md3
>> >> ...
>> >> mdadm: /dev/sdn is identified as a member of /dev/md3, slot 11.
>> >> mdadm: /dev/sdl is identified as a member of /dev/md3, slot 9.
>> >> mdadm: /dev/sdg is identified as a member of /dev/md3, slot 7.
>> >> mdadm: /dev/sdm is identified as a member of /dev/md3, slot 10.
>> >> mdadm: /dev/sdj is identified as a member of /dev/md3, slot 8.
>> >> mdadm: /dev/sdk is identified as a member of /dev/md3, slot 4.
>> >> mdadm: /dev/sdf is identified as a member of /dev/md3, slot 3.
>> >> mdadm: /dev/sdd is identified as a member of /dev/md3, slot 1.
>> >> mdadm: /dev/sdi is identified as a member of /dev/md3, slot 5.
>> >> mdadm: /dev/sdh is identified as a member of /dev/md3, slot 6.
>> >> mdadm: /dev/sde is identified as a member of /dev/md3, slot 2.
>> >> mdadm: /dev/sdc is identified as a member of /dev/md3, slot 0.
>> >> mdadm: :/dev/md3 has an active reshape - checking if critical section
>> >> needs to be restored
>> >> mdadm: restoring critical section
>> >> mdadm: Error restoring backup from md3.backup
>> >> mdadm: Failed to restore critical section for reshape, sorry.
>> >>
>> >> # mdadm --assemble /dev/md3 -vv --invalid-backup -f
>> >> ...
>> >> mdadm: :/dev/md3 has an active reshape - checking if critical section
>> >> needs to be restored
>> >> mdadm: No backup metadata on device-11
>> >> mdadm: Failed to find backup of critical section
>> >> mdadm: continuing without restoring backup
>> >> mdadm: added /dev/sdd to /dev/md3 as 1
>> >> mdadm: added /dev/sde to /dev/md3 as 2
>> >> mdadm: added /dev/sdf to /dev/md3 as 3
>> >> mdadm: added /dev/sdk to /dev/md3 as 4
>> >> mdadm: added /dev/sdi to /dev/md3 as 5
>> >> mdadm: added /dev/sdh to /dev/md3 as 6
>> >> mdadm: added /dev/sdg to /dev/md3 as 7
>> >> mdadm: added /dev/sdj to /dev/md3 as 8
>> >> mdadm: added /dev/sdl to /dev/md3 as 9
>> >> mdadm: added /dev/sdm to /dev/md3 as 10
>> >> mdadm: added /dev/sdn to /dev/md3 as 11
>> >> mdadm: added /dev/sdc to /dev/md3 as 0
>> >> mdadm: failed to RUN_ARRAY /dev/md3: Invalid argument
>> >>
>> >> I had to create the array again.
>> >> After that, the array operates normally, but I still can't grow it.
>> >>
>> >> P.P.S. kernel: 3.14.56