Hi,

I have a problem with a hanging mdadm reshape task at 100% CPU load
(kernel thread "md2_raid5"). Any operation on the raid (e.g. mdadm -S)
also hangs. Rebooting worked, but after triggering the reshape
(mdadm --readwrite /dev/md2) I get the same behaviour.

dmesg has this stacktrace:

[ 1813.500745] INFO: task md2_resync:3377 blocked for more than 120 seconds.
[ 1813.500778] Not tainted 4.8.0-2-amd64 #1
[ 1813.500795] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1813.500822] md2_resync D ffff93207bc98180 0 3377 2 0x00000000
[ 1813.500827] ffff93206f46d000 ffff93207642a1c0 0000000000000246 ffff932059607bf0
[ 1813.500829] ffff932059608000 ffff93206effc400 ffff93206effc688 ffff932059607d24
[ 1813.500830] ffff932059607bf0 ffff93206e0a3000 ffffffffbc7eb6d1 ffff93206e0a3000
[ 1813.500832] Call Trace:
[ 1813.500841] [<ffffffffbc7eb6d1>] ? schedule+0x31/0x80
[ 1813.500847] [<ffffffffc0356924>] ? reshape_request+0x7b4/0x910 [raid456]
[ 1813.500851] [<ffffffffbc2bce80>] ? wake_atomic_t_function+0x60/0x60
[ 1813.500854] [<ffffffffc0356da3>] ? raid5_sync_request+0x323/0x3a0 [raid456]
[ 1813.500862] [<ffffffffc0271b50>] ? is_mddev_idle+0x98/0xf3 [md_mod]
[ 1813.500866] [<ffffffffc02649a9>] ? md_do_sync+0x959/0xed0 [md_mod]
[ 1813.500868] [<ffffffffbc2bce80>] ? wake_atomic_t_function+0x60/0x60
[ 1813.500872] [<ffffffffc0261363>] ? md_thread+0x133/0x140 [md_mod]
[ 1813.500873] [<ffffffffbc7eb1c9>] ? __schedule+0x289/0x760
[ 1813.500877] [<ffffffffc0261230>] ? find_pers+0x70/0x70 [md_mod]
[ 1813.500879] [<ffffffffbc29aecd>] ? kthread+0xcd/0xf0
[ 1813.500881] [<ffffffffbc7efcaf>] ? ret_from_fork+0x1f/0x40
[ 1813.500883] [<ffffffffbc29ae00>] ? kthread_create_on_node+0x190/0x190

Is this a known bug, and is there a patch available?

[0] http://serverfault.com/questions/773244/mdadm-stuck-reshape-operation
[1] http://serverfault.com/questions/697193/raid-5-reshape-freeze

-- Sebastian

FWIW, here is some information about the raid:

mars# uname -a
Linux mars 4.8.0-2-amd64 #1 SMP Debian 4.8.11-1 (2016-12-02) x86_64 GNU/Linux

sre@mars ~ % cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4] [linear] [multipath] [raid0] [raid10]
md2 : active raid5 sdl[4] sdk[6] sdm[3] sdj[5]
      7813774720 blocks super 1.2 level 5, 64k chunk, algorithm 2 [4/4] [UUUU]
      [>....................] reshape = 0.2% (10197888/3906887360) finish=157305.2min speed=412K/sec
...

mars# for disk in /dev/sd[jmlk]; smartctl -i $disk | grep "Device Model"
Device Model: WDC WD40EFRX-68WT0N0
Device Model: WDC WD40EFRX-68WT0N0
Device Model: WDC WD40EFRX-68WT0N0
Device Model: WDC WD40EFRX-68WT0N0

mars# for disk in /dev/sd[jmlk]; if smartctl -l scterc,70,70 $disk > /dev/null ; then echo "$disk is good"; fi
/dev/sdj is good
/dev/sdk is good
/dev/sdl is good
/dev/sdm is good

mars# mdadm --detail /dev/md2
/dev/md2:
        Version : 1.2
  Creation Time : Wed Jan 15 12:59:19 2014
     Raid Level : raid5
     Array Size : 7813774720 (7451.80 GiB 8001.31 GB)
  Used Dev Size : 3906887360 (3725.90 GiB 4000.65 GB)
   Raid Devices : 4
  Total Devices : 4
    Persistence : Superblock is persistent

    Update Time : Wed Jan 4 06:09:53 2017
          State : clean, reshaping
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 64K

 Reshape Status : 0% complete
  Delta Devices : 1, (3->4)

           Name : mars:2 (local to host mars)
           UUID : a9946f57:e2c50b35:d192467d:fa495817
         Events : 39050

    Number   Major   Minor   RaidDevice   State
       4       8      176        0        active sync   /dev/sdl
       5       8      144        1        active sync   /dev/sdj
       3       8      192        2        active sync   /dev/sdm
       6       8      160        3        active sync   /dev/sdk

mars# mdadm --examine /dev/sd[jmlk]
/dev/sdj:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0xc
     Array UUID : a9946f57:e2c50b35:d192467d:fa495817
           Name : mars:2 (local to host mars)
  Creation Time : Wed Jan 15 12:59:19 2014
     Raid Level : raid5
   Raid Devices : 4

 Avail Dev Size : 7813775280 (3725.90 GiB 4000.65 GB)
     Array Size : 11720662080 (11177.69 GiB 12001.96 GB)
  Used Dev Size : 7813774720 (3725.90 GiB 4000.65 GB)
    Data Offset : 261888 sectors
   Super Offset : 8 sectors
   Unused Space : before=261800 sectors, after=560 sectors
          State : active
    Device UUID : 94bb69dc:955c3040:5cc4ecbb:28130785

  Reshape pos'n : 29073984 (27.73 GiB 29.77 GB)
  Delta Devices : 1 (3->4)

    Update Time : Wed Jan 4 06:09:53 2017
  Bad Block Log : 512 entries available at offset 72 sectors - bad blocks present.
       Checksum : ed4b675b - correct
         Events : 39050

         Layout : left-symmetric
     Chunk Size : 64K

    Device Role : Active device 1
    Array State : AAAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdk:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x4
     Array UUID : a9946f57:e2c50b35:d192467d:fa495817
           Name : mars:2 (local to host mars)
  Creation Time : Wed Jan 15 12:59:19 2014
     Raid Level : raid5
   Raid Devices : 4

 Avail Dev Size : 7813775280 (3725.90 GiB 4000.65 GB)
     Array Size : 11720662080 (11177.69 GiB 12001.96 GB)
  Used Dev Size : 7813774720 (3725.90 GiB 4000.65 GB)
    Data Offset : 261888 sectors
   Super Offset : 8 sectors
   Unused Space : before=261800 sectors, after=560 sectors
          State : active
    Device UUID : 213a97fc:46865f42:0e8b06be:7309eac1

  Reshape pos'n : 29073984 (27.73 GiB 29.77 GB)
  Delta Devices : 1 (3->4)

    Update Time : Wed Jan 4 06:09:53 2017
  Bad Block Log : 512 entries available at offset 72 sectors
       Checksum : 4ea4cc90 - correct
         Events : 39050

         Layout : left-symmetric
     Chunk Size : 64K

    Device Role : Active device 3
    Array State : AAAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdl:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0xc
     Array UUID : a9946f57:e2c50b35:d192467d:fa495817
           Name : mars:2 (local to host mars)
  Creation Time : Wed Jan 15 12:59:19 2014
     Raid Level : raid5
   Raid Devices : 4

 Avail Dev Size : 7813775280 (3725.90 GiB 4000.65 GB)
     Array Size : 11720662080 (11177.69 GiB 12001.96 GB)
  Used Dev Size : 7813774720 (3725.90 GiB 4000.65 GB)
    Data Offset : 261888 sectors
   Super Offset : 8 sectors
   Unused Space : before=261800 sectors, after=560 sectors
          State : active
    Device UUID : a25cccf5:d23f2274:30dc0ffa:d8a79f14

  Reshape pos'n : 29073984 (27.73 GiB 29.77 GB)
  Delta Devices : 1 (3->4)

    Update Time : Wed Jan 4 06:09:53 2017
  Bad Block Log : 512 entries available at offset 72 sectors - bad blocks present.
       Checksum : 85b982a - correct
         Events : 39050

         Layout : left-symmetric
     Chunk Size : 64K

    Device Role : Active device 0
    Array State : AAAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdm:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0xc
     Array UUID : a9946f57:e2c50b35:d192467d:fa495817
           Name : mars:2 (local to host mars)
  Creation Time : Wed Jan 15 12:59:19 2014
     Raid Level : raid5
   Raid Devices : 4

 Avail Dev Size : 7813775280 (3725.90 GiB 4000.65 GB)
     Array Size : 11720662080 (11177.69 GiB 12001.96 GB)
  Used Dev Size : 7813774720 (3725.90 GiB 4000.65 GB)
    Data Offset : 261888 sectors
   Super Offset : 8 sectors
   Unused Space : before=261800 sectors, after=560 sectors
          State : active
    Device UUID : 21517788:cdc63605:5bdc0bee:d23cc234

  Reshape pos'n : 29073984 (27.73 GiB 29.77 GB)
  Delta Devices : 1 (3->4)

    Update Time : Wed Jan 4 06:09:53 2017
  Bad Block Log : 512 entries available at offset 72 sectors - bad blocks present.
       Checksum : 4039a8c7 - correct
         Events : 39050

         Layout : left-symmetric
     Chunk Size : 64K

    Device Role : Active device 2
    Array State : AAAA ('A' == active, '.' == missing, 'R' == replacing)
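
For reference, the percentage in the /proc/mdstat progress line can be recomputed from the two block counts it prints. This is only a minimal sketch, assuming a POSIX shell with sed and awk; reshape_percent is a hypothetical helper name for illustration, not something mdadm provides, and it only handles lines shaped like the one quoted in this mail.

```shell
# Hypothetical helper (not part of mdadm): read one /proc/mdstat progress
# line on stdin and print the reshape progress as a percentage.
reshape_percent() {
    # Pull out "done total" from the "(done/total)" pair after "reshape = N%".
    sed -n 's/.*reshape *= *[0-9.]*% *(\([0-9]*\)\/\([0-9]*\)).*/\1 \2/p' |
        awk '{ printf "%.2f\n", 100 * $1 / $2 }'
}

# The progress line quoted from /proc/mdstat in this mail.
line='[>....................] reshape = 0.2% (10197888/3906887360) finish=157305.2min speed=412K/sec'
printf '%s\n' "$line" | reshape_percent    # prints 0.26
```

With the counts above this gives 0.26, which matches the 0.2% that mdstat displays, and is consistent with the "0% complete" shown by mdadm --detail.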