mdadm hangs during raid5 grow


Hi,

I have a problem with an mdadm reshape that hangs while the kernel
thread "md2_raid5" runs at 100% CPU load. Any other operation on the
RAID (e.g. mdadm -S) hangs as well. Rebooting worked, but after
triggering the reshape again (mdadm --readwrite /dev/md2) I get the
same behaviour.
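For anyone who wants to compare states, here is a minimal sketch for dumping the md sync state while the reshape is stuck. It assumes the usual md sysfs attributes (sync_action, sync_completed, reshape_position, stripe_cache_size, stripe_cache_active) under /sys/block/md2/md. The function name md_sync_state is made up for illustration, and reading these files may itself block while the array is hung.

```shell
# Hypothetical helper: print selected md sysfs attributes from the
# directory given as $1 (e.g. /sys/block/md2/md). Attributes that do
# not exist or are not readable are skipped silently.
md_sync_state() {
    dir=$1
    for f in sync_action sync_completed reshape_position \
             stripe_cache_size stripe_cache_active; do
        if [ -r "$dir/$f" ]; then
            printf '%s: %s\n' "$f" "$(cat "$dir/$f")"
        fi
    done
}
```

Usage would be "md_sync_state /sys/block/md2/md", run once before and once after the hang starts.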

dmesg shows this stack trace:

[ 1813.500745] INFO: task md2_resync:3377 blocked for more than 120 seconds.
[ 1813.500778]       Not tainted 4.8.0-2-amd64 #1
[ 1813.500795] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1813.500822] md2_resync      D ffff93207bc98180     0  3377      2 0x00000000
[ 1813.500827]  ffff93206f46d000 ffff93207642a1c0 0000000000000246 ffff932059607bf0
[ 1813.500829]  ffff932059608000 ffff93206effc400 ffff93206effc688 ffff932059607d24
[ 1813.500830]  ffff932059607bf0 ffff93206e0a3000 ffffffffbc7eb6d1 ffff93206e0a3000
[ 1813.500832] Call Trace:
[ 1813.500841]  [<ffffffffbc7eb6d1>] ? schedule+0x31/0x80
[ 1813.500847]  [<ffffffffc0356924>] ? reshape_request+0x7b4/0x910 [raid456]
[ 1813.500851]  [<ffffffffbc2bce80>] ? wake_atomic_t_function+0x60/0x60
[ 1813.500854]  [<ffffffffc0356da3>] ? raid5_sync_request+0x323/0x3a0 [raid456]
[ 1813.500862]  [<ffffffffc0271b50>] ? is_mddev_idle+0x98/0xf3 [md_mod]
[ 1813.500866]  [<ffffffffc02649a9>] ? md_do_sync+0x959/0xed0 [md_mod]
[ 1813.500868]  [<ffffffffbc2bce80>] ? wake_atomic_t_function+0x60/0x60
[ 1813.500872]  [<ffffffffc0261363>] ? md_thread+0x133/0x140 [md_mod]
[ 1813.500873]  [<ffffffffbc7eb1c9>] ? __schedule+0x289/0x760
[ 1813.500877]  [<ffffffffc0261230>] ? find_pers+0x70/0x70 [md_mod]
[ 1813.500879]  [<ffffffffbc29aecd>] ? kthread+0xcd/0xf0
[ 1813.500881]  [<ffffffffbc7efcaf>] ? ret_from_fork+0x1f/0x40
[ 1813.500883]  [<ffffffffbc29ae00>] ? kthread_create_on_node+0x190/0x190

Is this a known bug, and is a patch available?

[0] http://serverfault.com/questions/773244/mdadm-stuck-reshape-operation
[1] http://serverfault.com/questions/697193/raid-5-reshape-freeze

-- Sebastian

FWIW, here is some information about the RAID:

mars# uname -a
Linux mars 4.8.0-2-amd64 #1 SMP Debian 4.8.11-1 (2016-12-02) x86_64 GNU/Linux
sre@mars ~ % cat /proc/mdstat 
Personalities : [raid1] [raid6] [raid5] [raid4] [linear] [multipath] [raid0] [raid10] 
md2 : active raid5 sdl[4] sdk[6] sdm[3] sdj[5]
      7813774720 blocks super 1.2 level 5, 64k chunk, algorithm 2 [4/4] [UUUU]
      [>....................]  reshape =  0.2% (10197888/3906887360) finish=157305.2min speed=412K/sec
...
mars# for disk in /dev/sd[jmlk]; smartctl -i $disk | grep "Device Model"
Device Model:     WDC WD40EFRX-68WT0N0
Device Model:     WDC WD40EFRX-68WT0N0
Device Model:     WDC WD40EFRX-68WT0N0
Device Model:     WDC WD40EFRX-68WT0N0
mars# for disk in /dev/sd[jmlk]; if smartctl -l scterc,70,70 $disk > /dev/null ; then echo "$disk is good"; fi
/dev/sdj is good
/dev/sdk is good
/dev/sdl is good
/dev/sdm is good
mars# mdadm --detail /dev/md2
/dev/md2:
        Version : 1.2
  Creation Time : Wed Jan 15 12:59:19 2014
     Raid Level : raid5
     Array Size : 7813774720 (7451.80 GiB 8001.31 GB)
  Used Dev Size : 3906887360 (3725.90 GiB 4000.65 GB)
   Raid Devices : 4
  Total Devices : 4
    Persistence : Superblock is persistent

    Update Time : Wed Jan  4 06:09:53 2017
          State : clean, reshaping 
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 64K

 Reshape Status : 0% complete
  Delta Devices : 1, (3->4)

           Name : mars:2  (local to host mars)
           UUID : a9946f57:e2c50b35:d192467d:fa495817
         Events : 39050

    Number   Major   Minor   RaidDevice State
       4       8      176        0      active sync   /dev/sdl
       5       8      144        1      active sync   /dev/sdj
       3       8      192        2      active sync   /dev/sdm
       6       8      160        3      active sync   /dev/sdk
mars# mdadm --examine /dev/sd[jmlk]    
/dev/sdj:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0xc
     Array UUID : a9946f57:e2c50b35:d192467d:fa495817
           Name : mars:2  (local to host mars)
  Creation Time : Wed Jan 15 12:59:19 2014
     Raid Level : raid5
   Raid Devices : 4

 Avail Dev Size : 7813775280 (3725.90 GiB 4000.65 GB)
     Array Size : 11720662080 (11177.69 GiB 12001.96 GB)
  Used Dev Size : 7813774720 (3725.90 GiB 4000.65 GB)
    Data Offset : 261888 sectors
   Super Offset : 8 sectors
   Unused Space : before=261800 sectors, after=560 sectors
          State : active
    Device UUID : 94bb69dc:955c3040:5cc4ecbb:28130785

  Reshape pos'n : 29073984 (27.73 GiB 29.77 GB)
  Delta Devices : 1 (3->4)

    Update Time : Wed Jan  4 06:09:53 2017
  Bad Block Log : 512 entries available at offset 72 sectors - bad blocks present.
       Checksum : ed4b675b - correct
         Events : 39050

         Layout : left-symmetric
     Chunk Size : 64K

   Device Role : Active device 1
   Array State : AAAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdk:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x4
     Array UUID : a9946f57:e2c50b35:d192467d:fa495817
           Name : mars:2  (local to host mars)
  Creation Time : Wed Jan 15 12:59:19 2014
     Raid Level : raid5
   Raid Devices : 4

 Avail Dev Size : 7813775280 (3725.90 GiB 4000.65 GB)
     Array Size : 11720662080 (11177.69 GiB 12001.96 GB)
  Used Dev Size : 7813774720 (3725.90 GiB 4000.65 GB)
    Data Offset : 261888 sectors
   Super Offset : 8 sectors
   Unused Space : before=261800 sectors, after=560 sectors
          State : active
    Device UUID : 213a97fc:46865f42:0e8b06be:7309eac1

  Reshape pos'n : 29073984 (27.73 GiB 29.77 GB)
  Delta Devices : 1 (3->4)

    Update Time : Wed Jan  4 06:09:53 2017
  Bad Block Log : 512 entries available at offset 72 sectors
       Checksum : 4ea4cc90 - correct
         Events : 39050

         Layout : left-symmetric
     Chunk Size : 64K

   Device Role : Active device 3
   Array State : AAAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdl:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0xc
     Array UUID : a9946f57:e2c50b35:d192467d:fa495817
           Name : mars:2  (local to host mars)
  Creation Time : Wed Jan 15 12:59:19 2014
     Raid Level : raid5
   Raid Devices : 4

 Avail Dev Size : 7813775280 (3725.90 GiB 4000.65 GB)
     Array Size : 11720662080 (11177.69 GiB 12001.96 GB)
  Used Dev Size : 7813774720 (3725.90 GiB 4000.65 GB)
    Data Offset : 261888 sectors
   Super Offset : 8 sectors
   Unused Space : before=261800 sectors, after=560 sectors
          State : active
    Device UUID : a25cccf5:d23f2274:30dc0ffa:d8a79f14

  Reshape pos'n : 29073984 (27.73 GiB 29.77 GB)
  Delta Devices : 1 (3->4)

    Update Time : Wed Jan  4 06:09:53 2017
  Bad Block Log : 512 entries available at offset 72 sectors - bad blocks present.
       Checksum : 85b982a - correct
         Events : 39050

         Layout : left-symmetric
     Chunk Size : 64K

   Device Role : Active device 0
   Array State : AAAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdm:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0xc
     Array UUID : a9946f57:e2c50b35:d192467d:fa495817
           Name : mars:2  (local to host mars)
  Creation Time : Wed Jan 15 12:59:19 2014
     Raid Level : raid5
   Raid Devices : 4

 Avail Dev Size : 7813775280 (3725.90 GiB 4000.65 GB)
     Array Size : 11720662080 (11177.69 GiB 12001.96 GB)
  Used Dev Size : 7813774720 (3725.90 GiB 4000.65 GB)
    Data Offset : 261888 sectors
   Super Offset : 8 sectors
   Unused Space : before=261800 sectors, after=560 sectors
          State : active
    Device UUID : 21517788:cdc63605:5bdc0bee:d23cc234

  Reshape pos'n : 29073984 (27.73 GiB 29.77 GB)
  Delta Devices : 1 (3->4)

    Update Time : Wed Jan  4 06:09:53 2017
  Bad Block Log : 512 entries available at offset 72 sectors - bad blocks present.
       Checksum : 4039a8c7 - correct
         Events : 39050

         Layout : left-symmetric
     Chunk Size : 64K

   Device Role : Active device 2
   Array State : AAAA ('A' == active, '.' == missing, 'R' == replacing)
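
The "Bad Block Log" lines in the --examine output above differ between members. Whether that is related to the hang is not established here. The following is only a sketch for listing the members whose log is reported as non-empty, assuming the output format shown above. The function name bad_block_members is made up for illustration.

```shell
# Hypothetical helper: read `mdadm --examine` output on stdin and print
# each member device whose Bad Block Log line says "bad blocks present".
bad_block_members() {
    awk '
        # A line such as "/dev/sdj:" starts a new per-device section.
        /^\/dev\/[^ ]+:$/ { dev = substr($0, 1, length($0) - 1) }
        # Report the current device if its log is flagged as non-empty.
        /Bad Block Log/ && /bad blocks present/ { print dev }
    '
}
```

Usage would be "mdadm --examine /dev/sd[jmlk] | bad_block_members". Fed the output above, it lists /dev/sdj, /dev/sdl and /dev/sdm, but not /dev/sdk.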
