Re: What the heck happened to my array?

On 05/04/11 00:49, Roberto Spadim wrote:
> I don't know, but this happened to me on an HP server with Linux
> 2.6.37. I changed the kernel to an older release and the problem ended.
> Check with Neil and the other md guys what the real problem is;
> maybe the realtime module and other changes inside the kernel are the
> problem, maybe not...
> Just a quick solution idea: try an older kernel.


Quick precis:
- Started reshape from 512k to 64k chunk size.
- sdd got a bad sector and was kicked.
- Array froze all IO.
- Reboot required to get system back.
- Restarted reshape with 9 drives.
- sdl suffered an IO error and was kicked.
- Array froze all IO.
- Reboot required to get system back.
- Array will no longer start with 8/10 drives.
- mdadm 3.1.5 segfaults when trying to start the reshape.
  Naively tried to run it under gdb to get a backtrace, but was unable to stop it from forking.
- Got the array started with mdadm 3.2.1.
- Attempted to re-add sdd/sdl (now marked as spares).

root@srv:~/mdadm-3.1.5# cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md0 : active raid6 sdl[1](S) sdd[6](S) sdc[0] sdh[9] sda[8] sde[7] sdg[5] sdb[4] sdf[3] sdm[2]
      7814078464 blocks super 1.2 level 6, 512k chunk, algorithm 2 [10/8] [U_UUUU_UUU]
      	resync=DELAYED

md2 : active raid5 sdi[0] sdk[3] sdj[1]
      1465146368 blocks super 1.2 level 5, 64k chunk, algorithm 2 [3/3] [UUU]

md6 : active raid1 sdo6[0] sdn6[1]
      821539904 blocks [2/2] [UU]

md5 : active raid1 sdo5[0] sdn5[1]
      104864192 blocks [2/2] [UU]

md4 : active raid1 sdo3[0] sdn3[1]
      20980800 blocks [2/2] [UU]

md3 : active (auto-read-only) raid1 sdo2[0] sdn2[1]
      8393856 blocks [2/2] [UU]

md1 : active raid1 sdo1[0] sdn1[1]
      20980736 blocks [2/2] [UU]

unused devices: <none>
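The bracketed status at the end of each mdstat blocks line has one character per raid slot, 'U' for an in-sync member and '_' for a hole, while [10/8] gives total and working devices. A minimal Python sketch that decodes the md0 line above; the helper name `missing_slots` is hypothetical and not part of mdadm or the kernel:

```python
import re


def missing_slots(mdstat_line: str) -> tuple[int, int, list[int]]:
    """Decode '[total/working] [U_U...]' from a /proc/mdstat blocks line.

    Returns (total devices, working devices, missing slot numbers).
    """
    m = re.search(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]", mdstat_line)
    if m is None:
        raise ValueError("no [n/m] [U_...] status field found")
    total, working, status = int(m.group(1)), int(m.group(2)), m.group(3)
    # One character per raid slot, in slot order; '_' marks a missing member.
    missing = [slot for slot, ch in enumerate(status) if ch == "_"]
    return total, working, missing


line = ("7814078464 blocks super 1.2 level 6, 512k chunk, "
        "algorithm 2 [10/8] [U_UUUU_UUU]")
print(missing_slots(line))  # (10, 8, [1, 6])
```

Slots 1 and 6 are the ones `mdadm --detail` lists as "removed", which appear to be the positions sdl[1] and sdd[6] previously held.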


[  303.640776] md: bind<sdl>
[  303.677461] md: bind<sdm>
[  303.837358] md: bind<sdf>
[  303.846291] md: bind<sdb>
[  303.851476] md: bind<sdg>
[  303.860725] md: bind<sdd>
[  303.861055] md: bind<sde>
[  303.861982] md: bind<sda>
[  303.862830] md: bind<sdh>
[  303.863128] md: bind<sdc>
[  303.863306] md: kicking non-fresh sdd from array!
[  303.863353] md: unbind<sdd>
[  303.900207] md: export_rdev(sdd)
[  303.900260] md: kicking non-fresh sdl from array!
[  303.900306] md: unbind<sdl>
[  303.940100] md: export_rdev(sdl)
[  303.942181] md/raid:md0: reshape will continue
[  303.942242] md/raid:md0: device sdc operational as raid disk 0
[  303.942285] md/raid:md0: device sdh operational as raid disk 9
[  303.942327] md/raid:md0: device sda operational as raid disk 8
[  303.942368] md/raid:md0: device sde operational as raid disk 7
[  303.942409] md/raid:md0: device sdg operational as raid disk 5
[  303.942449] md/raid:md0: device sdb operational as raid disk 4
[  303.942490] md/raid:md0: device sdf operational as raid disk 3
[  303.942531] md/raid:md0: device sdm operational as raid disk 2
[  303.943733] md/raid:md0: allocated 10572kB
[  303.943866] md/raid:md0: raid level 6 active with 8 out of 10 devices, algorithm 2
[  303.943912] RAID conf printout:
[  303.943916]  --- level:6 rd:10 wd:8
[  303.943920]  disk 0, o:1, dev:sdc
[  303.943924]  disk 2, o:1, dev:sdm
[  303.943927]  disk 3, o:1, dev:sdf
[  303.943931]  disk 4, o:1, dev:sdb
[  303.943934]  disk 5, o:1, dev:sdg
[  303.943938]  disk 7, o:1, dev:sde
[  303.943941]  disk 8, o:1, dev:sda
[  303.943945]  disk 9, o:1, dev:sdh
[  303.944061] md0: detected capacity change from 0 to 8001616347136
[  303.944366] md: md0 switched to read-write mode.
[  303.944427] md: reshape of RAID array md0
[  303.944469] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
[  303.944511] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for reshape.
[  303.944573] md: using 128k window, over a total of 976759808 blocks.
[  304.054875]  md0: unknown partition table
[  304.393245] mdadm[5940]: segfault at 7f2000 ip 00000000004480d2 sp 00007fffa04777b8 error 4 in mdadm[400000+64000]
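In the log above, md drops sdd and sdl during assembly as "non-fresh", which, as I understand md, means their superblocks were judged stale relative to the other members. A small Python sketch for pulling those device names out of a saved kernel log; the helper `kicked_devices` is hypothetical, and the pattern simply matches the message text shown here:

```python
import re

# Matches the md message seen in this log, e.g.
# "md: kicking non-fresh sdd from array!"
KICK_RE = re.compile(r"md: kicking non-fresh (\S+) from array!")


def kicked_devices(log: str) -> list[str]:
    """Return device names md reported as kicked for being non-fresh."""
    return KICK_RE.findall(log)


sample = """\
[  303.863306] md: kicking non-fresh sdd from array!
[  303.863353] md: unbind<sdd>
[  303.900260] md: kicking non-fresh sdl from array!
"""
print(kicked_devices(sample))  # ['sdd', 'sdl']
```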


root@srv:~# mdadm --detail /dev/md0
/dev/md0:
        Version : 1.2
  Creation Time : Sat Jan  8 11:25:17 2011
     Raid Level : raid6
     Array Size : 7814078464 (7452.09 GiB 8001.62 GB)
  Used Dev Size : 976759808 (931.51 GiB 1000.20 GB)
   Raid Devices : 10
  Total Devices : 10
    Persistence : Superblock is persistent

    Update Time : Tue Apr  5 07:54:30 2011
          State : active, degraded
 Active Devices : 8
Working Devices : 10
 Failed Devices : 0
  Spare Devices : 2

         Layout : left-symmetric
     Chunk Size : 512K

  New Chunksize : 64K

           Name : srv:server  (local to host srv)
           UUID : d00a11d7:fe0435af:07c8d4d6:e3b8e34e
         Events : 633835

    Number   Major   Minor   RaidDevice State
       0       8       32        0      active sync   /dev/sdc
       1       0        0        1      removed
       2       8      192        2      active sync   /dev/sdm
       3       8       80        3      active sync   /dev/sdf
       4       8       16        4      active sync   /dev/sdb
       5       8       96        5      active sync   /dev/sdg
       6       0        0        6      removed
       7       8       64        7      active sync   /dev/sde
       8       8        0        8      active sync   /dev/sda
       9       8      112        9      active sync   /dev/sdh

       1       8      176        -      spare   /dev/sdl
       6       8       48        -      spare   /dev/sdd
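The sizes in the --detail output are consistent with RAID6 on 10 devices: two devices' worth of each stripe holds P and Q parity, so usable capacity is (10 - 2) times the used device size. A quick check in Python using only the figures printed above (sizes in 1 KiB blocks):

```python
RAID_DEVICES = 10
PARITY_DEVICES = 2          # RAID6 stores two parity blocks (P and Q) per stripe
USED_DEV_KIB = 976759808    # "Used Dev Size" from mdadm --detail, in KiB

array_kib = (RAID_DEVICES - PARITY_DEVICES) * USED_DEV_KIB
print(array_kib)                     # 7814078464, matches "Array Size"
print(array_kib * 1024)              # 8001616347136 bytes, matches the dmesg capacity change
print(round(array_kib / 2**20, 2))   # 7452.09 (GiB)
```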

root@srv:~# for i in /dev/sd? ; do mdadm --examine $i ; done
/dev/sda:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x4
     Array UUID : d00a11d7:fe0435af:07c8d4d6:e3b8e34e
           Name : srv:server  (local to host srv)
  Creation Time : Sat Jan  8 11:25:17 2011
     Raid Level : raid6
   Raid Devices : 10

 Avail Dev Size : 1953523120 (931.51 GiB 1000.20 GB)
     Array Size : 15628156928 (7452.09 GiB 8001.62 GB)
  Used Dev Size : 1953519616 (931.51 GiB 1000.20 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : active
    Device UUID : 9beb9a0f:2a73328c:f0c17909:89da70fd

  Reshape pos'n : 3437035520 (3277.81 GiB 3519.52 GB)
  New Chunksize : 64K

    Update Time : Tue Apr  5 07:54:30 2011
       Checksum : c58ed095 - correct
         Events : 633835

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 8
   Array State : A.AAAA.AAA ('A' == active, '.' == missing)
/dev/sdb:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x4
     Array UUID : d00a11d7:fe0435af:07c8d4d6:e3b8e34e
           Name : srv:server  (local to host srv)
  Creation Time : Sat Jan  8 11:25:17 2011
     Raid Level : raid6
   Raid Devices : 10

 Avail Dev Size : 1953523120 (931.51 GiB 1000.20 GB)
     Array Size : 15628156928 (7452.09 GiB 8001.62 GB)
  Used Dev Size : 1953519616 (931.51 GiB 1000.20 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : active
    Device UUID : 75d997f8:d9372d90:c068755b:81c8206b

  Reshape pos'n : 3437035520 (3277.81 GiB 3519.52 GB)
  New Chunksize : 64K

    Update Time : Tue Apr  5 07:54:30 2011
       Checksum : 72321703 - correct
         Events : 633835

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 4
   Array State : A.AAAA.AAA ('A' == active, '.' == missing)
/dev/sdc:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x4
     Array UUID : d00a11d7:fe0435af:07c8d4d6:e3b8e34e
           Name : srv:server  (local to host srv)
  Creation Time : Sat Jan  8 11:25:17 2011
     Raid Level : raid6
   Raid Devices : 10

 Avail Dev Size : 1953523120 (931.51 GiB 1000.20 GB)
     Array Size : 15628156928 (7452.09 GiB 8001.62 GB)
  Used Dev Size : 1953519616 (931.51 GiB 1000.20 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : active
    Device UUID : 5738a232:85f23a16:0c7a9454:d770199c

  Reshape pos'n : 3437035520 (3277.81 GiB 3519.52 GB)
  New Chunksize : 64K

    Update Time : Tue Apr  5 07:54:30 2011
       Checksum : 5c61ea2e - correct
         Events : 633835

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 0
   Array State : A.AAAA.AAA ('A' == active, '.' == missing)
/dev/sdd:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x4
     Array UUID : d00a11d7:fe0435af:07c8d4d6:e3b8e34e
           Name : srv:server  (local to host srv)
  Creation Time : Sat Jan  8 11:25:17 2011
     Raid Level : raid6
   Raid Devices : 10

 Avail Dev Size : 1953523120 (931.51 GiB 1000.20 GB)
     Array Size : 15628156928 (7452.09 GiB 8001.62 GB)
  Used Dev Size : 1953519616 (931.51 GiB 1000.20 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : active
    Device UUID : 83a2c731:ba2846d0:2ce97d83:de624339

  Reshape pos'n : 3437035520 (3277.81 GiB 3519.52 GB)
  New Chunksize : 64K

    Update Time : Tue Apr  5 07:54:30 2011
       Checksum : e1a5ebbc - correct
         Events : 633835

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : spare
   Array State : A.AAAA.AAA ('A' == active, '.' == missing)
/dev/sde:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x4
     Array UUID : d00a11d7:fe0435af:07c8d4d6:e3b8e34e
           Name : srv:server  (local to host srv)
  Creation Time : Sat Jan  8 11:25:17 2011
     Raid Level : raid6
   Raid Devices : 10

 Avail Dev Size : 1953523120 (931.51 GiB 1000.20 GB)
     Array Size : 15628156928 (7452.09 GiB 8001.62 GB)
  Used Dev Size : 1953519616 (931.51 GiB 1000.20 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : active
    Device UUID : f1e3c1d3:ea9dc52e:a4e6b70e:e25a0321

  Reshape pos'n : 3437035520 (3277.81 GiB 3519.52 GB)
  New Chunksize : 64K

    Update Time : Tue Apr  5 07:54:30 2011
       Checksum : 551997d7 - correct
         Events : 633835

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 7
   Array State : A.AAAA.AAA ('A' == active, '.' == missing)
/dev/sdf:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x4
     Array UUID : d00a11d7:fe0435af:07c8d4d6:e3b8e34e
           Name : srv:server  (local to host srv)
  Creation Time : Sat Jan  8 11:25:17 2011
     Raid Level : raid6
   Raid Devices : 10

 Avail Dev Size : 1953523120 (931.51 GiB 1000.20 GB)
     Array Size : 15628156928 (7452.09 GiB 8001.62 GB)
  Used Dev Size : 1953519616 (931.51 GiB 1000.20 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : active
    Device UUID : c32dff71:0b8c165c:9f589b0f:bcbc82da

  Reshape pos'n : 3437035520 (3277.81 GiB 3519.52 GB)
  New Chunksize : 64K

    Update Time : Tue Apr  5 07:54:30 2011
       Checksum : db0aa39b - correct
         Events : 633835

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 3
   Array State : A.AAAA.AAA ('A' == active, '.' == missing)
/dev/sdg:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x4
     Array UUID : d00a11d7:fe0435af:07c8d4d6:e3b8e34e
           Name : srv:server  (local to host srv)
  Creation Time : Sat Jan  8 11:25:17 2011
     Raid Level : raid6
   Raid Devices : 10

 Avail Dev Size : 1953523120 (931.51 GiB 1000.20 GB)
     Array Size : 15628156928 (7452.09 GiB 8001.62 GB)
  Used Dev Size : 1953519616 (931.51 GiB 1000.20 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : active
    Device UUID : 194bc75c:97d3f507:4915b73a:51a50172

  Reshape pos'n : 3437035520 (3277.81 GiB 3519.52 GB)
  New Chunksize : 64K

    Update Time : Tue Apr  5 07:54:30 2011
       Checksum : 344cadbe - correct
         Events : 633835

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 5
   Array State : A.AAAA.AAA ('A' == active, '.' == missing)
/dev/sdh:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x4
     Array UUID : d00a11d7:fe0435af:07c8d4d6:e3b8e34e
           Name : srv:server  (local to host srv)
  Creation Time : Sat Jan  8 11:25:17 2011
     Raid Level : raid6
   Raid Devices : 10

 Avail Dev Size : 1953523120 (931.51 GiB 1000.20 GB)
     Array Size : 15628156928 (7452.09 GiB 8001.62 GB)
  Used Dev Size : 1953519616 (931.51 GiB 1000.20 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : active
    Device UUID : 1326457e:4fc0a6be:0073ccae:398d5c7f

  Reshape pos'n : 3437035520 (3277.81 GiB 3519.52 GB)
  New Chunksize : 64K

    Update Time : Tue Apr  5 07:54:30 2011
       Checksum : 8debbb14 - correct
         Events : 633835

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 9
   Array State : A.AAAA.AAA ('A' == active, '.' == missing)
/dev/sdi:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : e39d73c3:75be3b52:44d195da:b240c146
           Name : srv:2  (local to host srv)
  Creation Time : Sat Jul 10 21:14:29 2010
     Raid Level : raid5
   Raid Devices : 3

 Avail Dev Size : 1465147120 (698.64 GiB 750.16 GB)
     Array Size : 2930292736 (1397.27 GiB 1500.31 GB)
  Used Dev Size : 1465146368 (698.64 GiB 750.15 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : b577b308:56f2e4c9:c78175f4:cf10c77f

    Update Time : Tue Apr  5 07:46:18 2011
       Checksum : 57ee683f - correct
         Events : 455775

         Layout : left-symmetric
     Chunk Size : 64K

   Device Role : Active device 0
   Array State : AAA ('A' == active, '.' == missing)
/dev/sdj:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : e39d73c3:75be3b52:44d195da:b240c146
           Name : srv:2  (local to host srv)
  Creation Time : Sat Jul 10 21:14:29 2010
     Raid Level : raid5
   Raid Devices : 3

 Avail Dev Size : 1465147120 (698.64 GiB 750.16 GB)
     Array Size : 2930292736 (1397.27 GiB 1500.31 GB)
  Used Dev Size : 1465146368 (698.64 GiB 750.15 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : b127f002:a4aa8800:735ef8d7:6018564e

    Update Time : Tue Apr  5 07:46:18 2011
       Checksum : 3ae0b4c6 - correct
         Events : 455775

         Layout : left-symmetric
     Chunk Size : 64K

   Device Role : Active device 1
   Array State : AAA ('A' == active, '.' == missing)
/dev/sdk:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : e39d73c3:75be3b52:44d195da:b240c146
           Name : srv:2  (local to host srv)
  Creation Time : Sat Jul 10 21:14:29 2010
     Raid Level : raid5
   Raid Devices : 3

 Avail Dev Size : 1465147120 (698.64 GiB 750.16 GB)
     Array Size : 2930292736 (1397.27 GiB 1500.31 GB)
  Used Dev Size : 1465146368 (698.64 GiB 750.15 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 90fddf63:03d5dba4:3fcdc476:9ce3c44c

    Update Time : Tue Apr  5 07:46:18 2011
       Checksum : dd5eef0e - correct
         Events : 455775

         Layout : left-symmetric
     Chunk Size : 64K

   Device Role : Active device 2
   Array State : AAA ('A' == active, '.' == missing)
/dev/sdl:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x4
     Array UUID : d00a11d7:fe0435af:07c8d4d6:e3b8e34e
           Name : srv:server  (local to host srv)
  Creation Time : Sat Jan  8 11:25:17 2011
     Raid Level : raid6
   Raid Devices : 10

 Avail Dev Size : 1953523120 (931.51 GiB 1000.20 GB)
     Array Size : 15628156928 (7452.09 GiB 8001.62 GB)
  Used Dev Size : 1953519616 (931.51 GiB 1000.20 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : active
    Device UUID : 769940af:66733069:37cea27d:7fb28a23

  Reshape pos'n : 3437035520 (3277.81 GiB 3519.52 GB)
  New Chunksize : 64K

    Update Time : Tue Apr  5 07:54:30 2011
       Checksum : dc756202 - correct
         Events : 633835

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : spare
   Array State : A.AAAA.AAA ('A' == active, '.' == missing)
/dev/sdm:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x4
     Array UUID : d00a11d7:fe0435af:07c8d4d6:e3b8e34e
           Name : srv:server  (local to host srv)
  Creation Time : Sat Jan  8 11:25:17 2011
     Raid Level : raid6
   Raid Devices : 10

 Avail Dev Size : 1953523120 (931.51 GiB 1000.20 GB)
     Array Size : 15628156928 (7452.09 GiB 8001.62 GB)
  Used Dev Size : 1953519616 (931.51 GiB 1000.20 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : active
    Device UUID : 7e564e2c:7f21125b:c3b1907a:b640178f

  Reshape pos'n : 3437035520 (3277.81 GiB 3519.52 GB)
  New Chunksize : 64K

    Update Time : Tue Apr  5 07:54:30 2011
       Checksum : b3df3ee7 - correct
         Events : 633835

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 2
   Array State : A.AAAA.AAA ('A' == active, '.' == missing)
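One way to read the concatenated --examine dump is to line up, per array UUID, each device's event count and role. A minimal Python sketch; `parse_examine` and `summarize` are hypothetical helpers, and the ad hoc parsing assumes the "Key : Value" layout shown here rather than any stable mdadm output format:

```python
import re
from collections import defaultdict


def parse_examine(text: str) -> dict[str, dict[str, str]]:
    """Split concatenated `mdadm --examine` output into {device: {field: value}}."""
    devices: dict[str, dict[str, str]] = {}
    current = None
    for line in text.splitlines():
        header = re.match(r"^(/dev/\S+):$", line)
        if header:
            current = header.group(1)
            devices[current] = {}
        elif current and " : " in line:
            key, _, value = line.partition(" : ")
            devices[current][key.strip()] = value.strip()
    return devices


def summarize(devices: dict[str, dict[str, str]]) -> dict[str, list[tuple[str, str, str]]]:
    """Group devices by Array UUID as (device, events, role) tuples."""
    groups = defaultdict(list)
    for dev, fields in devices.items():
        groups[fields.get("Array UUID", "?")].append(
            (dev, fields.get("Events", "?"), fields.get("Device Role", "?")))
    return dict(groups)


# Trimmed excerpt of the output above.
sample = """\
/dev/sdc:
     Array UUID : d00a11d7:fe0435af:07c8d4d6:e3b8e34e
         Events : 633835
    Device Role : Active device 0
/dev/sdd:
     Array UUID : d00a11d7:fe0435af:07c8d4d6:e3b8e34e
         Events : 633835
    Device Role : spare
/dev/sdi:
     Array UUID : e39d73c3:75be3b52:44d195da:b240c146
         Events : 455775
    Device Role : Active device 0
"""
for uuid, members in summarize(parse_examine(sample)).items():
    print(uuid)
    for dev, events, role in members:
        print(f"  {dev:10} events={events:8} {role}")
```

Applied to the full dump, every member of the md0 array (UUID d00a11d7:fe0435af:07c8d4d6:e3b8e34e), including sdd and sdl, reports Events 633835, but the two re-added disks carry the role "spare" rather than their old slots 1 and 6.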

root@srv:~/mdadm-3.1.5# ./mdadm --version
mdadm - v3.1.5 - 23rd March 2011

root@srv:~/mdadm-3.1.5# uname -a
Linux srv 2.6.38 #19 SMP Wed Mar 23 09:57:05 WST 2011 x86_64 GNU/Linux

Now, the array restarted with mdadm 3.2.1, but of course it's now reshaping with only 8 of 10 disks, has no redundancy, and is running at 600k/s, which will take over 10 days. Is there anything I can do to give it some redundancy while the reshape completes, or am I better off copying the data off, blowing the array away and starting again? All the important stuff is backed up anyway; I just wanted to avoid restoring 8TB from backup if I could.
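The "over 10 days" estimate can be cross-checked against the superblock figures. Assumptions, not stated by mdadm itself: the "Reshape pos'n" number is in 1 KiB units of array address space (its GiB/GB annotation only works out that way), the reshape runs forward from the start of the array, and the 600k/s rate is measured against the per-device count of 976759808 blocks that md reports for the reshape. A rough sketch:

```python
RESHAPE_POS_KIB = 3437035520   # "Reshape pos'n" from mdadm --examine (array address space)
ARRAY_KIB = 7814078464         # "Array Size" from mdadm --detail
PER_DEVICE_KIB = 976759808     # "over a total of 976759808 blocks" from dmesg
DATA_DISKS = 8                 # 10-device RAID6 minus P and Q
RATE_KIB_PER_S = 600           # observed reshape speed (assumed per device)

done_fraction = RESHAPE_POS_KIB / ARRAY_KIB
# Each data disk holds 1/8 of the array's address space.
per_device_done = RESHAPE_POS_KIB // DATA_DISKS
remaining_kib = PER_DEVICE_KIB - per_device_done
remaining_days = remaining_kib / RATE_KIB_PER_S / 86400

print(f"{done_fraction:.1%} done")        # 44.0% done
print(remaining_kib)                       # 547130368
print(f"{remaining_days:.1f} days left")   # 10.6 days left
```

Under those assumptions the reshape is a little under half done and roughly ten and a half days remain at 600k/s, in line with the "over 10 days" figure.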

Regards,
Brad
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html