Re: How to recover after md crash during reshape?

Hi Andras,

{ Added linux-raid back -- convention on kernel.org is to reply-to-all,
trim replies, and either interleave or bottom post.  I'm trimming less
than normal this time so the list can see. }

On 10/20/2015 10:48 AM, andras@xxxxxxxxxxxxxxxx wrote:
> On 2015-10-20 08:49, Phil Turmel wrote:

>> Please supply all of your mdadm -E reports for the seven partitions and
>> the lsdrv output I requested.  Just post the text inline in your reply.
>>
>> Do *not* do anything else.
>>
>> Phil

> Thanks for all the help!
> 
> Here's the output of lsdrv:
> 
> PCI [pata_marvell] 04:00.1 IDE interface: Marvell Technology Group Ltd. 88SE9128 IDE Controller (rev 11)
> ├scsi 0:x:x:x [Empty]
> └scsi 2:x:x:x [Empty]
> PCI [pata_jmicron] 05:00.1 IDE interface: JMicron Technology Corp. JMB363 SATA/IDE Controller (rev 02)
> ├scsi 1:x:x:x [Empty]
> └scsi 3:x:x:x [Empty]
> PCI [ahci] 04:00.0 SATA controller: Marvell Technology Group Ltd. 88SE9123 PCIe SATA 6.0 Gb/s controller (rev 11)
> ├scsi 4:0:0:0 ATA      ST2000DM001-1ER1 {Z4Z1JDN8}
> │└sda 1.82t [8:0] Partitioned (dos)
> │ └sda1 1.82t [8:1] Empty/Unknown
> └scsi 5:0:0:0 ATA      ST2000DM001-1ER1 {Z4Z1H84Q}
>  └sdb 1.82t [8:16] Partitioned (dos)
>   └sdb1 1.82t [8:17] ext4 'data' {d1403616-a9c6-4cd9-8d92-1aabc81fe373}
> PCI [ata_piix] 00:1f.2 IDE interface: Intel Corporation 82801JI (ICH10 Family) 4 port SATA IDE Controller #1
> ├scsi 6:0:0:0 ATA      ST31500541AS     {6XW0BQL0}
> │└sdc 1.36t [8:32] Partitioned (dos)
> │ └sdc1 1.36t [8:33] MD raid6 (10) inactive {5e57a17d-43eb-0786-42ea-8b6c723593c7}
> ├scsi 6:0:1:0 ATA      WDC WD20EARS-00M {WD-WMAZA0348342}
> │└sdd 1.82t [8:48] Partitioned (dos)
> │ ├sdd1 525.53m [8:49] ext4 'boot1' {a3a1cedc-3866-4d80-af18-a7a4db99d880}
> │ ├sdd2 1.36t [8:50] MD raid6 (10) inactive {5e57a17d-43eb-0786-42ea-8b6c723593c7}
> │ └sdd3 465.24g [8:51] MD raid1 (3) inactive {f89cbbf7-66e9-eb44-42ea-8b6c723593c7}
> ├scsi 7:0:0:0 ATA      ST31500541AS     {5XW05FFV}
> │└sde 1.36t [8:64] Partitioned (dos)
> │ └sde1 1.36t [8:65] MD raid6 (10) inactive {5e57a17d-43eb-0786-42ea-8b6c723593c7}
> └scsi 7:0:1:0 ATA      WDC WD20EARS-00M {WD-WMAZA0209553}
>  └sdf 1.82t [8:80] Partitioned (dos)
>   ├sdf1 525.53m [8:81] ext4 'boot2' {9b0e1e49-c736-47c0-89a1-4cac07c1d5ef}
>   ├sdf2 1.36t [8:82] MD raid6 (10) inactive {5e57a17d-43eb-0786-42ea-8b6c723593c7}
>   └sdf3 465.24g [8:83] MD raid1 (1/3) (w/ sdi3) in_sync {f89cbbf7-66e9-eb44-42ea-8b6c723593c7}
>    └md0 465.24g [9:0] MD v0.90 raid1 (3) clean DEGRADED {f89cbbf7:66e9eb44:42ea8b6c:723593c7}
>     │                 ext4 'root' {ceb15bfe-e082-484c-9015-1fcc8889b798}
>     └Mounted as /dev/disk/by-uuid/ceb15bfe-e082-484c-9015-1fcc8889b798 @ /
> PCI [ata_piix] 00:1f.5 IDE interface: Intel Corporation 82801JI (ICH10 Family) 2 port SATA IDE Controller #2
> ├scsi 8:0:0:0 ATA      ST31500341AS     {9VS1EFFD}
> │└sdg 1.36t [8:96] Partitioned (dos)
> │ └sdg1 1.36t [8:97] MD raid6 (10) inactive {5e57a17d-43eb-0786-42ea-8b6c723593c7}
> └scsi 10:0:0:0 ATA      Hitachi HDS5C302 {ML2220F30TEBLE}
>  └sdh 1.82t [8:112] Partitioned (dos)
>   └sdh1 1.82t [8:113] MD raid6 (10) inactive {5e57a17d-43eb-0786-42ea-8b6c723593c7}
> PCI [ahci] 05:00.0 SATA controller: JMicron Technology Corp. JMB363 SATA/IDE Controller (rev 02)
> ├scsi 9:0:0:0 ATA      WDC WD2002FAEX-0 {WD-WMAY01975001}
> │└sdi 1.82t [8:128] Partitioned (dos)
> │ ├sdi1 525.53m [8:129] Empty/Unknown
> │ ├sdi2 1.36t [8:130] MD raid6 (10) inactive {5e57a17d-43eb-0786-42ea-8b6c723593c7}
> │ └sdi3 465.24g [8:131] MD raid1 (2/3) (w/ sdf3) in_sync {f89cbbf7-66e9-eb44-42ea-8b6c723593c7}
> │  └md0 465.24g [9:0] MD v0.90 raid1 (3) clean DEGRADED {f89cbbf7:66e9eb44:42ea8b6c:723593c7}
> │                     ext4 'root' {ceb15bfe-e082-484c-9015-1fcc8889b798}
> └scsi 11:0:0:0 ATA      ST2000DM001-1ER1 {Z4Z1JCDE}
>  └sdj 1.82t [8:144] Partitioned (dos)
>   └sdj1 1.82t [8:145] Empty/Unknown
> Other Block Devices
> ├loop0 0.00k [7:0] Empty/Unknown
> ├loop1 0.00k [7:1] Empty/Unknown
> ├loop2 0.00k [7:2] Empty/Unknown
> ├loop3 0.00k [7:3] Empty/Unknown
> ├loop4 0.00k [7:4] Empty/Unknown
> ├loop5 0.00k [7:5] Empty/Unknown
> ├loop6 0.00k [7:6] Empty/Unknown
> └loop7 0.00k [7:7] Empty/Unknown
> 
> 
> mdadm output:
> 
> mdadm -E /dev/sdb1 /dev/sda1 /dev/sdc1 /dev/sdd2 /dev/sde1 /dev/sdh1 /dev/sdg1 /dev/sdi2 /dev/sdj1 /dev/sdf2

> mdadm: No md superblock detected on /dev/sdb1.

> mdadm: No md superblock detected on /dev/sda1.

> /dev/sdc1:
>           Magic : a92b4efc
>         Version : 0.91.00
>            UUID : 5e57a17d:43eb0786:42ea8b6c:723593c7
>   Creation Time : Sat Oct  2 07:21:53 2010
>      Raid Level : raid6
>   Used Dev Size : 1465135936 (1397.26 GiB 1500.30 GB)
>      Array Size : 11721087488 (11178.10 GiB 12002.39 GB)
>    Raid Devices : 10
>   Total Devices : 10
> Preferred Minor : 1
> 
>   Reshape pos'n : 4096
>   Delta Devices : 3 (7->10)
> 
>     Update Time : Sat Oct 17 18:59:50 2015
>           State : active
>  Active Devices : 10
> Working Devices : 10
>  Failed Devices : 0
>   Spare Devices : 0
>        Checksum : fad60723 - correct
>          Events : 2579239
> 
>          Layout : left-symmetric
>      Chunk Size : 64K
> 
>       Number   Major   Minor   RaidDevice State
> this     4       8        1        4      active sync   /dev/sda1
> 
>    0     0       8       50        0      active sync   /dev/sdd2
>    1     1       8       18        1      active sync
>    2     2       8       65        2      active sync   /dev/sde1
>    3     3       8       33        3      active sync   /dev/sdc1
>    4     4       8        1        4      active sync   /dev/sda1
>    5     5       8       81        5      active sync   /dev/sdf1
>    6     6       8       98        6      active sync
>    7     7       8      145        7      active sync   /dev/sdj1
>    8     8       8      129        8      active sync   /dev/sdi1
>    9     9       8      113        9      active sync   /dev/sdh1

> /dev/sdd2:
>           Magic : a92b4efc
>         Version : 0.91.00
>            UUID : 5e57a17d:43eb0786:42ea8b6c:723593c7
>   Creation Time : Sat Oct  2 07:21:53 2010
>      Raid Level : raid6
>   Used Dev Size : 1465135936 (1397.26 GiB 1500.30 GB)
>      Array Size : 11721087488 (11178.10 GiB 12002.39 GB)
>    Raid Devices : 10
>   Total Devices : 10
> Preferred Minor : 1
> 
>   Reshape pos'n : 4096
>   Delta Devices : 3 (7->10)
> 
>     Update Time : Sat Oct 17 18:59:50 2015
>           State : active
>  Active Devices : 10
> Working Devices : 10
>  Failed Devices : 0
>   Spare Devices : 0
>        Checksum : fad6072e - correct
>          Events : 2579239
> 
>          Layout : left-symmetric
>      Chunk Size : 64K
> 
>       Number   Major   Minor   RaidDevice State
> this     1       8       18        1      active sync
> 
>    0     0       8       50        0      active sync   /dev/sdd2
>    1     1       8       18        1      active sync
>    2     2       8       65        2      active sync   /dev/sde1
>    3     3       8       33        3      active sync   /dev/sdc1
>    4     4       8        1        4      active sync   /dev/sda1
>    5     5       8       81        5      active sync   /dev/sdf1
>    6     6       8       98        6      active sync
>    7     7       8      145        7      active sync   /dev/sdj1
>    8     8       8      129        8      active sync   /dev/sdi1
>    9     9       8      113        9      active sync   /dev/sdh1

> /dev/sde1:
>           Magic : a92b4efc
>         Version : 0.91.00
>            UUID : 5e57a17d:43eb0786:42ea8b6c:723593c7
>   Creation Time : Sat Oct  2 07:21:53 2010
>      Raid Level : raid6
>   Used Dev Size : 1465135936 (1397.26 GiB 1500.30 GB)
>      Array Size : 11721087488 (11178.10 GiB 12002.39 GB)
>    Raid Devices : 10
>   Total Devices : 10
> Preferred Minor : 1
> 
>   Reshape pos'n : 4096
>   Delta Devices : 3 (7->10)
> 
>     Update Time : Sat Oct 17 18:59:50 2015
>           State : active
>  Active Devices : 10
> Working Devices : 10
>  Failed Devices : 0
>   Spare Devices : 0
>        Checksum : fad60741 - correct
>          Events : 2579239
> 
>          Layout : left-symmetric
>      Chunk Size : 64K
> 
>       Number   Major   Minor   RaidDevice State
> this     3       8       33        3      active sync   /dev/sdc1
> 
>    0     0       8       50        0      active sync   /dev/sdd2
>    1     1       8       18        1      active sync
>    2     2       8       65        2      active sync   /dev/sde1
>    3     3       8       33        3      active sync   /dev/sdc1
>    4     4       8        1        4      active sync   /dev/sda1
>    5     5       8       81        5      active sync   /dev/sdf1
>    6     6       8       98        6      active sync
>    7     7       8      145        7      active sync   /dev/sdj1
>    8     8       8      129        8      active sync   /dev/sdi1
>    9     9       8      113        9      active sync   /dev/sdh1

> /dev/sdh1:
>           Magic : a92b4efc
>         Version : 0.91.00
>            UUID : 5e57a17d:43eb0786:42ea8b6c:723593c7
>   Creation Time : Sat Oct  2 07:21:53 2010
>      Raid Level : raid6
>   Used Dev Size : 1465135936 (1397.26 GiB 1500.30 GB)
>      Array Size : 11721087488 (11178.10 GiB 12002.39 GB)
>    Raid Devices : 10
>   Total Devices : 10
> Preferred Minor : 1
> 
>   Reshape pos'n : 4096
>   Delta Devices : 3 (7->10)
> 
>     Update Time : Sat Oct 17 18:59:50 2015
>           State : active
>  Active Devices : 10
> Working Devices : 10
>  Failed Devices : 0
>   Spare Devices : 0
>        Checksum : fad60775 - correct
>          Events : 2579239
> 
>          Layout : left-symmetric
>      Chunk Size : 64K
> 
>       Number   Major   Minor   RaidDevice State
> this     5       8       81        5      active sync   /dev/sdf1
> 
>    0     0       8       50        0      active sync   /dev/sdd2
>    1     1       8       18        1      active sync
>    2     2       8       65        2      active sync   /dev/sde1
>    3     3       8       33        3      active sync   /dev/sdc1
>    4     4       8        1        4      active sync   /dev/sda1
>    5     5       8       81        5      active sync   /dev/sdf1
>    6     6       8       98        6      active sync
>    7     7       8      145        7      active sync   /dev/sdj1
>    8     8       8      129        8      active sync   /dev/sdi1
>    9     9       8      113        9      active sync   /dev/sdh1

> /dev/sdg1:
>           Magic : a92b4efc
>         Version : 0.91.00
>            UUID : 5e57a17d:43eb0786:42ea8b6c:723593c7
>   Creation Time : Sat Oct  2 07:21:53 2010
>      Raid Level : raid6
>   Used Dev Size : 1465135936 (1397.26 GiB 1500.30 GB)
>      Array Size : 11721087488 (11178.10 GiB 12002.39 GB)
>    Raid Devices : 10
>   Total Devices : 10
> Preferred Minor : 1
> 
>   Reshape pos'n : 4096
>   Delta Devices : 3 (7->10)
> 
>     Update Time : Sat Oct 17 18:59:50 2015
>           State : active
>  Active Devices : 10
> Working Devices : 10
>  Failed Devices : 0
>   Spare Devices : 0
>        Checksum : fad6075f - correct
>          Events : 2579239
> 
>          Layout : left-symmetric
>      Chunk Size : 64K
> 
>       Number   Major   Minor   RaidDevice State
> this     2       8       65        2      active sync   /dev/sde1
> 
>    0     0       8       50        0      active sync   /dev/sdd2
>    1     1       8       18        1      active sync
>    2     2       8       65        2      active sync   /dev/sde1
>    3     3       8       33        3      active sync   /dev/sdc1
>    4     4       8        1        4      active sync   /dev/sda1
>    5     5       8       81        5      active sync   /dev/sdf1
>    6     6       8       98        6      active sync
>    7     7       8      145        7      active sync   /dev/sdj1
>    8     8       8      129        8      active sync   /dev/sdi1
>    9     9       8      113        9      active sync   /dev/sdh1

> /dev/sdi2:
>           Magic : a92b4efc
>         Version : 0.91.00
>            UUID : 5e57a17d:43eb0786:42ea8b6c:723593c7
>   Creation Time : Sat Oct  2 07:21:53 2010
>      Raid Level : raid6
>   Used Dev Size : 1465135936 (1397.26 GiB 1500.30 GB)
>      Array Size : 11721087488 (11178.10 GiB 12002.39 GB)
>    Raid Devices : 10
>   Total Devices : 10
> Preferred Minor : 1
> 
>   Reshape pos'n : 4096
>   Delta Devices : 3 (7->10)
> 
>     Update Time : Sat Oct 17 18:59:50 2015
>           State : active
>  Active Devices : 10
> Working Devices : 10
>  Failed Devices : 0
>   Spare Devices : 0
>        Checksum : fad60788 - correct
>          Events : 2579239
> 
>          Layout : left-symmetric
>      Chunk Size : 64K
> 
>       Number   Major   Minor   RaidDevice State
> this     6       8       98        6      active sync
> 
>    0     0       8       50        0      active sync   /dev/sdd2
>    1     1       8       18        1      active sync
>    2     2       8       65        2      active sync   /dev/sde1
>    3     3       8       33        3      active sync   /dev/sdc1
>    4     4       8        1        4      active sync   /dev/sda1
>    5     5       8       81        5      active sync   /dev/sdf1
>    6     6       8       98        6      active sync
>    7     7       8      145        7      active sync   /dev/sdj1
>    8     8       8      129        8      active sync   /dev/sdi1
>    9     9       8      113        9      active sync   /dev/sdh1

> mdadm: No md superblock detected on /dev/sdj1.

> /dev/sdf2:
>           Magic : a92b4efc
>         Version : 0.91.00
>            UUID : 5e57a17d:43eb0786:42ea8b6c:723593c7
>   Creation Time : Sat Oct  2 07:21:53 2010
>      Raid Level : raid6
>   Used Dev Size : 1465135936 (1397.26 GiB 1500.30 GB)
>      Array Size : 11721087488 (11178.10 GiB 12002.39 GB)
>    Raid Devices : 10
>   Total Devices : 10
> Preferred Minor : 1
> 
>   Reshape pos'n : 4096
>   Delta Devices : 3 (7->10)
> 
>     Update Time : Sat Oct 17 18:59:50 2015
>           State : active
>  Active Devices : 10
> Working Devices : 10
>  Failed Devices : 0
>   Spare Devices : 0
>        Checksum : fad6074c - correct
>          Events : 2579239
> 
>          Layout : left-symmetric
>      Chunk Size : 64K
> 
>       Number   Major   Minor   RaidDevice State
> this     0       8       50        0      active sync   /dev/sdd2
> 
>    0     0       8       50        0      active sync   /dev/sdd2
>    1     1       8       18        1      active sync
>    2     2       8       65        2      active sync   /dev/sde1
>    3     3       8       33        3      active sync   /dev/sdc1
>    4     4       8        1        4      active sync   /dev/sda1
>    5     5       8       81        5      active sync   /dev/sdf1
>    6     6       8       98        6      active sync
>    7     7       8      145        7      active sync   /dev/sdj1
>    8     8       8      129        8      active sync   /dev/sdi1
>    9     9       8      113        9      active sync   /dev/sdh1

> Apparently my problems don't stop adding up: now SDD started developing
> problems, so my root partition (md0) is now degraded. I will attempt to
> dd out whatever I can from that drive and continue...

Don't.  You have another problem: green & desktop drives in a raid
array.  They aren't built for it and will give you grief of one form or
another.  Anyways, their problem with timeout mismatch can be worked
around with long driver timeouts.  Before you do anything else, you
*MUST* run this command:

for x in /sys/block/*/device/timeout ; do echo 180 > "$x" ; done

(Arrange for this to happen on every boot, and keep doing it manually
until your boot scripts are fixed.)
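
One way to arrange that is to wrap the loop in a small function and
call it from whatever your distribution runs at boot (an rc.local
script or a oneshot unit, for example).  This is only a sketch: the
function name and the optional second argument are made up for
illustration, and the second argument exists purely so the loop can be
tried against a scratch directory instead of the live /sys.

```shell
# Sketch only: raise the kernel command timeout on every disk.
# set_driver_timeouts [seconds] [sysfs_root]
#   seconds     defaults to 180, the value used in the loop above
#   sysfs_root  illustrative override, defaults to /sys
set_driver_timeouts() {
    secs="${1:-180}"
    sysfs_root="${2:-/sys}"
    for knob in "$sysfs_root"/block/*/device/timeout; do
        # Skip anything without a writable timeout file; this also
        # covers the case where the glob matched nothing.
        [ -w "$knob" ] || continue
        echo "$secs" > "$knob"
    done
}
```

Calling "set_driver_timeouts" with no arguments as root should have the
same effect as the one-liner.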

Then you can add your missing mirror and let MD fix it:

mdadm /dev/md0 --add /dev/sdd3

After that's done syncing, you can have MD fix any remaining UREs in
that raid1 with:

echo check >/sys/block/md0/md/sync_action

While that's in progress, take the time to read through the links in the
postscript -- the timeout mismatch problem and its impact on
unrecoverable read errors has been hashed out on this list many times.
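
If you would rather script the waiting than watch /proc/mdstat, the
sync_action file reads "idle" once MD has nothing in progress.  A
minimal sketch, assuming the usual sysfs layout; the function name and
its arguments are illustrative, not part of mdadm:

```shell
# Sketch only: block until an md array reports no sync activity.
# md_wait_idle [md_sysfs_dir] [poll_seconds]
md_wait_idle() {
    md_dir="${1:-/sys/block/md0/md}"
    interval="${2:-60}"
    # Give up immediately if the array (or the path) does not exist.
    [ -r "$md_dir/sync_action" ] || return 1
    while [ "$(cat "$md_dir/sync_action")" != "idle" ]; do
        sleep "$interval"
    done
}
```

For example, "md_wait_idle /sys/block/md0/md 30" returns once the
resync or check on md0 has finished, polling every 30 seconds.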

Now to your big array.  It is vital that it also be cleaned of UREs
after re-creation before you do anything else.  Which means it must
*not* be created degraded (the redundancy is needed to fix UREs).

According to lsdrv and your "mdadm -E" reports, the creation order you
need is:

raid device 0 /dev/sdf2 {WD-WMAZA0209553}
raid device 1 /dev/sdd2 {WD-WMAZA0348342}
raid device 2 /dev/sdg1 {9VS1EFFD}
raid device 3 /dev/sde1 {5XW05FFV}
raid device 4 /dev/sdc1 {6XW0BQL0}
raid device 5 /dev/sdh1 {ML2220F30TEBLE}
raid device 6 /dev/sdi2 {WD-WMAY01975001}
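
The slot numbers in that list come from the "this" row of each report,
whose fifth column is RaidDevice.  A small filter can pull it out; this
is a sketch that assumes the v0.90-style "mdadm -E" layout quoted
above (other metadata versions print the role differently), and the
function name is made up:

```shell
# Sketch only: print the RaidDevice slot from an "mdadm -E" report
# in the v0.90 layout, read from standard input.
md_slot() {
    # The "this" row reads: this Number Major Minor RaidDevice State...
    awk '$1 == "this" { print $5; exit }'
}
```

Usage would be along the lines of "mdadm -E /dev/sdf2 | md_slot", which
for the report quoted above gives 0.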

Chunk size is 64k.

Make sure your partially assembled array is stopped:

mdadm --stop /dev/md1

Re-create your array as follows:

mdadm --create --assume-clean --verbose \
    --metadata=1.0 --raid-devices=7 --chunk=64 --level=6 \
    /dev/md1 /dev/sd{f2,d2,g1,e1,c1,h1,i2}

Use "fsck -n" to check your array's filesystem (expect some damage at
the very beginning).  If it looks reasonable, use fsck to fix any damage.
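
When judging the "fsck -n" run, the exit status is a bitmask described
in fsck(8).  The helper below is only a sketch with a made-up name; it
decodes the bits most relevant here and lumps the rest together:

```shell
# Sketch only: describe an fsck exit status (a bitmask, see fsck(8)).
describe_fsck_status() {
    status="$1"
    if [ "$status" -eq 0 ]; then
        echo "no errors"
        return 0
    fi
    if [ $((status & 1)) -ne 0 ]; then echo "errors corrected"; fi
    if [ $((status & 4)) -ne 0 ]; then echo "errors left uncorrected"; fi
    if [ $((status & 8)) -ne 0 ]; then echo "operational error"; fi
    # Any remaining bits (reboot needed, usage error, cancelled, ...)
    if [ $((status & ~13)) -ne 0 ]; then echo "other condition, see fsck(8)"; fi
    return 0
}
```

Typical use: run "fsck -n /dev/md1" and then "describe_fsck_status $?".
With -n nothing is repaired, so any damage shows up as "errors left
uncorrected".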

Then clean up any lingering UREs:

echo check > /sys/block/md1/md/sync_action
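
Once the check has finished, the mismatch_cnt file next to sync_action
shows how many sectors the pass found inconsistent; read errors that MD
repaired along the way are normally reported in the kernel log instead.
A sketch for a quick look, with a made-up function name:

```shell
# Sketch only: print the current sync state and mismatch count.
# md_scrub_report [md_sysfs_dir]
md_scrub_report() {
    md_dir="${1:-/sys/block/md1/md}"
    printf 'sync_action=%s mismatch_cnt=%s\n' \
        "$(cat "$md_dir/sync_action")" \
        "$(cat "$md_dir/mismatch_cnt")"
}
```

Run it with no arguments for md1, or pass another array's md directory.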

Now you can mount it and catch any critical backups. (You do know that
raid != backup, I hope.)

Your array now has a new UUID, so you probably want to fix your
mdadm.conf file and your initramfs.
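
For mdadm.conf, the idea is to drop the ARRAY line carrying the old
UUID and add the line that "mdadm --detail --brief /dev/md1" prints for
the new array.  The function below is a sketch with a made-up name; the
location of mdadm.conf and the command that rebuilds the initramfs vary
by distribution, so both are left to you.  Work on a copy first.

```shell
# Sketch only: swap one ARRAY line in an mdadm.conf-style file.
# replace_array_line <conf_file> <old_uuid> <new_array_line>
replace_array_line() {
    conf="$1"
    old_uuid="$2"
    new_line="$3"
    tmp=$(mktemp) || return 1
    # Keep every line that does not mention the old UUID.
    grep -v -i -F "UUID=$old_uuid" "$conf" > "$tmp"
    # Append the line describing the re-created array.
    printf '%s\n' "$new_line" >> "$tmp"
    # Write back in place so ownership and permissions are preserved.
    cat "$tmp" > "$conf" && rm -f "$tmp"
}
```

Here the old UUID would be 5e57a17d:43eb0786:42ea8b6c:723593c7 and the
new line whatever "mdadm --detail --brief /dev/md1" reports after the
re-create.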

Finally, go back and do your --grow, with the --backup-file.
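
As a rough sketch of what that looks like, the function below only
prints the two commands so they can be reviewed before anything is
run.  Its name, the backup path and the /dev/sdX1-style device names in
the example are placeholders, not taken from your system; the backup
file must live on a filesystem that is not on the array being reshaped.

```shell
# Sketch only: print (do not run) the add and grow commands.
# print_grow_plan <md_device> <current_raid_devices> <backup_file> <new_dev>...
print_grow_plan() {
    md="$1"
    current="$2"
    backup="$3"
    shift 3
    # Target width is the current width plus the devices being added.
    target=$((current + $#))
    echo "mdadm $md --add $*"
    echo "mdadm --grow $md --raid-devices=$target --backup-file=$backup"
}
```

For a 7-device array gaining three members, "print_grow_plan /dev/md1 7
/root/md1-grow.backup /dev/sdX1 /dev/sdY1 /dev/sdZ1" prints an --add
line followed by a --grow line with --raid-devices=10.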

In the future, buy drives with raid ratings like the WD Red family, and
make sure you have a cron job that regularly kicks off array scrubs.  I
do mine weekly.
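
A minimal sketch of such a scrub job follows.  The function name, the
script path in the cron line and the schedule are assumptions, and some
distributions already ship an equivalent job with their mdadm package,
so check before adding your own.

```shell
# Sketch only: start a "check" pass on every idle md array.
# start_md_scrubs [sysfs_root]   (the argument is an illustrative override)
#
# Example /etc/cron.d entry, Sundays at 03:00, hypothetical script path:
#   0 3 * * 0  root  /usr/local/sbin/md-scrub
start_md_scrubs() {
    sysfs_root="${1:-/sys}"
    for action in "$sysfs_root"/block/md*/md/sync_action; do
        [ -w "$action" ] || continue
        # Leave arrays alone while a resync, recovery or reshape runs.
        [ "$(cat "$action")" = "idle" ] || continue
        echo check > "$action"
    done
}
```

The script named in the cron line would simply define this function
and call it with no arguments.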

HTH,

Phil

[1] http://marc.info/?l=linux-raid&m=139050322510249&w=2
[2] http://marc.info/?l=linux-raid&m=135863964624202&w=2
[3] http://marc.info/?l=linux-raid&m=135811522817345&w=1
[4] http://marc.info/?l=linux-raid&m=133761065622164&w=2
[5] http://marc.info/?l=linux-raid&m=132477199207506
[6] http://marc.info/?l=linux-raid&m=133665797115876&w=2
[7] https://www.marc.info/?l=linux-raid&m=142487508806844&w=3
