Re: Recover array after I panicked

On 04/23/2017 04:09 PM, Patrik Dahlström wrote:
> 
> 
> On 04/23/2017 04:06 PM, Brad Campbell wrote:
>> On 23/04/17 17:47, Patrik Dahlström wrote:
>>> Hello,
>>>
>>> Here's the story:
>>>
>>> I started with a 5x6 TB raid5 array. I added another 6 TB drive and
>>> started to grow the array. However, one of my SATA cables was bad and
>>> the reshape gave me lots of I/O errors.
>>>
>>> Instead of fixing the SATA cable issue directly, I shut down the server
>>> and swapped the positions of two drives. My reasoning was that putting
>>> the new drive in a good slot would reduce the I/O errors. Bad move, I
>>> know. I tried a few commands but was not able to continue the reshape.
>>>
>>
>> Nobody seems to have mentioned the reshape issue. What sort of reshape
>> were you running? How far into the reshape did it get? Do you have any
>> logs of the errors (which might at least indicate whereabouts in the
>> array things were before you pushed it over the edge)?
> These were the grow commands I ran:
> mdadm --add /dev/md1 /dev/sdf
> mdadm --grow --raid-devices=6 /dev/md1
> 
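Side note for anyone reading this later: the reshape can be watched while it runs. A quick sketch, using my array name /dev/md1:

# Overall progress and speed of the running reshape
cat /proc/mdstat
# Per-array detail, including a "Reshape Status" percentage
mdadm --detail /dev/md1
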
I found the kernel log output from when I ran the command:
[ 1912.303661] md: bind<sdf>
[ 1912.355423] RAID conf printout:
[ 1912.355426]  --- level:5 rd:5 wd:5
[ 1912.355428]  disk 0, o:1, dev:sda
[ 1912.355429]  disk 1, o:1, dev:sdb
[ 1912.355430]  disk 2, o:1, dev:sdd
[ 1912.355431]  disk 3, o:1, dev:sdc
[ 1912.355432]  disk 4, o:1, dev:sde
[ 1937.287333] RAID conf printout:
[ 1937.287341]  --- level:5 rd:6 wd:6
[ 1937.287347]  disk 0, o:1, dev:sda
[ 1937.287351]  disk 1, o:1, dev:sdb
[ 1937.287355]  disk 2, o:1, dev:sdd
[ 1937.287358]  disk 3, o:1, dev:sdc
[ 1937.287361]  disk 4, o:1, dev:sde
[ 1937.287365]  disk 5, o:1, dev:sdf
[ 1937.287469] md: reshape of RAID array md1
[ 1937.287475] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
[ 1937.287478] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for reshape.
[ 1937.287487] md: using 128k window, over a total of 5860391424k.
[ 1937.424014] ata6.00: exception Emask 0x10 SAct 0x20000 SErr 0x480100 action 0x6 frozen
[ 1937.424086] ata6.00: irq_stat 0x08000000, interface fatal error
[ 1937.424134] ata6: SError: { UnrecovData 10B8B Handshk }
[ 1937.424179] ata6.00: failed command: WRITE FPDMA QUEUED
[ 1937.424227] ata6.00: cmd 61/40:88:00:dc:03/01:00:00:00:00/40 tag 17 ncq 163840 out
[ 1937.424227]          res 40/00:88:00:dc:03/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
[ 1937.424341] ata6.00: status: { DRDY }
[ 1937.424375] ata6: hard resetting link
[ 1937.743934] ata6: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 1937.745491] ata6.00: configured for UDMA/133
[ 1937.745498] ata6: EH complete
[ 1937.751920] ata6.00: exception Emask 0x10 SAct 0xc00000 SErr 0x400100 action 0x6 frozen
[ 1937.751948] ata6.00: irq_stat 0x08000000, interface fatal error
[ 1937.751966] ata6: SError: { UnrecovData Handshk }
[ 1937.751982] ata6.00: failed command: WRITE FPDMA QUEUED
[ 1937.751999] ata6.00: cmd 61/b8:b0:80:e2:03/02:00:00:00:00/40 tag 22 ncq 356352 out
[ 1937.751999]          res 40/00:b8:40:dd:03/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
[ 1937.752042] ata6.00: status: { DRDY }
[ 1937.752053] ata6.00: failed command: WRITE FPDMA QUEUED
[ 1937.752070] ata6.00: cmd 61/40:b8:40:dd:03/05:00:00:00:00/40 tag 23 ncq 688128 out
[ 1937.752070]          res 40/00:b8:40:dd:03/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
[ 1937.752113] ata6.00: status: { DRDY }
[ 1937.752125] ata6: hard resetting link
[ 1938.072176] ata6: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 1938.074013] ata6.00: configured for UDMA/133
[ 1938.074036] ata6: EH complete
etc.

The rest is lots and lots of I/O errors due to the bad SATA cable.
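
To rule a cable in or out, the drive's own CRC counter is a quick check; SMART attribute 199 climbs on link-level corruption like the Handshk errors above. Sketch only, with /dev/sdf standing in for whichever drive sits on the suspect port:

# Attribute 199 (UDMA_CRC_Error_Count) increments on cable/link errors
smartctl -A /dev/sdf | grep -i crc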

> It got to roughly 15-17 % before I decided that the I/O errors were more
> scary than stopping the reshape.
>>
>>
>> What you'll have is one part of the array in one configuration, the
>> remaining part in another and no record of where that split begins.
> Like I said, ~15-17 % into the reshape.
>>
>> Regards,
>> Brad
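
For what it's worth, the member superblocks may still record how far the reshape got, assuming my drive shuffling didn't clobber the metadata. A sketch of what I'd look at (mdadm --examine prints a "Reshape pos'n" line for v1.x metadata mid-reshape):

# Dump reshape position and event counts from every member's superblock
mdadm --examine /dev/sd[a-f] | egrep 'Reshape|Events'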