Re: hung grow

Hello,

Thanks for clarifying. All the current good drives report that they
are part of an 8-drive array.  I only grew the raid by 1 device, so
it would go from 7 to 8, which is what they all report.  The 3rd
failed drive doesn't report anything on --examine; I haven't touched
it at all and it was not included in my assemble.  The 2 I replaced,
the original drives I yanked, think they are still part of a 7-drive
array.
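A quick way to double-check what each member believes is to pull the
"Raid Devices" field out of its metadata. A minimal sketch, using a
made-up --examine excerpt (the sample text below is hypothetical, not
from a real drive in this array; on a live system you would pipe
`mdadm --examine /dev/sdX` instead):

```shell
# Hypothetical excerpt of `mdadm --examine /dev/sdX` output -- sample
# text only, not captured from a real drive.
examine_out="   Raid Devices : 8
    Device Role : Active device 3"

# Print the device count this drive believes the array has.
echo "$examine_out" | awk '/Raid Devices/ {print $4}'
```

Running that against every member makes the 7-vs-8 split obvious at a
glance.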

I'll be doing a ddrescue on the drives tonight, but will wait till
Phil or someone chimes in with my next steps after I do that.
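For the record, the copy step I have in mind is the usual two-pass
GNU ddrescue approach. A sketch only, with hypothetical device names
(/dev/sdOLD, /dev/sdNEW, and the mapfile path are placeholders --
adjust to the real setup; the actual ddrescue commands are left
commented out so nothing destructive runs by accident):

```shell
# Placeholder device names -- substitute the real failing drive and
# its same-size replacement before use.
OLD=/dev/sdOLD
NEW=/dev/sdNEW
MAP=/root/sdOLD.map   # mapfile lets a second pass resume the first

# First pass: grab the easy-to-read data fast, skipping scraping (-n).
# ddrescue -f -n "$OLD" "$NEW" "$MAP"

# Second pass: retry the bad areas with direct access, 3 retries.
# ddrescue -d -f -r3 "$OLD" "$NEW" "$MAP"

echo "copy $OLD -> $NEW using mapfile $MAP"
```

The mapfile is the important part: it records which sectors were
recovered, so passes can be repeated or resumed without re-reading
good areas.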

lol, chalk one more up for FML. "SCT Error Recovery Control command
not supported".  I'm guessing this is a real bad thing now?  I didn't
buy these drives or originally set this up.
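If it helps, the usual linux-raid handling of that message can be
sketched as below. The sample smartctl text is hard-coded from the
message above, and /dev/sdX is a placeholder; on a live system you
would check with `smartctl -l scterc /dev/sdX` (the sysfs and
smartctl commands are shown in comments, not executed):

```shell
# Sample output, hard-coded -- on a real system pipe in:
#   smartctl -l scterc /dev/sdX
scterc_out="SCT Error Recovery Control command not supported"

if echo "$scterc_out" | grep -q "not supported"; then
    # Desktop-class drive: its internal retries can't be shortened,
    # so raise the kernel's per-device command timeout (default 30s)
    # above the drive's worst-case retry time instead:
    #   echo 180 > /sys/block/sdX/device/timeout
    echo "no ERC: raise kernel timeout to 180s"
else
    # Raid-class drive: cap error recovery at 7 seconds so the drive
    # gives up before the kernel does (units are 0.1s):
    #   smartctl -l scterc,70,70 /dev/sdX
    echo "ERC available: set scterc,70,70"
fi
```

So "not supported" isn't fatal by itself; it just means the timeout
has to be fixed on the kernel side, and redone on every boot.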

On Wed, Oct 4, 2017 at 3:46 PM, Anthony Youngman
<antlists@xxxxxxxxxxxxxxx> wrote:
> On 04/10/17 20:09, Curt wrote:
>>
>> Ok, thanks.
>>
>> I'm pretty sure I'll be able to DD from at least one of the failed
>> drives, as I could still query them before I yanked them.  Assuming I
>> can DD one of the old drives to one of my new ones.
>>
>> I'd ddrescue old to new drive, then do an assemble with force, with
>> a mix of the ddrescue'd drives and my old good ones? So if sda/b are
>> the new ddrescue'd drives and sdc/d/e are the hosed grow drives, I'd
>> do an assemble with --force and --revert-reshape on /dev/md127 with
>> sda sdb sdc sdd and sde? Then assemble can use the info from the
>> ddrescue'd drives to put the array back to 7 drives?
>>   Did I understand that right?
>
>
> This sounds like you need to take a great big step backwards, and make sure
> you understand EXACTLY what is going on. We have a mix of good drives,
> copies of bad drives, and an array that doesn't know whether it is supposed
> to have 7 or 9 drives. One wrong step and your array will be toast.
>
> You want ALL FOUR KNOWN GOOD DRIVES. You want JUST ONE ddrescue'd drive.
>
> But I think the first thing we need to do, is to wait for an expert like
> Phil to chime in and sort out that reshape. Your four good drives all think
> they are part of a 9-drive array. Your first two drives to fail think they
> are part of a 7-drive array. Does the third drive think it's part of a
> 7-drive or 9-drive array?
>
> Can you do a --examine on this drive? I suspect the grow blew up because it
> couldn't access this drive. If this drive thinks it is part of a 7-drive
> array, we have a bit of a problem on our hands.
>
> I'm hoping it thinks it's part of a 9-drive array - I think we may be able
> to get out of this ...
>>
>>
>> Oh, and how can I tell if I have a timeout mismatch? They should be
>> raid drives.
>
>
> smartctl -x /dev/sdX
>
> This will give you both the sort of drive you have - yes, if it's in a
> datacentre, chances are it is a raid drive - and then search the output
> for Error Recovery Control. This is from my hard drive...
>
> SCT capabilities:              (0x003f) SCT Status supported.
>                                         SCT Error Recovery Control supported.
>                                         SCT Feature Control supported.
>                                         SCT Data Table supported.
>
> You need error recovery to be supported. If it isn't ...
>
>>
>> Cheers,
>> Curt
>
>
> Cheers,
> Wol
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html