On 19/01/19 13:30, Carsten Aulbert wrote:
> Hi
>
> On 1/19/19 2:21 PM, Basil Mohamed Gohar wrote:
>> I have two drives of the 4-drive RAID6 array visible, but no files
>> are accessible because it's a RAID6; I need at least 3 of the 4
>> drives working, and my problem is two are experiencing this problem.
>
> Hmm, that would be surprising, as RAID6 should offer two-disk
> redundancy, i.e. any two disks may fail and you should still be able
> to access your data - albeit without any extra safety net.

That was my reaction - raid6 should survive two drive failures.
Although I think *any* drive failure will result in the array failing
to start until you force it - if it's been running degraded it will
restart in the same configuration, but if it degrades further it won't
restart without a force. Check that out.

>> This is challenging because it is in a tower array and all the
>> drives connect straight to a motherboard-like backplane. I took one
>> out and was working with it directly via a USB SATA adapter, but
>> that did not change the errors I was seeing.
>
> OK, I just wanted to make sure that the error "stayed" with the
> drives.
>
>> Yes, they are. SMART reports no fatal errors on the drives in
>> question!
>
> OK, at least that.
>
>> What may help me is if there are any tools for md devices that let
>> me peek into the on-disk structure. Since the ext4 file system is
>> spread across the 3 data drives in the array, I cannot use, for
>> example, e2fsck on just one of them, and since I cannot properly
>> assemble the array, I am somewhat stuck. Are there any tools for
>> examining an array of drives even if it is not recognized as such?
>> I don't know, for example, if some sectors went bad, how to tell
>> mdadm to look in alternate locations (i.e., akin to ext4's
>> alternative superblock locations).
>
> As indicated above, with RAID6 you "only" have two data disks in a
> four-disk RAID6; as RAID6 does not write data copies but "generated"
> parity stripes to the two extra disks, it can compute back what
> should have been on the data stripes of the failed disks. But reverse
> engineering this is probably not really easy to perform "manually".
>
> Thus, at first, we should really establish what the underlying layout
> was, i.e. can you send us the output of /proc/mdstat?

Might be too late for that. Two tools that are probably useful are
Phil Turmel's lsdrv, and wipefs, which I saw mentioned here a few days
ago - it has an option to do nothing that just gives you info.

https://raid.wiki.kernel.org/index.php/Linux_Raid#When_Things_Go_Wrogn

Cheers,
Wol
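
P.S. If it helps, this is roughly what I'd run first, purely to gather
information without writing anything to the disks. The device names
(/dev/sd[a-d]1 etc.) are only placeholders - substitute whatever your
array members actually are:

  # what the kernel currently thinks of the array, if anything:
  cat /proc/mdstat

  # dump the md superblock on each member (array UUID, raid level,
  # device role, event count):
  mdadm --examine /dev/sd[a-d]1

  # wipefs with no options only lists signatures; nothing is erased
  # unless you also pass -a:
  wipefs /dev/sda1

  # full SMART report, not just the pass/fail verdict:
  smartctl -a /dev/sda

Save the --examine output from every member - the event counts and
device roles are what tell you which drives can still be assembled
together.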
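
P.P.S. If --examine shows the event counts on the members are all
close together, a forced assemble is usually the next step. Again, the
array and device names here are placeholders, and I'd strongly suggest
working on overlays or dd copies rather than the original drives (the
wiki page above explains how):

  # make sure nothing half-assembled is still holding the members:
  mdadm --stop /dev/md0

  # try to assemble from the named members, overriding the
  # event-count check:
  mdadm --assemble --force /dev/md0 /dev/sd[a-d]1

  # see whether it came up (degraded is fine for now):
  cat /proc/mdstat

Don't start any rebuild until you're happy the data is actually
readable (fsck -n, a read-only mount, etc.).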