Re: Two drives in RAID6 array experienced similar error at or near beginning of drive

On 1/19/19 9:02 AM, Wols Lists wrote:
On 19/01/19 13:30, Carsten Aulbert wrote:
Hi

On 1/19/19 2:21 PM, Basil Mohamed Gohar wrote:
I have two drives of the 4-drive RAID6 array visible, but no files are
accessible: because it's a RAID6, I need at least 3 of the 4 drives
working, and my problem is that two of them are showing this error.

Hmm, that would be surprising, as RAID6 should offer two-disk
redundancy, i.e. any two disks may fail and you should still be able to
access your data - albeit without any extra safety net.

That was my reaction - RAID6 should survive two drive failures.
Although I think *any* drive failure will result in the array refusing
to start until you force it: if it has already been running degraded it
will restart in the same configuration, but if it newly degrades it
won't restart without a force. Check that out.
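
As a sketch of the kind of forced assembly being described here - with
/dev/md0 and /dev/sd[b-e]1 purely as placeholder names to be replaced by
the real array and member devices - that would look roughly like:

    # read-only look at each member's md metadata (event counts, roles)
    mdadm --examine /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1

    # stop any half-assembled array, then force-assemble the degraded set
    mdadm --stop /dev/md0
    mdadm --assemble --force /dev/md0 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1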

This is challenging because it is in a tower array and all the drives
connect straight to a motherboard-like backplane.  I took one out and
worked with it directly via a USB SATA adapter, but that did not change
the errors I was seeing.

OK, I just wanted to make sure that the error "stayed" with the drives.

Yes, they do.  SMART reports no fatal errors on the drives in question!

OK, at least that.
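
For completeness, that kind of non-destructive SMART check can be
reproduced with smartmontools (sdb here is only an example device name):

    # read-only: overall health, attributes and the drive's error log
    smartctl -a /dev/sdb

    # optionally run a short self-test and read the result afterwards
    smartctl -t short /dev/sdb
    smartctl -l selftest /dev/sdb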

What may help me is if there are any tools for md devices that let me
peek into the on-disk structure.  Since the ext4 file system is spread
across the 3 data drives in the array, I cannot run, for example, e2fsck
on just one of them, and since I cannot properly assemble the array, I
am somewhat stuck.  Are there any tools for examining an array of drives
even if it is not recognized as such?  I also don't know, if some
sectors went bad, how to tell mdadm to look in alternate locations
(i.e., akin to ext4's alternative superblock locations).
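
One read-only way to peek at the md on-disk metadata of each member,
without assembling anything, is mdadm's --examine mode (device names
below are made up):

    # dump the md superblock of a member: level, layout, chunk size,
    # device role, array UUID and event count
    mdadm --examine /dev/sdb1

    # one summary line per array that mdadm can detect on any device
    mdadm --examine --scan --verbose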

As indicated above, with a four-disk RAID6 you "only" get two disks'
worth of data: RAID6 does not write copies of the data but "generated"
parity stripes to the two extra disks, from which it can compute back
what should have been on the data stripes of the failed disks. But
reverse engineering this is probably not easy to do "manually".

Thus, at first, we should really establish what the underlying layout
was, i.e. can you send us the output of /proc/mdstat?
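
Concretely, the layout could be established with something along these
lines (md0 again being a placeholder name):

    # kernel's view of all md arrays and their member/parity layout
    cat /proc/mdstat

    # detailed view of one array - only works while it is assembled;
    # for unassembled members, --examine (above) reads the raw metadata
    mdadm --detail /dev/md0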

Might be too late for that. Two tools that are probably useful are Phil
Turmel's lsdrv, and wipefs, which I saw mentioned here a few days ago -
it has a mode that changes nothing and just gives you info.

https://raid.wiki.kernel.org/index.php/Linux_Raid#When_Things_Go_Wrogn

Cheers,
Wol
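
For reference, the "do nothing" use of wipefs mentioned above is simply
running it without the wipe option; it then only lists the signatures it
finds (device name is a placeholder):

    # print filesystem/RAID signatures found on the device; erases nothing
    wipefs /dev/sdb1

    # lsdrv is Phil Turmel's script; fetched separately, it prints the
    # whole controller/drive/md/LVM tree of the machine
    ./lsdrv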

You are all correct.  It could not have been a RAID6.  I just read about the two modes, and I am sure I did RAID5 across the 4 drives, because I had 24TB of usable space out of 32TB of raw drives (4 x 8TB).  I recall expecting one drive's worth of resiliency, not two - hence my predicament.
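
The capacity arithmetic backs that up: RAID5 across n drives yields
(n - 1) x drive size, i.e. 3 x 8TB = 24TB usable, whereas a four-drive
RAID6 would yield (n - 2) x drive size, i.e. 2 x 8TB = 16TB. Seeing 24TB
out of 4 x 8TB therefore matches RAID5 and a single drive's worth of
redundancy.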



