Re: Two drives in RAID6 array experienced similar error at or near beginning of drive

On 1/19/19 9:02 AM, Wols Lists wrote:
On 19/01/19 13:30, Carsten Aulbert wrote:
Hi

On 1/19/19 2:21 PM, Basil Mohamed Gohar wrote:
I have two drives of the 4-drive RAID6 array visible, but no files are
accessible. Because it's a RAID6, I need at least 3 of the 4 drives
working, and my problem is that two are experiencing this problem.
Hmm, that would be surprising, as RAID6 should offer a two disk
redundancy, i.e. any two disks may fail and you should still be able to
access your data - albeit without any extra safety net.
That was my reaction - raid6 should survive two drive failures. Although
I think *any* drive failure will result in the array failing to start
until you force it - if it's been running degraded it will restart in
the same configuration, if it degrades it won't restart without a force.
Check that out.
This is challenging because it is in a tower array and all the drives
connect straight to motherboard-like backplane.  I took one out and was
working with it directly via a USB SATA adapter, but that did not change
the errors I was seeing.
OK, I just wanted to make sure that the error "stayed" with the drives.

Yes, they are.  SMART reports no fatal errors on the drives in question!
OK, at least that.

What may help me is if there are any tools for md devices that let me
peek into the on-disk structure.  Since the ext4 file system is spread
across the 3 data drives in the array, I cannot use, for example, e2fsck
on just one of them, and since I cannot properly assemble the array, I
am somewhat stuck.  Are there any tools for examining an array of drives
even if it is not recognized as such? I don't know, for example, if some
sectors went bad, how to tell mdadm to look in alternate locations
(i.e., akin to ext4's alternative superblock locations).
As indicated above with RAID6 you should "only" have two data disks in a
four disk RAID6, as RAID6 does not write data copies but "generated"
parity stripes to the two extra disks, it can compute back what should
have been on data stripes on failed disks. But reverse engineering this
is probably not really easy to perform "manually".
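
To make the "compute back" part concrete, here is a minimal sketch of the RAID6 P/Q arithmetic for one stripe of a four-disk array (two data chunks, D0 and D1, plus P and Q). It uses GF(2^8) with the polynomial 0x11d and generator 2, and it assumes the worst case where both data chunks are lost and only P and Q survive. The function names are made up for illustration; a real md array also rotates parity across the disks and works in chunk-sized units, which this ignores.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) using the polynomial 0x11d."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
        b >>= 1
    return result


def gf_inv(a):
    """Multiplicative inverse in GF(2^8), found by brute force (fine for a sketch)."""
    for x in range(1, 256):
        if gf_mul(a, x) == 1:
            return x
    raise ZeroDivisionError("0 has no inverse")


def parity(d0, d1):
    """Return (P, Q) for two equal-length data chunks: P = D0 ^ D1, Q = D0 ^ 2*D1."""
    p = bytes(x ^ y for x, y in zip(d0, d1))
    q = bytes(x ^ gf_mul(2, y) for x, y in zip(d0, d1))
    return p, q


def recover_data(p, q):
    """Rebuild D0 and D1 from P and Q alone.

    P ^ Q = D1 ^ 2*D1 = 3*D1, so D1 = (P ^ Q) / 3 and D0 = P ^ D1.
    """
    inv3 = gf_inv(3)
    d1 = bytes(gf_mul(inv3, a ^ b) for a, b in zip(p, q))
    d0 = bytes(a ^ b for a, b in zip(p, d1))
    return d0, d1


d0, d1 = b"ext4 data", b"more data"
p, q = parity(d0, d1)
print(recover_data(p, q) == (d0, d1))
```

The point is only that any two of the four chunks are enough to rebuild the stripe; which two disks hold data and which hold parity changes from stripe to stripe on a real array.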

Thus, at first, we should really establish what the underlying layout
was, i.e. can you send us the output of /proc/mdstat?
Might be too late for that. Two tools that are probably useful are Phil
Turmel's lsdrv, and I saw wipefs mentioned a few days ago here - there's
an option to do nothing that just gives you info.

https://raid.wiki.kernel.org/index.php/Linux_Raid#When_Things_Go_Wrogn

Cheers,
Wol

Thanks.  wipefs provided some information that I think may be helpful.  The two still-reporting drives in the array report as follows:

wipefs /dev/sdi
DEVICE OFFSET        TYPE UUID                                 LABEL
sdi    0x1000        linux_raid_member cd6470cb-1aa3-03fd-1027-706e5fd0606d alpha.hidayahonline.net:3
sdi    0x74702555e00 gpt
sdi    0x1fe         PMBR
wipefs /dev/sdg
DEVICE OFFSET        TYPE UUID                                 LABEL
sdg    0x1000        linux_raid_member cd6470cb-1aa3-03fd-1027-706e5fd0606d alpha.hidayahonline.net:3
sdg    0x74702555e00 gpt
sdg    0x1fe         PMBR
For the two drives that are not (sdc & sde), I get nothing but these same errors (which I mentioned earlier) in dmesg:

[118345.203138] sd 2:0:0:0: [sdc] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[118345.203154] sd 2:0:0:0: [sdc] tag#0 Sense Key : Hardware Error [current]
[118345.203158] sd 2:0:0:0: [sdc] tag#0 Add. Sense: Internal target failure
[118345.203162] sd 2:0:0:0: [sdc] tag#0 CDB: Read(16) 88 00 00 00 00 00 00 00 00 08 00 00 00 08 00 00
[118345.203165] print_req_error: critical target error, dev sdc, sector 8
[118345.328209] sd 2:0:0:0: [sdc] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[118345.328217] sd 2:0:0:0: [sdc] tag#0 Sense Key : Illegal Request [current]
[118345.328222] sd 2:0:0:0: [sdc] tag#0 Add. Sense: Invalid field in cdb
[118345.328228] sd 2:0:0:0: [sdc] tag#0 CDB: Read(16) 88 00 00 00 00 00 00 00 00 08 00 00 00 08 00 00
[118345.328232] print_req_error: critical target error, dev sdc, sector 8
[118345.328240] Buffer I/O error on dev sdc, logical block 1, async page read
[118347.813267] sd 3:0:0:1: [sde] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[118347.813274] sd 3:0:0:1: [sde] tag#0 Sense Key : Medium Error [current]
[118347.813279] sd 3:0:0:1: [sde] tag#0 Add. Sense: Unrecovered read error
[118347.813285] sd 3:0:0:1: [sde] tag#0 CDB: Read(16) 88 00 00 00 00 00 00 00 00 08 00 00 00 08 00 00
[118347.813288] print_req_error: critical medium error, dev sde, sector 8
[118347.821409] sd 3:0:0:1: [sde] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[118347.821415] sd 3:0:0:1: [sde] tag#0 Sense Key : Medium Error [current]
[118347.821418] sd 3:0:0:1: [sde] tag#0 Add. Sense: Unrecovered read error
[118347.821422] sd 3:0:0:1: [sde] tag#0 CDB: Read(16) 88 00 00 00 00 00 00 00 00 08 00 00 00 08 00 00
[118347.821425] print_req_error: critical medium error, dev sde, sector 8
[118347.821430] Buffer I/O error on dev sde, logical block 1, async page read
My inexperienced suspicion is that I have some bad blocks in a critical portion of the drives, where some magic numbers should reside, so the drives appear as "empty" to the system.
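
As a rough cross-check of that suspicion, the small sketch below decodes the Read(16) CDB from the dmesg output and compares it with the 0x1000 offset at which wipefs found the linux_raid_member signature on the two good drives. It assumes 512-byte logical sectors (which matches the kernel reporting "sector 8" for LBA 8) and that the failing drives kept their superblock at the same offset as the healthy ones; decode_read16 is just a helper name invented for this example.

```python
SECTOR_SIZE = 512  # assumption: 512-byte logical sectors

# Values copied from the wipefs and dmesg output in this thread.
MD_SUPERBLOCK_OFFSET = 0x1000
CDB = "88 00 00 00 00 00 00 00 00 08 00 00 00 08 00 00"


def decode_read16(cdb_hex):
    """Return (starting LBA, block count) from a SCSI READ(16) CDB hex dump."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 16 or cdb[0] != 0x88:
        raise ValueError("not a 16-byte READ(16) CDB")
    lba = int.from_bytes(cdb[2:10], "big")     # bytes 2..9: logical block address
    count = int.from_bytes(cdb[10:14], "big")  # bytes 10..13: transfer length
    return lba, count


lba, count = decode_read16(CDB)
read_start = lba * SECTOR_SIZE
read_end = read_start + count * SECTOR_SIZE
superblock_sector = MD_SUPERBLOCK_OFFSET // SECTOR_SIZE

print(f"failed read: LBA {lba}, {count} sectors, bytes {read_start:#x}..{read_end:#x}")
print(f"md signature on good drives: byte {MD_SUPERBLOCK_OFFSET:#x} = sector {superblock_sector}")
print("failed read covers the superblock offset:",
      read_start <= MD_SUPERBLOCK_OFFSET < read_end)
```

If those assumptions hold, the failing read starts at exactly the sector where the md superblock lives on sdi and sdg, which would explain why sdc and sde are not recognized as array members.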



