RE: RAID halting

> [ ... ] The evidence so far does not strongly suggest a
> hardware issue, at least not a drive issue, [ ... ]

> [ ... ] the drive system previously reported tons of sector
> remaps when the drives were in a different, clearly broken,
> enclosure, and they continue to do so on the 320G drive with
> known issues.

>> * Did you look into firmware? Are the drives and/or firmware
>>   revisions qualified by your controller vendor?

> Yes.  I did that before purchasing the controller.  No, I did not
> look into the drives.  The controller vendor does not qualify
> drives.  Controllers don't get any more generic than the one I
> purchased (I don't recall the brand at this time - it's based on
> the Silicon Image SiI3124 controller chip).

Uhhh, I'd invest in something else, just in case. The SiI chips are
a bit low end, and most SiI-based cards I have seen were of the
cheap and cheerful variety, and those sometimes have fairly
marginal electrical/noise designs.

> More importantly, the fact the system ran for months without the
> problem, and the problem only occurred after changing the array
> chassis and the file system strongly suggests this is not the
> root of the issue.

Not necessarily: a different file system may trigger different bugs
in the host adapter fw and in the drive fw by doing operations in a
different sequence with different timing.

> [ ... ] "HOW DO I RUN A FULL BLOCK-LEVEL HARDWARE TEST?"

I agree that it seems unlikely that it is a physically defective
disk. More likely bad cabling, bad backplane, bad fw, bad
electrical/noise design.

Anyhow, it is practically impossible to run a full block-level
hardware test on modern disk drives, which are more like block
servers, with several layers of indirection between the command
level and the hardware.

However to run a *logical* block test, 'badblocks' from the
'e2fsprogs' package is the common choice.
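
To illustrate what a read-only *logical* block scan amounts to, here
is a minimal Python sketch. It is not how 'badblocks' itself is
implemented; the function name scan_blocks and the 4096-byte default
block size are assumptions made for this example. It reads through
the normal kernel path, so some reads may be served from the page
cache, and the drive's own translation layers stay invisible to it.

```python
import os


def scan_blocks(path, block_size=4096):
    """Read `path` block by block and collect blocks that fail to read.

    Returns (total_blocks, bad_block_indices). `path` may be a regular
    file or, on Linux, a block device node (opening one usually needs root).
    """
    bad = []
    fd = os.open(path, os.O_RDONLY)
    try:
        # Seeking to the end yields the size for files and Linux block devices.
        size = os.lseek(fd, 0, os.SEEK_END)
        total = (size + block_size - 1) // block_size
        for index in range(total):
            try:
                # pread reads at an explicit offset without moving the file position.
                os.pread(fd, block_size, index * block_size)
            except OSError:
                # An I/O error here is what a logical scan reports as a bad block.
                bad.append(index)
    finally:
        os.close(fd)
    return total, bad
```

Calling scan_blocks on a device node returns the number of blocks
read and the indices of those that raised an I/O error. It only
reads; it makes no attempt at the write/verify patterns a
destructive test would use.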

But I'd leave the CERN "silent corruption" daemon and other
checks/diagnostics running, and look carefully at the system logs
for host adapter errors.
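
For the log check, something along these lines may help narrow things
down. It is only a sketch: the two regular expressions are my own
guesses at what libata-style kernel messages about a port in trouble
tend to look like, not an exhaustive or authoritative list, and the
function name suspicious_lines is made up for the example.

```python
import re

# Assumption: host adapter messages are prefixed with a port name such as
# "ata3:" or "ata3.00:". Adjust for other drivers.
ATA_PORT = re.compile(r"\bata\d+(\.\d+)?:")

# Assumption: these keywords are a rough filter for trouble, nothing more.
TROUBLE = re.compile(r"error|exception|reset|timeout|failed", re.IGNORECASE)


def suspicious_lines(lines):
    """Return the log lines that mention an ATA port and a trouble keyword."""
    return [
        line.rstrip("\n")
        for line in lines
        if ATA_PORT.search(line) and TROUBLE.search(line)
    ]
```

The function takes any iterable of lines, so it can be fed an open
log file or the captured output of 'dmesg'. Expect false positives;
the point is to get a short list worth reading by hand.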

For most people building significant storage systems, or self-built
systems of a certain size, keeping current with the HEPiX workshops
<URL:https://WWW.HEPiX.org/> seems to me a good idea.