On Tue Dec 06, 2011 at 09:11:24AM -0500, Greg Freemyer wrote:

> Hmm...
>
> My rebuild failed.  At first glance I had both a failed drive and a failed slot?
>
> What I don't understand is I have I/O errors in /var/log/messages from
> when the rebuild failed over night.
>
> But this morning, hdparm --read-sector is reading the "bad" sectors fine.
>
> I already tried replacing the drive and the replacement drive also
> reported media errors during the rebuild, that's why I came to believe
> I had a bad slot.
>
> Now I have non-repeatable media errors.
>
> fyi: I have the problem drive connected via eSata now, so it's a
> different controller totally than where it was when the failure first
> occurred.
>
> Any thoughts?
>
Last time I had this sort of issue, it was down to the motherboard.
Somewhere between the drives and the CPU, one or more of the chipsets
were causing issues (I actually had the same issue on multiple
motherboards, though I think using the same/similar onboard SATA
controllers). Single drive tests worked fine - it was only when
hammering the entire array that it would get a write error and fail a
random drive. I've since bought a proper SAS/SATA PCIe card (Intel
RS2WC080) and have had no issues since.

The other things I can think of that may cause this type of issue are a
flaky PSU, or physical shock to the server chassis (even relatively
small movements can cause read/write slowdowns/errors - there's a video
clip online of someone just shouting in front of a rack and causing the
transfer speed to drop off dramatically).

Cheers,
    Robin
-- 
     ___
    ( ' }     |       Robin Hill        <robin@xxxxxxxxxxxxxxx> |
   / / )      | Little Jim says ....                            |
  // !!       |      "He fallen in de water !!"                 |