All,

I found a fan that wasn't working. This is a 1U rack-mount unit, so that dead fan apparently caused a lot of issues. I replaced the fan about 10 hours ago and have run a bunch of different tests today. No disk errors have been reported in that time.

I gave up on my previous array. I just deleted it and recreated it. I'm restoring from backup now.

Thanks
Greg

On Tue, Dec 6, 2011 at 9:52 AM, Phil Turmel <philip@xxxxxxxxxx> wrote:
> Hi Greg,
>
> On 12/06/2011 09:11 AM, Greg Freemyer wrote:
>> Hmm...
>>
>> My rebuild failed. At first glance I had both a failed drive and a failed slot?
>>
>> What I don't understand is I have I/O errors in /var/log/messages from
>> when the rebuild failed over night.
>
> Something in your system is untrustworthy.
>
>> But this morning, hdparm --read-sector is reading the "bad" sectors fine.
>
> What does smartctl say about your drives (all of them)?
>
>> I already tried replacing the drive and the replacement drive also
>> reported media errors during the rebuild, that's why I came to believe
>> I had a bad slot.
>>
>> Now I have non-repeatable media errors.
>>
>> fyi: I have the problem drive connected via eSata now, so it's a
>> different controller totally than where it was when the failure first
>> occurred.
>
> Are the errors in /var/log/messages only from that drive? If so, then that
> drive is probably toast.
>
>> Any thoughts?
>
> Your prior e-mail said that you re-created the array. I didn't see that you
> had definitively nailed down the problem at that point, so it probably wasn't
> a good idea. In particular, it destroys all prior metadata on the array
> members. If you didn't keep the output of "mdadm -E" for each drive, that
> information is now lost.
>
> In general, "--create" is a last resort, and only to be used for recovery
> when you have absolute confidence you understand the layout (mdadm -E
> printouts of the original array). "--assemble --force" is the proper step
> after "--assemble" fails.
>
> I would completely scrub the questionable drive with random data, run a long
> smartctl test on it, and replace it if it reports any re-allocated sectors at
> that point.
>
> I would also run long smartctl tests on the other drives, looking for pending
> sectors or re-allocated sectors. If any, I would plan on replacements for
> them as well, and would try to validate the content of your files. You do
> have a backup to compare against, after all.
>
> If you are running a Debian-based distro, and the array contains your rootfs,
> you might find "debsums" useful.
>
> HTH,
>
> Phil

--
Greg Freemyer
Head of EDD Tape Extraction and Processing team
Litigation Triage Solutions Specialist
http://www.linkedin.com/in/gregfreemyer
CNN/TruTV Aired Forensic Imaging Demo -
   http://insession.blogs.cnn.com/2010/03/23/how-computer-evidence-gets-retrieved/

The Norcross Group
The Intersection of Evidence & Technology
http://www.norcrossgroup.com
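Phil's advice comes down to checking two SMART counters, re-allocated and pending sectors, on every member drive once a long self-test (started with `smartctl -t long /dev/sdX`) has finished. The Python sketch below is one possible way to automate that check across several drives. It assumes the ATA attribute table printed by `smartctl -A` (NVMe drives use a different layout), and the helper names (`parse_attributes`, `suspect`, `check_device`) and the sample output are hypothetical, made up for illustration rather than taken from this thread.

```python
import re
import subprocess

# The two SMART counters the advice above says to watch on every member drive.
WATCHED = {"Reallocated_Sector_Ct", "Current_Pending_Sector"}

def parse_attributes(text):
    """Map watched attribute names to their raw counts, from `smartctl -A` output.

    Assumes the ATA attribute table layout (ID#, ATTRIBUTE_NAME, ..., RAW_VALUE).
    """
    found = {}
    for line in text.splitlines():
        # Split into at most 10 fields: RAW_VALUE is last and may contain spaces,
        # e.g. "36 (Min/Max 20/45)" for temperature.
        fields = line.split(None, 9)
        if len(fields) < 10 or not fields[0].isdigit():
            continue  # header, banner or blank line
        name, raw = fields[1], fields[9]
        if name in WATCHED:
            m = re.match(r"\d+", raw)
            if m:
                found[name] = int(m.group())
    return found

def suspect(attrs):
    """True if any watched counter is non-zero, i.e. plan on a replacement."""
    return any(count > 0 for count in attrs.values())

def check_device(dev):
    """Run `smartctl -A` on a device such as /dev/sda (needs root and smartmontools).

    smartctl reports status as a bitmask exit code, so a non-zero code is not
    treated as an error here; only stdout is parsed.
    """
    result = subprocess.run(["smartctl", "-A", dev], capture_output=True, text=True)
    return parse_attributes(result.stdout)

# Made-up sample output for illustration only; not from Greg's drives.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       8
194 Temperature_Celsius     0x0022   036   045   000    Old_age   Always       -       36 (Min/Max 20/45)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
"""

if __name__ == "__main__":
    attrs = parse_attributes(SAMPLE)
    print(attrs, "suspect" if suspect(attrs) else "ok")
```

On a real system you would call `check_device()` for each array member instead of parsing `SAMPLE`. Treating any non-zero count as suspect follows Phil's "if any, plan on replacements" rule rather than the drive's own pass/fail thresholds.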