Since they both went offline at the same time, check the power cables.  Do
they share a common power cable, or does each have a unique cable directly
from the power supply?  Swap power connections with another drive to see
whether the problem stays with the power connection.

Guy

-----Original Message-----
From: linux-raid-owner@xxxxxxxxxxxxxxx
[mailto:linux-raid-owner@xxxxxxxxxxxxxxx] On Behalf Of Michael Stumpf
Sent: Thursday, December 09, 2004 9:45 AM
To: Guy; linux-raid@xxxxxxxxxxxxxxx
Subject: Re: 2 drive dropout (and raid 5), simultaneous, after 3 years

All I see is this:

Apr 14 22:03:56 drown kernel: scsi: device set offline - not ready or
command retry failed after host reset: host 1 channel 0 id 2 lun 0
Apr 14 22:03:56 drown kernel: scsi: device set offline - not ready or
command retry failed after host reset: host 1 channel 0 id 3 lun 0
Apr 14 22:03:56 drown kernel: md: updating md1 RAID superblock on device
Apr 14 22:03:56 drown kernel: md: (skipping faulty sdj1 )
Apr 14 22:03:56 drown kernel: md: (skipping faulty sdi1 )
Apr 14 22:03:56 drown kernel: md: sdh1 [events: 000000b5]<6>(write) sdh1's sb offset: 117186944
Apr 14 22:03:56 drown kernel: md: sdg1 [events: 000000b5]<6>(write) sdg1's sb offset: 117186944
Apr 14 22:03:56 drown kernel: md: recovery thread got woken up ...
Apr 14 22:03:56 drown kernel: md: recovery thread finished ...

What the heck could that be?  Can that possibly be related to the fact that
there weren't proper block device nodes sitting in the filesystem?!

I already ran WD's wonky tool to fix their "DMA timeout" problem, and one of
the drives is a Maxtor.  They're on separate ATA cables, and I've got about
5 drives per power supply.  I checked heat, and it wasn't very high.

Any other sources of information I could tap?  Maybe an "MD debug" setting
in the kernel with a recompile?

Guy wrote:

>You should have some sort of md error in your logs.  Try this command:
>grep "md:" /var/log/messages*|more
>
>Yes, they don't play well together, so separate them! :)
>
>Guy
>
>-----Original Message-----
>From: linux-raid-owner@xxxxxxxxxxxxxxx
>[mailto:linux-raid-owner@xxxxxxxxxxxxxxx] On Behalf Of Michael Stumpf
>Sent: Wednesday, December 08, 2004 11:46 PM
>To: linux-raid@xxxxxxxxxxxxxxx
>Subject: Re: 2 drive dropout (and raid 5), simultaneous, after 3 years
>
>No idea what failure is occurring.  Your dd test, run from beginning to
>end of each drive, completed fine.  Smartd had no info to report.
>
>The fdisk weirdness was operator error; the /dev/sd* block nodes were
>missing (a forgotten detail from an age-old upgrade).  Fixed with mknod.
>
>So, I forced mdadm to assemble and it is reconstructing now.  Troublesome,
>though, that 2 drives fail at once like this.  I think I should separate
>them onto different RAID-5s, just in case.
>
>Guy wrote:
>
>>What failure are you getting?  I assume a read error.  md will fail a
>>drive when it gets a read error from the drive.  It is "normal" to have
>>a read error once in a while, but more than 1 a year may indicate a
>>drive going bad.
>>
>>I test my drives with this command:
>>dd if=/dev/hdi of=/dev/null bs=64k
>>
>>You may look into using "smartd".  It monitors and tests disks for
>>problems.  However, my dd test finds them first.  smartd has never told
>>me anything useful, but my drives are old, and are not smart enough for
>>smartd.
>>
>>Guy
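For reference, a minimal sketch of the whole-drive read test and SMART checks
described above.  The /dev/sd[g-j] device names are only examples taken from
the log excerpt, and smartctl (from the smartmontools package) is an
assumption here; drives sitting behind a 3ware controller may need
controller-specific options:

# Read every sector of each suspect drive; a failing disk usually shows
# an I/O error in dmesg or a short read before dd completes.
for d in /dev/sdg /dev/sdh /dev/sdi /dev/sdj; do
    dd if=$d of=/dev/null bs=64k || echo "read test failed on $d"
done

# If the drives support SMART, check overall health and start a long
# self-test (results appear later in the smartctl -a output).
for d in /dev/sdg /dev/sdh /dev/sdi /dev/sdj; do
    smartctl -H $d
    smartctl -t long $d
done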
>>-----Original Message-----
>>From: linux-raid-owner@xxxxxxxxxxxxxxx
>>[mailto:linux-raid-owner@xxxxxxxxxxxxxxx] On Behalf Of Michael Stumpf
>>Sent: Wednesday, December 08, 2004 4:03 PM
>>To: linux-raid@xxxxxxxxxxxxxxx
>>Subject: 2 drive dropout (and raid 5), simultaneous, after 3 years
>>
>>I've got an LVM cobbled together from 2 RAID-5 md's.  For the longest
>>time I was running with 3 Promise cards and surviving everything,
>>including the occasional drive failure; then suddenly I had double drive
>>dropouts and the array would go into a degraded state.
>>
>>10 drives in the system, Linux 2.4.22, Slackware 9, mdadm v1.2.0 (13 Mar
>>2003).
>>
>>I started to diagnose; fdisk -l /dev/hdi returned nothing for the two
>>failed drives, but "dmesg" reports that the drives are happy, and that
>>the md would have been automounted if not for a mismatch on the event
>>counters (of the 2 failed drives).
>>
>>I assumed that this had something to do with my semi-nonstandard
>>application of a zillion (3) Promise cards in 1 system, but I never had
>>this problem before.  I ripped out the Promise cards and stuck in 3ware
>>5700s, cleaning it up a bit and also putting a single drive per ATA
>>channel.  Two weeks later, the same problem crops up again.
>>
>>The "problematic" drives are even mixed; 1 is WD, 1 is Maxtor (both
>>120gig).
>>
>>Is this a known bug in 2.4.22 or mdadm 1.2.0?  Suggestions?
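For reference, a rough sketch of the recovery steps mentioned earlier in the
thread: comparing the per-member event counters, recreating the missing
/dev/sd* nodes with mknod, and forcing the assembly.  The md1 and sd[g-j]1
names are only examples taken from the log excerpt (the real arrays may have
more members), and the mknod major/minor numbers assume the standard Linux
SCSI-disk numbering (major 8, 16 minors per disk):

# Compare the event counters recorded in each member's superblock;
# the members that dropped out will be behind the others.
mdadm --examine /dev/sdg1 | grep -i events
mdadm --examine /dev/sdh1 | grep -i events
mdadm --examine /dev/sdi1 | grep -i events
mdadm --examine /dev/sdj1 | grep -i events

# Recreate missing block device nodes if needed (sdi is the 9th SCSI
# disk, so the whole disk is minor 128 and its first partition is 129).
mknod /dev/sdi  b 8 128
mknod /dev/sdi1 b 8 129
mknod /dev/sdj  b 8 144
mknod /dev/sdj1 b 8 145

# Force assembly despite the stale event counts, then watch the rebuild.
mdadm --assemble --force /dev/md1 /dev/sdg1 /dev/sdh1 /dev/sdi1 /dev/sdj1
cat /proc/mdstat

Note that forcing assembly only papers over the stale event counts; it does
not explain why two drives dropped out together, which is why the cabling and
power checks above still matter.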