> -----Original Message-----
> From: linux-raid-owner@xxxxxxxxxxxxxxx
> [mailto:linux-raid-owner@xxxxxxxxxxxxxxx] On Behalf Of Tony Coffman
> Sent: Tuesday, August 26, 2008 10:33 AM
> To: linux-raid@xxxxxxxxxxxxxxx
> Subject: Device naming and raid1
>
> I have a Centos5 box running a software raid-1 set on a pair of SATA
> drives.
>
> The SATA controller or driver has a flaw.
> Every 150 days or so, one of the two drives will experience errors and
> fail.
>
> Subsequent tests always show the drive and cable to be ok. We bought a
> couple of replacement drives before we figured that out :-(
>
> On the last event this weekend, I went searching for a way to get the
> raid back online with no host downtime. I found the technique that
> deletes the drive and then brings it back online with a bus scan using
> the /sys filesystem delete and rescan entities.
>
> I didn't realize that you could also perform a rescan on a single LUN.
> I'll have to use that next time.
>
> My question - since I've done a delete/rescan bus operation, my device
> name and major,minor numbers have changed.
>
> Original
> [0:0:0:0] disk ATA ST3250410AS 3.AA /dev/sdc
>
> Current
> [0:0:0:0] disk ATA ST3250410AS 3.AA /dev/sdc
>
> If I re-add the device to the raid set using the new device name, will
> it cause any problems on the next boot?
>
> The drive appears to be fine. I can read all blocks with no errors.
> Partition table looks ok, etc.
>
> In the future if I rescan just the single LUN, I'm pretty sure I won't
> run into this again, but I'd like to avoid an outage on this event if
> possible.
>
> Thanks and regards,
> --Tony
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid"
> in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.htm

Don't be too quick to say the drive(s) are good or, for that matter, to
make any assumptions about what is bad or good. (Well, OK, let's assume
the monitor is good.)
If the drives are reporting errors and then failing, why not trap the
error messages and run some diagnostics while the drives are still in
that failed state? The error messages tell you what the errors actually
are. Make yourself a bootable CD-ROM or USB stick, and the next time the
drives lock up and/or start spitting out errors, capture everything.
Then boot from the external device (do NOT cycle power) and run one of
the many available diagnostics to confirm or eliminate the disks.
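
As a rough illustration of the "capture everything" step, something like
the Python sketch below could be kept on the box and run while the drive
is still in its failed state. It is only a sketch under assumptions: the
device name /dev/sdc is taken from the original post, /dev/md0 is a
hypothetical array name, the helper names are made up for this example,
and smartctl is only available if smartmontools is installed. Any command
that is missing or fails is noted in its output file instead of aborting
the run.

```python
import datetime
import pathlib
import subprocess

# Assumed names: adjust to the failing member disk and to your array.
DISK = "/dev/sdc"    # device name from the original post
ARRAY = "/dev/md0"   # hypothetical md array name

# Commands whose output is worth keeping while the drive is still failed.
COMMANDS = [
    ["dmesg"],                     # kernel log with the ATA/SCSI errors
    ["cat", "/proc/mdstat"],       # md's current view of the arrays
    ["mdadm", "--detail", ARRAY],  # per-member state of the array
    ["smartctl", "-a", DISK],      # SMART attributes and drive error log
]


def run_and_capture(cmd):
    """Run cmd and return its combined stdout/stderr as text.

    A missing tool or a timeout is recorded in the returned text rather
    than raised, so one failing command does not stop the capture.
    """
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,
            timeout=60,
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        return "[could not run {}: {}]\n".format(" ".join(cmd), exc)
    return result.stdout


def capture_all(out_dir, commands=COMMANDS):
    """Write one text file per command into a timestamped directory."""
    stamp = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
    target = pathlib.Path(out_dir) / ("raid-failure-" + stamp)
    target.mkdir(parents=True)
    for cmd in commands:
        # Build a file name from the command, e.g. "smartctl_-a_-dev-sdc.txt".
        name = "_".join(cmd).replace("/", "-") + ".txt"
        (target / name).write_text(run_and_capture(cmd))
    return target
```

Calling capture_all("/root") as root (the directory is just an example)
should leave one file per command to look through later. Writing the
files to a disk that is not part of the affected array is probably the
safer choice.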