On Mon, Mar 16, 2009 at 11:04 AM, David Lethe <david@xxxxxxxxxxxx> wrote:
>> -----Original Message-----
>> From: linux-raid-owner@xxxxxxxxxxxxxxx
>> [mailto:linux-raid-owner@xxxxxxxxxxxxxxx] On Behalf Of David Greaves
>> Sent: Monday, March 16, 2009 6:27 AM
>> To: Neil Brown; linux-raid@xxxxxxxxxxxxxxx
>> Subject: sync_action repair not reading all sectors?
>>
>> I have a drive that has bad sectors. Lots of them.
>>
>> smartctl shows
>> # 1  Short offline  Completed: read failure  20%  530  1953520877
>>
>> A simple ddrescue to this part of the disk gets this:
>>
>> Mar 16 10:41:28 elm kernel: [ 8643.123397] sd 3:0:0:0: [sdd] 1953525168 512-byte hardware sectors (1000205 MB)
>> <snip> 51/40:00:f0:5c:70/00:00:74:00:00/e0 Emask 0x9 (media error)
>> Mar 16 10:41:29 elm kernel: [ 8644.190060] ata4.00: status: { DRDY ERR }
>> Mar 16 10:41:29 elm kernel: [ 8644.190099] ata4.00: error: { UNC }
>>
>> and reports 30 or so errors.
>>
>> mdstat tells me:
>> md0 : active raid5 sdd1[0] sdb1[2] sda1[1]
>>       1953519872 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU]
>>
>> So sdd1 is in there.
>>
>> /dev/sdd1 is the full disk
>>
>> Now this is an enterprise class disk so I thought re-writing the
>> blocks would be worthwhile as a first step. (It is being RMAed but if
>> it succeeds then I'll stop the array, mirror/replace the disk and
>> start the array - less risky than a resync).
>>
>> However (two runs of)
>> echo repair > /sys/block/md0/md/sync_action
>> ran to completion without *any* errors being reported in syslog (or
>> anywhere)
>>
>> Is this expected? It suggests that it isn't reading the bad parts of
>> sdd. It certainly hasn't repaired it and I'm none the wiser...
>>
>> kernel is 2.6.26-1-xen-686
>> mdadm v2.6.7.2
>>
>> PS
>> This is an excellent place where I'd love to add in a new 'spare'
>> disk, mirror sdd to the new disk (apart from the bad sectors which
>> should come from the array) and then swap new for old.
>> Instead I'm going to have to go degraded and sync - risking a sector
>> read failure on one of the other drives and a restore from backup :(
>>
>> --
>> "Don't worry, you'll be fine; I saw it work in a cartoon once..."
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
> Personally, If I had a disk that just got this many bad sectors, then
> I wouldn't mess with it further. Just RMA it and get it out of your
> computer. Every block of data you write to that disk is at risk, and
> since you are running RAID5, then you have no room for error if this
> disk, or one of the others should die on you and you have a bad block
> on one of the surviving disks.
>
> David

David,

I think you read too fast. That is exactly what he proposed. The
question was how to keep the raid-5 as fault tolerant as possible
during the drive swapout.

Greg
--
Greg Freemyer
Head of EDD Tape Extraction and Processing team
Litigation Triage Solutions Specialist
http://www.linkedin.com/in/gregfreemyer
First 99 Days Litigation White Paper -
http://www.norcrossgroup.com/forms/whitepapers/99%20Days%20whitepaper.pdf

The Norcross Group
The Intersection of Evidence & Technology
http://www.norcrossgroup.com
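
On the original question of whether the repair pass reads the bad part of sdd at all, a rough arithmetic check may help. The sketch below uses only the figures quoted in the thread, plus two assumptions that the thread does not confirm: that sdd1 starts at sector 63 (a common legacy default) and that the array data begins at offset 0 of sdd1 (as it does with 0.90 metadata). The function names are made up for illustration.

```python
# Rough check: does the failing LBA lie inside the region md reads on sdd1?
# Values marked "from the post" are quoted in the thread; the rest are assumptions.

SECTOR_BYTES = 512          # from the post: 512-byte hardware sectors
DISK_SECTORS = 1953525168   # from the post: kernel log line for sdd
FAILING_LBA = 1953520877    # from the post: smartctl self-test log
ARRAY_KIB = 1953519872      # from the post: /proc/mdstat (1 KiB blocks)
RAID_DEVICES = 3            # from the post: [3/3]

# ASSUMPTION: sdd1 starts at sector 63; the thread does not say.
ASSUMED_PARTITION_START = 63
# ASSUMPTION: array data starts at offset 0 of sdd1 (0.90-style metadata).
ASSUMED_DATA_OFFSET = 0


def per_device_data_sectors(array_kib, raid_devices, sector_bytes=SECTOR_BYTES):
    """RAID5 holds (n - 1) members' worth of data, so divide that back out."""
    per_device_kib = array_kib // (raid_devices - 1)
    return per_device_kib * 1024 // sector_bytes


def md_data_range(partition_start, data_offset, data_sectors):
    """Return (first, last) absolute disk LBA covered by md data on this member."""
    first = partition_start + data_offset
    return first, first + data_sectors - 1


data_sectors = per_device_data_sectors(ARRAY_KIB, RAID_DEVICES)
first_lba, last_lba = md_data_range(
    ASSUMED_PARTITION_START, ASSUMED_DATA_OFFSET, data_sectors
)
inside = first_lba <= FAILING_LBA <= last_lba

print(f"md data on sdd spans LBA {first_lba}..{last_lba}")
print(f"failing LBA {FAILING_LBA} inside md data area: {inside}")
print(f"sectors from failing LBA to end of disk: {DISK_SECTORS - 1 - FAILING_LBA}")
```

Under those two assumptions the failing LBA falls just past the end of the md data area, which would be consistent with a repair pass never touching it. The real partition start (for example from `fdisk -lu /dev/sdd`) and the metadata version (from `mdadm --examine /dev/sdd1`) would need checking before drawing that conclusion.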