Dan Williams wrote:
> On Mon, Mar 16, 2009 at 4:27 AM, David Greaves <david@xxxxxxxxxxxx> wrote:
>> I have a drive that has bad sectors. Lots of them.
>>
>> smartctl shows
>> # 1  Short offline  Completed: read failure  20%  530  1953520877
>>
>> A simple ddrescue to this part of the disk gets this:
>>
>> Mar 16 10:41:28 elm kernel: [ 8643.123397] sd 3:0:0:0: [sdd] 1953525168 512-byte hardware sectors (1000205 MB)
>> <snip> 51/40:00:f0:5c:70/00:00:74:00:00/e0 Emask 0x9 (media error)
>> Mar 16 10:41:29 elm kernel: [ 8644.190060] ata4.00: status: { DRDY ERR }
>> Mar 16 10:41:29 elm kernel: [ 8644.190099] ata4.00: error: { UNC }
>>
>> and reports 30 or so errors.
>>
>>
>> mdstat tells me:
>> md0 : active raid5 sdd1[0] sdb1[2] sda1[1]
>>       1953519872 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU]
>>
>> So sdd1 is in there.
>>
>> /dev/sdd1 is the full disk
>>
>
> Are you sure? Maybe I did the following math wrong, but it seems
> there is a chance this bad region is outside the raid array.
> /proc/mdstat says the array is 1953519872 blocks large which is
> 3907039744 sectors. For a three disk raid5 that means we are using
> 1953519872 sectors per disk. The failing sector of 1953520877 is 1005
> sectors outside the array, probably 942 assuming partition 1 starts at
> sector 63??
>
> --
> Dan

Thanks for taking the time to look and for spotting this Dan.

Well you are right. The media error is occurring outside the partition.
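For anyone who wants to re-check Dan's arithmetic, here is a minimal Python sketch. It assumes that /proc/mdstat counts 1 KiB blocks, that the drive uses 512-byte sectors, that a three-disk raid5 stores data on two disks' worth of space, and that the md data begins right at the start of the partition (no extra data offset). The variable names are mine, purely for illustration.

```python
SECTOR_BYTES = 512        # from the kernel log: "512-byte hardware sectors"
MD_BLOCK_BYTES = 1024     # assumption: /proc/mdstat sizes are in 1 KiB blocks

array_blocks = 1953519872     # md0 size from /proc/mdstat
raid_disks = 3                # [3/3] in /proc/mdstat
failing_lba = 1953520877      # LBA of first error from the smartctl self-test
partition_start = 63          # Dan's assumption for where sdd1 begins

# Total array size expressed in 512-byte sectors.
array_sectors = array_blocks * MD_BLOCK_BYTES // SECTOR_BYTES

# raid5 spends one disk's worth of space on parity, so each member
# contributes array_sectors / (raid_disks - 1) sectors.
per_disk_sectors = array_sectors // (raid_disks - 1)

# Distance past the used region, first ignoring the partition offset,
# then allowing for the partition starting at sector 63.
overshoot_from_disk_start = failing_lba - per_disk_sectors
overshoot_from_partition_start = (failing_lba - partition_start) - per_disk_sectors

print(array_sectors)                    # 3907039744
print(per_disk_sectors)                 # 1953519872
print(overshoot_from_disk_start)        # 1005
print(overshoot_from_partition_start)   # 942
```

Both overshoot figures come out positive, which is consistent with the failing sector sitting beyond the region md uses on this member.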
But equally: yes, it's the full disk according to cfdisk, fdisk

I *knew* that I'd allocated the full disk to the partition and checked at a
cursory level but not at a sector level :(

Disk /dev/sdd: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

/dev/sdd1               1      121601   976760001   83  Linux

 1 Primary          0 1953520064     63 1953520065 Linux (83)       None

but kernel.log says:
sd 3:0:0:0: [sdd] 1953525168 512-byte hardware sectors (1000205 MB)

So I humbly apologise for doubting md :)

Pragmatically it looks like a genuine disk error but I should be OK to recover
by stopping the array and doing a fast ddrescue mirror on this device rather
than a more risky replace/resync now the advance replacement has arrived.

Shame we can't do that without stopping the array yet ;)

David

-- 
"Don't worry, you'll be fine; I saw it work in a cartoon once..."