I have had this same problem. The funny thing is that it could be fixed, but I bet it is very hard to do. With most or all modern disk drives (10 years old or less), if you write to a bad block, the drive will relocate it. The RAID5 software could do this:

1. Read the block and get a failure.
2. Re-create the missing data from the other disks.
3. Write that data back over the bad block.
4. If that succeeds, go on with life! Otherwise, report the disk as needing to be replaced, but don't fail it for 1 bad block! Maybe have a threshold.

After all, 99.99999% of the data is still there!

I have "corrected" disks with bad blocks by using dd to copy /dev/zero to the disk, then testing the disk by copying the whole disk to /dev/null. Works every time.

Example: /dev/sdf has a bad block, and you are willing to lose the data on it!

    dd if=/dev/zero of=/dev/sdf bs=64k

When that finishes (dd normally stops at the end of the disk with a "No space left on device" error; that one is expected, any other write error is not), read the whole disk back:

    dd if=/dev/sdf of=/dev/null bs=64k

If that succeeds, the disk is good as... well, it has no bad blocks for now.

If a disk has a bad block in 1 partition, you could just dd zeros to that partition, but still verify 100% of the disk.

I have corrected about 3 disks this way in the past 3 years and have never had any issues with them since. So I know the RAID software could automate this and save some major headaches!

One gotcha: my disks had auto-relocate disabled. I installed a Seagate tool that let me adjust disk drive options, and I enabled auto-relocate for both reads and writes. Since then I have not had a read error. I think the drive relocates a block on read if the read needs a retry. Of course, it can't relocate a block it can't read at all.

A note about hardware RAID: hardware RAID systems test the disks from time to time, so a bad block tends to be found at a time when you don't need that data. This extra scanning greatly reduces the chance of having bad blocks on 2 different drives at the same time. I use a crontab script to read my disks each night, and it sends me an email status.
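A rough sketch of how the zero-and-verify procedure and a nightly read check could be scripted. It assumes GNU coreutils dd (for iflag=count_bytes and conv=fsync) and util-linux blockdev; the function names read_check, zero_and_verify and scan_all, the script path and the crontab line are all hypothetical, not the actual script mentioned above. The zero pass is bounded to the target's size so that dd's exit status is meaningful. zero_and_verify destroys everything on its target, so try it on an image file first.

```shell
#!/bin/sh
# Sketch only: function names, the script path and the crontab line are
# hypothetical. zero_and_verify DESTROYS all data on its target.

BS=65536   # 64k, the block size used in the dd examples

# Full-surface read test: read every byte and throw it away.
# dd exits non-zero if the drive returns a read error.
read_check() {
    dd if="$1" of=/dev/null bs="$BS" 2>/dev/null
}

# Size in bytes of a block device or a regular (image) file.
target_size() {
    if [ -b "$1" ]; then
        blockdev --getsize64 "$1"
    else
        wc -c < "$1"
    fi
}

# Overwrite the whole target with zeros (giving the drive a chance to
# remap bad sectors on write), then read all of it back.
# count is in bytes because of iflag=count_bytes (GNU dd), so the write
# stops exactly at the end instead of failing with "No space left on device".
zero_and_verify() {
    size=$(target_size "$1") || return 1
    dd if=/dev/zero of="$1" bs="$BS" count="$size" \
       iflag=count_bytes conv=notrunc,fsync 2>/dev/null || return 1
    read_check "$1"
}

# Nightly report: one status line per device named on the command line.
# Example crontab entry (assumes a working mail(1) command):
#   0 3 * * * /usr/local/sbin/diskcheck.sh /dev/sda /dev/sdb | mail -s "disk check" root
scan_all() {
    for dev in "$@"; do
        if read_check "$dev"; then
            echo "OK     $dev"
        else
            echo "FAILED $dev"
        fi
    done
}

# When run with device arguments (as from cron), print the report.
if [ "$#" -gt 0 ]; then
    scan_all "$@"
fi
```

For example, zero_and_verify /dev/sdf && echo "sdf reads clean" would repeat the two dd commands above in one step. scan_all only reads, so it is the part that is safe to run from cron every night.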
Because of that nightly scan, I stand a good chance of knowing about a bad block before md finds it.

Guy

-----Original Message-----
From: linux-raid-owner@xxxxxxxxxxxxxxx [mailto:linux-raid-owner@xxxxxxxxxxxxxxx] On Behalf Of AndyLiebman@xxxxxxx
Sent: Friday, May 21, 2004 9:05 AM
To: linux-raid@xxxxxxxxxxxxxxx
Subject: Hardware vs Software and Bad Block Relocation

From the replies I got to my last question about hardware versus software RAID, it seems that one of the big advantages of true hardware RAID can be better handling of bad blocks or read errors on RAID 1 and RAID 5 arrays.

I have run into this a few times with Linux software RAID 5: I get a read error on a particular sector of a particular disk, and Linux software RAID immediately throws that disk out of the array. If I then get a read error on another disk before I replace the first one (unlikely, but it did happen to me once, about a day after the first error), the whole array can be lost. Or at least it's not obvious how to recover the data.

Yesterday I spoke with two tech support people at 3ware who explained that their hardware RAID cards remember where a read error occurred, and the next time you write to that sector the data is relocated to another sector instead. As long as the disk still responds after a read error (within 20 seconds), it won't get kicked out of the array and the RAID won't go into degraded mode. An error report is generated that you can view in the 3ware 3dm or 3dm2 GUI, so you can see that you MIGHT have to start worrying about a particular disk. But the data will still be intact and the array will still offer redundancy.

This seems like a HUGE advantage for data security, especially in my application. I am dealing with terabytes of video and audio files, and it's simply not practical to back them up.
So, my question is: is there a "software equivalent" to what the 3ware card does with bad sectors or bad blocks? Will EVMS do that? Will the latest LVM do that? I have read that EVMS has a bad block relocation function, but does it work the same way as the 3ware card? Will it prevent an array from going into degraded mode after a read error?
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html