Colin McCabe wrote:
Hi all,
I am running software RAID on Linux 2.6.21.
While experimenting with adding and removing devices from the RAID array, I
noticed something very troubling. I have a bad drive (let's call it drive B)
which gets random read errors. I also have a good drive, call it drive A.
B can synchronize with A. But then, if I remove A from the RAID array, A
cannot be re-added. This is because the bad drive, B, cannot be read from.
Basically, B appears to be "write-only"; it will never return an error on a
write, but just try to read from it, and you will be sorry.
You may be able to recover from this (why would you do such a thing?) by
stopping the array and reassembling it with only the "good" drive, leaving
the other marked as failed. Caution: I made this up. It should work, but I
have no bad drive to test with; we have a good recycling system in my area.
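The stop-and-reassemble idea above might look roughly like the sketch below.
It is untested against a failing drive. /dev/md0 and /dev/sda1 are placeholder
names, not taken from the report, and reassemble_degraded and run_cmd are
hypothetical helpers. By default it only prints the mdadm commands; set RUN=1
to actually execute them.

```shell
#!/bin/sh
# Sketch only: device names passed in are assumptions, not from the report.

# Print the command unless RUN=1, in which case execute it.
run_cmd() {
    if [ "${RUN:-0}" = 1 ]; then
        "$@"
    else
        echo "$@"
    fi
}

reassemble_degraded() {
    md=$1      # array device, e.g. /dev/md0 (assumed name)
    good=$2    # the one member known to be readable (drive A)

    # Stop the array so it can be assembled again from scratch.
    run_cmd mdadm --stop "$md" || return 1

    # Assemble from the good member only; --run asks mdadm to start the
    # array even though fewer members than expected are present.
    run_cmd mdadm --assemble --run "$md" "$good"
}
```

Example: reassemble_degraded /dev/md0 /dev/sda1 prints the two commands for
review before anything is run for real.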
Writing is fine:
[root@cmccabe-devel root]# dd if=/dev/zero of=/dev/sdb bs=524288
dd: writing `/dev/sdb': No space left on device
114464+0 records in
114463+0 records out
Reading is not:
[root@cmccabe-devel root]# dd if=/dev/sdb of=/dev/null bs=524288
ata1.00: exception Emask 0x0 SAct 0x3 SErr 0x0 action 0x2 frozen
ata1.00: cmd 60/00:00:00:b0:01/01:00:00:00:00/40 tag 0 cdb 0x0 data 131072 in
[ ... copious errors ... ]
I have disabled write caching using hdparm -W0.
Both drives are: Fujitsu MHV2060BH, 60 GB, Serial ATA
The SATA controller is: ICH6
My problem is that even though B gets into the synchronized state, it is no
good at all. This is potentially misleading, and if someone removes A after
synchronizing B, the system will probably crash, since there will be no good
drives left.
I wonder if anyone else is interested in a "paranoid recovery" mode where the
md layer tests the data that has been written. Even if this doubles the
recovery time, I think that it would be desirable for many applications.
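To illustrate what such a write-then-verify pass means, here is a userspace
sketch. It is not how md works and not a proposed kernel change; the function
name verify_readback is hypothetical, and the target it is given is
overwritten. It writes a random pattern, flushes it, then reads the same
number of bytes back and compares.

```shell
#!/bin/sh
# Sketch only: verify_readback is a hypothetical helper. The target
# (a file or block device) is OVERWRITTEN with random data.
verify_readback() {
    target=$1                      # file or device to test
    blocks=$2                      # number of 512 KiB blocks to check
    bytes=$((blocks * 524288))
    pattern=$(mktemp) || return 2

    # Random reference data, so stale or zeroed blocks cannot pass.
    dd if=/dev/urandom of="$pattern" bs=524288 count="$blocks" \
        iflag=fullblock 2>/dev/null || { rm -f "$pattern"; return 2; }

    # Write the pattern and flush it to the target before continuing.
    dd if="$pattern" of="$target" bs=524288 count="$blocks" \
        conv=fsync,notrunc 2>/dev/null || { rm -f "$pattern"; return 1; }

    # Read back exactly the bytes written and compare with the reference.
    if cmp -s -n "$bytes" "$pattern" "$target"; then
        status=0
    else
        status=1
    fi
    rm -f "$pattern"
    return "$status"
}
```

One caveat: on a real device the read-back can be satisfied from the page
cache rather than from the disk, so a meaningful test would need to bypass the
cache (for example by reading with GNU dd's iflag=direct). The sketch leaves
that out to stay short.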
--
bill davidsen <davidsen@xxxxxxx>
CTO TMR Associates, Inc
Doing interesting things with small computers since 1979