Re: Bad blocks are killing us!

Dieter Stueken <stueken@xxxxxxxxxxx> · Mon, 22 Nov 2004 09:22:06 +0100

Guy Watkins wrote:
> "... but the md-level
> approach might be better.  But I'm not sure I see the point of
> it---unless you have raid 6 with multiple parity blocks, if a disk
> actually has the wrong information recorded on it I don't think you
> can detect which drive is bad, just that one of them is."
>
> If there is a parity block that does not match the data, true you do not
> know which device has the wrong data.  However, if you do not "correct" the
> parity, when a device fails, it will be constructed differently than it was
> before it failed.  This will just cause more corrupt data.  The parity must
> be made consistent with whatever data is on the data blocks to prevent this
> corrosion of data.  With RAID6 it should be possible to determine which
> block is wrong.  It would be a pain in the @$$, but I think it would be
> doable.  I will explain my theory if someone asks.
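Guy's remark that RAID6 could determine which block is wrong can be made concrete. The Python below is only a sketch of the usual P/Q argument, not md's implementation: it assumes Q = sum of g^i * D_i over GF(2^8) with generator g = 2 and the polynomial 0x11d, treats each block as a single byte, and assumes at most one block per stripe is corrupt. `syndromes` and `locate_bad_block` are hypothetical names.

```python
# RAID6 single-error location sketch (assumptions: GF(2^8) with polynomial
# 0x11d, generator g = 2, one byte per block, at most one corrupt block).

def gf_mul(a: int, b: int) -> int:
    """Multiply two GF(2^8) elements, reducing modulo x^8+x^4+x^3+x^2+1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
        b >>= 1
    return result

# Power and logarithm tables for g = 2 (a generator of this field).
EXP = [0] * 255
LOG = [0] * 256
value = 1
for power in range(255):
    EXP[power] = value
    LOG[value] = power
    value = gf_mul(value, 2)

def syndromes(data):
    """P is the XOR of the data bytes; Q is the sum of g^i * D_i."""
    p = q = 0
    for i, d in enumerate(data):
        p ^= d
        q ^= gf_mul(EXP[i], d)
    return p, q

def locate_bad_block(data, p_stored, q_stored):
    """Say which single block is wrong: None, "P", "Q", or a data index."""
    p, q = syndromes(data)
    dp, dq = p ^ p_stored, q ^ q_stored
    if dp == 0 and dq == 0:
        return None              # stripe is consistent
    if dq == 0:
        return "P"               # only P disagrees: P itself is bad
    if dp == 0:
        return "Q"               # only Q disagrees: Q itself is bad
    # Error e in data block z gives dp = e and dq = g^z * e,
    # hence z = log(dq) - log(dp) (mod 255).
    z = (LOG[dq] - LOG[dp]) % 255
    return z if z < len(data) else "more than one block is bad"

data = [0x11, 0x22, 0x33, 0x44]
p, q = syndromes(data)
corrupted = list(data)
corrupted[2] ^= 0x5A             # silent corruption, no read error reported
print(locate_bad_block(corrupted, p, q))   # 2
```

With two independent checks, the ratio of the two mismatches points at the culprit; RAID5's single XOR parity gives only one equation and no such ratio.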

This is exactly the same conflict a single drive has with an unreadable sector.
It marks the sector as bad, and it cannot fulfill any read request for it until
the data is rewritten or erased. A single drive cannot (and should never
try to!) silently replace the bad sector with a spare sector, as it cannot
recover the content.
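The single-drive behaviour described above can be modelled in a few lines. This is a toy, not how any real firmware works: `ToyDrive` and its methods are hypothetical, and the only point is that a read of a damaged sector keeps failing until new data is written to it.

```python
class ToyDrive:
    """Hypothetical model of one disk: an unreadable sector stays unreadable
    until it is rewritten. The drive never invents replacement contents."""

    def __init__(self, nsectors: int):
        self.sectors = [b"\0"] * nsectors
        self.unreadable = set()

    def damage(self, lba: int) -> None:
        # Simulate a medium error: the stored content is gone.
        self.unreadable.add(lba)

    def read(self, lba: int) -> bytes:
        if lba in self.unreadable:
            raise OSError(f"medium error reading sector {lba}")
        return self.sectors[lba]

    def write(self, lba: int, data: bytes) -> None:
        # Only new data from the host clears the error; this is also the point
        # where a real drive could remap the sector to a spare.
        self.sectors[lba] = data
        self.unreadable.discard(lba)

drive = ToyDrive(8)
drive.damage(3)
drive.write(3, b"new")
print(drive.read(3))   # b'new'
```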

The RAID system cannot solve this problem automagically either, and it never should
try, as the former content can no longer be deduced. But notice that we
have two very different problems to examine. The problem above arises if all
disks of the RAID system claim to have read correct data, while the parity information
tells us that one of them must be wrong. As long as we don't have RAID6
to locate and correct such a single corrupted block, the data is LOST and cannot be recovered.
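To see why plain parity cannot name the culprit, here is a small Python sketch assuming RAID5-style XOR parity over three data blocks; `parity` and `consistent_repairs` are hypothetical helpers. Every single-block rewrite restores consistency, so the parity check alone has no way to pick the right one.

```python
from functools import reduce
from operator import xor

def parity(blocks):
    """RAID5-style parity: byte-wise XOR of all given blocks."""
    return bytes(reduce(xor, column) for column in zip(*blocks))

def consistent_repairs(data, stored_parity):
    """List every single-block rewrite that makes the stripe consistent again.

    Index len(data) stands for the parity block itself.
    """
    blocks = list(data) + [stored_parity]
    repairs = []
    for i in range(len(blocks)):
        others = blocks[:i] + blocks[i + 1:]
        # The only value block i could hold for the XOR check to pass.
        repairs.append((i, parity(others)))
    return repairs

data = [b"\x01\x02", b"\x10\x20", b"\xaa\xbb"]
stored = parity(data)
data[1] = b"\x10\x21"              # silent corruption: every disk reads "fine"
print(parity(data) != stored)      # True: the scan sees a mismatch
for i, value in consistent_repairs(data, stored):
    print(i, value.hex())          # four candidates; only index 1 is the original
```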

This is very different from the situation where one of the disks DOES report
an internal CRC error. In that case we know which block is missing, so your data CAN
be reliably recovered from the parity information and, in most cases, successfully
written back to the disk.
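The contrast with a reported read error is that the position of the bad block is known, so only its content has to be computed. A minimal sketch, again assuming XOR parity; `xor_blocks` and `reconstruct` are hypothetical helper names:

```python
from functools import reduce
from operator import xor

def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(xor, column) for column in zip(*blocks))

def reconstruct(stripe, failed_index):
    """Rebuild the block at failed_index from all surviving blocks.

    `stripe` is the data blocks followed by the parity block. The disk's own
    read error tells us *which* block is missing, so one XOR recovers it.
    """
    survivors = [b for i, b in enumerate(stripe) if i != failed_index]
    return xor_blocks(survivors)

data = [b"\x01\x02", b"\x10\x20", b"\xaa\xbb"]
stripe = data + [xor_blocks(data)]
# Disk 1 reports a CRC error: recompute its block, then write it back to disk 1.
print(reconstruct(stripe, 1).hex())   # 1020
```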

But there is also a difference between the RAID case and the internal disk:
whereas the disk always reads the CRC data for a sector to verify its integrity,
the RAID system does not check the parity information on normal reads
(this is why the idea of data scans came up in the first place). So, if a scan
discovers bad parity, the only action that can (and must!) be taken
is to tag this piece of data as invalid. It is important to log that
information somewhere, but it is even more important to prevent further reads
of this lost data. Otherwise this definitely invalid data may be read again
without any notice, and may even get written back, so that it turns into seemingly
valid data even though it has become garbage.
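The "tag it, don't fix it" policy could look roughly like the following toy model. It is a sketch, not md's actual mechanism: `ToyArray` is hypothetical, it assumes RAID5 XOR parity, and it assumes that a full rewrite of the stripe with new data is what clears the tag, mirroring how a single drive clears an unreadable sector.

```python
from functools import reduce
from operator import xor

def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(xor, column) for column in zip(*blocks))

class ToyArray:
    """Hypothetical RAID5 model: each stripe is (data_blocks, parity_block)."""

    def __init__(self, stripes):
        self.stripes = [(list(d), p) for d, p in stripes]
        self.invalid = set()          # stripes proven to hold lost data

    def scrub(self):
        """Verify parity everywhere; tag mismatches instead of 'repairing' them."""
        for n, (data, par) in enumerate(self.stripes):
            if xor_blocks(data) != par:
                self.invalid.add(n)
                print(f"scrub: stripe {n} parity mismatch, marked invalid")
        return sorted(self.invalid)

    def read(self, stripe, block):
        # Refuse to return lost data, so it cannot be silently read and
        # written back as if it were good.
        if stripe in self.invalid:
            raise OSError(f"stripe {stripe} holds lost data")
        return self.stripes[stripe][0][block]

    def write_stripe(self, stripe, data):
        # Assumption: only a full rewrite with new data (and fresh parity)
        # clears the tag, like rewriting an unreadable sector on one disk.
        self.stripes[stripe] = (list(data), xor_blocks(data))
        self.invalid.discard(stripe)

good = [b"\x01", b"\x02"]
bad = [b"\x01", b"\x03"]              # its stored parity below no longer matches
array = ToyArray([(good, xor_blocks(good)), (bad, b"\x03")])
print(array.scrub())                  # [1]
```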

People often argue for some kind of spare sector management, as if it would solve all problems.
I think this is an illusion. Spare sectors are only useful when WRITING data fails,
not when a read fails or data has already been lost. This is already handled
sufficiently within the individual disks (I think). If your disk reports write errors, you
either have a very old one without internal spare sector management, or your disk
has already run out of spare sectors. Read errors are far more frequent than write
errors and thus a much more important issue.

Dieter Stüken.
--
Dieter Stüken, con terra GmbH, Münster
    stueken@xxxxxxxxxxx
    http://www.conterra.de/
    (0)251-7474-501
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html