On 02/07/2014 20:28, Pedro Teixeira wrote:
Hi Ethan,
The thing here is that some of the bad blocks ( if not all ) that are
giving read errors are not on the bad blocks list.
Are you sure? Please note that comparing offsets is not straightforward:
an offset given by fsck is a sector offset in the md0 sense, while the
device bad block list contains offsets in the device sense. To convert
one into the other you have to divide, or multiply, by the number of
data disks, approximately, and handle the remainder manually, also
taking the rotating parity into account. Not simple. Is this the
computation that you did?
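To make the conversion concrete, here is a rough sketch of the mapping
from an md0 sector to a member disk and a sector on that disk. It rests
on assumptions that are not stated in this thread: a RAID5 array using
the left-symmetric layout, a fixed chunk size, and no reshape in
progress. The function name and parameters are made up for
illustration, and whether data_offset has to be added depends on how
the list you are comparing against is expressed, so check it against
your own array. fsck normally reports filesystem blocks, so those have
to be converted to 512-byte sectors first.

```python
def md_to_device_sector(array_sector, raid_disks, chunk_sectors, data_offset=0):
    """Map an md array sector to (disk_index, parity_disk_index, device_sector).

    Assumes RAID5 with the left-symmetric layout; all sizes are in
    512-byte sectors. data_offset is an optional per-device offset
    (assumption: add it only if the target list is device-absolute).
    """
    data_disks = raid_disks - 1

    # Which chunk of the array, and where inside that chunk.
    chunk_number, chunk_offset = divmod(array_sector, chunk_sectors)

    # Which stripe, and which data slot within the stripe.
    stripe, data_index = divmod(chunk_number, data_disks)

    # Rotating parity: the parity disk moves back by one on each stripe.
    parity_disk = data_disks - (stripe % raid_disks)

    # Left-symmetric: data slots start right after the parity disk and wrap.
    disk = (parity_disk + 1 + data_index) % raid_disks

    device_sector = stripe * chunk_sectors + chunk_offset + data_offset
    return disk, parity_disk, device_sector


def fs_block_to_sector(block, block_size=4096):
    """Convert a filesystem block number to a 512-byte sector offset."""
    return block * (block_size // 512)
```

For example, with 4 disks and 1024-sector (512 KiB) chunks,
md_to_device_sector(5000, 4, 1024) lands on disk index 0 at device
sector 1928, with parity for that stripe on disk index 2. The disk
index is the slot number in the array, not the sdX name.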
Specifically, the ones that show up when doing a fsck are not on any
drive. For these sectors fsck tries to re-write them and md still
throws an error but they are not added to the list.
Not "added" but "removed". Writing to a bad block should create valid
content, so the block should be removed from the list. If it isn't, then
there is indeed probably a bug in the MD code; see my previous post.
I replaced sdm with a new disk. This was one that had a bunch of bad
blocks reported by md, and after finishing the rebuild ( with no
errors at all ) the --examine-badblocks still gives me the exact same
list of errors. I would expect that replacing the disk by a new one
would clear the errors.
This is the correct behaviour by design.
The source disks did not have valid content in those positions, and good
data cannot be created from nothing, so the bad blocks are replicated
onto the new disk.
"Bad" here is more a synonym of "containing invalid data", not really
"unreadable surface".
As I know the disks are good, is there any way of resetting the bad
blocks list without destroying the filesystem?
This one I don't know, but doing that would probably not help to find
the bug.
Regards
EW