On 8/14/20 2:07 PM, Roy Sigurd Karlsbakk wrote:
I just tried another approach: mdadm --remove on the spares, then mdadm --examine on
the removed spares, no superblock. Then mdadm --fail for one of the drives and
mdadm --add for another; it sat as a spare for a few milliseconds until recovery
started. This runs as it should, slower than --replace, but I don't care. After
12% or so, I checked with --examine-badblocks, and the same sectors are popping
up again. This was just a small test to see if --replace was the "bad guy" here
or if a full recovery would do the same. It does.
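For reference, the test sequence described above looks roughly like this (device and array names here are hypothetical placeholders, not taken from the original message; do not run these against a live array without adjusting them):

```shell
# Hypothetical names: /dev/md0 array, /dev/sdf1 member to fail,
# /dev/sdg1 disk to add. Adjust for your setup.

# Remove a spare and confirm it carries no md superblock:
mdadm /dev/md0 --remove /dev/sdg1
mdadm --examine /dev/sdg1

# Fail a member and add the other disk; it is a spare only
# briefly before recovery starts:
mdadm /dev/md0 --fail /dev/sdf1
mdadm /dev/md0 --add /dev/sdg1

# Partway through recovery, check whether the same bad
# sectors reappear on the new disk:
mdadm --examine-badblocks /dev/sdg1
cat /proc/mdstat
```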
For the record, I just tested mdadm --replace again on a disk in the raid. The source disk had no badblocks. The destination disk is new-ish (that is, a few years old, but hardly written to and without an md superblock). It seems the badblocks present on other drives in the raid6 are also replicated to the "new" disk. This is not really how it should be IMO.
There must be a major bug in here somewhere. If there's a bad sector somewhere, well, ok, I can handle some corruption. The filesystem will probably be able to handle it as well. But if this is all blocked because of flaky "bad" sectors not really being bad, then something is bad indeed.
In my not-so-humble opinion, the bug is the existence of the BadBlocks
feature. Once a badblock is recorded for a sector, redundancy is
permanently lost at that location. There is no tool to undo this.
I strongly recommend that you remove badblock logs on all arrays before
the "feature" screws you.
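A sketch of how a badblock log can be inspected and dropped, assuming mdadm's --update=no-bbl assemble option (device names are hypothetical; the array must be stopped first, and mdadm will refuse to drop a log that still has entries):

```shell
# Hypothetical names: /dev/md0 array with members /dev/sd[bcde]1.

# Inspect the badblock log recorded on each member:
for d in /dev/sd[bcde]1; do mdadm --examine-badblocks "$d"; done

# Dropping the log requires reassembling with --update=no-bbl:
mdadm --stop /dev/md0
mdadm --assemble /dev/md0 --update=no-bbl /dev/sd[bcde]1
```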
Phil