On 29/10/2019 19:05, Anssi Hannula wrote:
> As mentioned in my first message and seen in
> http://onse.fi/files/reshape-infloop-issue/examine-all.txt , the MD bad
> block lists contain blocks (suspiciously identical across devices).
>
> So maybe the code can't properly handle the case where 10 devices have
> the same block in their bad block list. Not quite sure what "handle"
> should mean in this case but certainly something else than a
> handle_stripe() loop :)
>
> There is a "bad" block on 10 devices on sector 198504960, which I guess
> matches sh->sector 198248960 due to data offset of 256000 sectors (per
> --examine).
>
> I've wondered if "dd if=/dev/md0 of=/dev/md0" for the affected blocks
> would clear the bad blocks and avoid this issue, but I haven't tried
> that yet so that the infinite loop issue can be investigated/fixed
> first. I already checked that /dev/md0 is fully readable (which also
> confuses me a bit since md(8) says "Attempting to read from a known bad
> block will cause a read error"... maybe I'm missing something).
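
The sector arithmetic in the quoted text can be checked with a short sketch. It assumes, as the quoted message guesses, that the bad-block sector is counted from the start of the member device and that sh->sector is counted from the start of the data area. The function name is hypothetical.

```python
# Data offset reported by --examine in the quoted message, in sectors.
DATA_OFFSET_SECTORS = 256000


def device_to_data_sector(device_sector, data_offset=DATA_OFFSET_SECTORS):
    """Convert a sector counted from the start of the member device into
    a sector counted from the start of its data area.

    Assumption: the bad-block list stores device-relative sectors.
    """
    if device_sector < data_offset:
        raise ValueError("sector lies before the data area")
    return device_sector - data_offset


# The bad block reported on 10 devices:
print(device_to_data_sector(198504960))  # prints 198248960
```

The result matches the sh->sector value given in the quoted message.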

Hmmm ...

Bear in mind that many consider the bad-blocks list an anti-feature, and
identical bad-block lists across multiple disks are strongly suspected
to be the result of a bug ...

I hesitate to suggest trying to clear the bad blocks, but doing a dd will
almost certainly not do what you want: the md bad-blocks list is
implemented within the md layer, so doing something with dd is unlikely
to touch it.

Plus, since md is a software layer, you should NEVER under normal
circumstances have any bad blocks of its own. It doesn't make sense, so
it's pretty certain you've fallen foul of a bug in the bad-blocks setup.
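
One way to see how identical the lists really are is to compare them entry by entry. The following is only a sketch: it assumes each member's list has been saved as text with one "sector length" pair per line (believed to be the layout of the md sysfs bad_blocks attribute, but treat that as an assumption). The device names and the length of 8 in the sample data are made up for illustration; only the sector number comes from this thread.

```python
from collections import Counter


def parse_bad_blocks(text):
    """Parse 'sector length' lines into a set of (sector, length) tuples.

    Lines that do not consist of exactly two fields are skipped.
    """
    entries = set()
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue
        entries.add((int(parts[0]), int(parts[1])))
    return entries


def shared_entries(lists_by_device, min_devices):
    """Return entries that appear on at least min_devices devices."""
    counts = Counter()
    for entries in lists_by_device.values():
        counts.update(entries)
    return sorted(e for e, n in counts.items() if n >= min_devices)


# Made-up sample data: two members share one entry, a third has none.
sample = {
    "sda": parse_bad_blocks("198504960 8\n"),
    "sdb": parse_bad_blocks("198504960 8\n"),
    "sdc": parse_bad_blocks(""),
}
print(shared_entries(sample, min_devices=2))  # prints [(198504960, 8)]
```

An entry shared by most or all members is much more likely to come from the md layer than from independent media failures.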

Sorry I can't offer any solutions, other than very hesitantly suggesting
just a --remove-badblocks --force or whatever the option is.

Hopefully this gives you a few ideas ...

Cheers,
Wol