On 7/7/20 12:08 AM, Song Liu wrote:
So, what kind of next step comes after this?
Sorry for the delay. I read the log again and found that the following
line caused this issue:
[ +16.088243] r5l_write_super_and_discard_space set MD_SB_CHANGE_PENDING
The attached patch should work around this issue. Could you please give it a try?
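That line makes sense to me now - if I read drivers/md/raid5.c right,
handle_stripe() bails out early while that bit is set; roughly this check
(paraphrasing from memory, so don't quote me on the exact lines):

	/* handle_stripe(): with MD_SB_CHANGE_PENDING set the stripe is not
	 * processed at all, only re-queued until the superblock is written.
	 */
	if (s.handle_bad_blocks ||
	    test_bit(MD_SB_CHANGE_PENDING, &conf->mddev->sb_flags)) {
		set_bit(STRIPE_HANDLE, &sh->state);
		goto finish;
	}

and during journal replay nothing is around to write the superblock and
clear the bit, so I can see how the assembly would get stuck on that.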
Yeah, this solved the issue - the RAID assembled correctly (so the patch
is probably a good candidate for LTS kernels).
Thanks for helping with this bug.
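In case someone else trips over this before a proper fix is merged: as far
as I can tell, the workaround boils down to clearing that bit before the
cached stripes are flushed, something along these lines (my paraphrase of
the idea, not the actual attachment):

	/* Somewhere in the raid5-cache recovery path (sketch only, exact
	 * placement is in the attached patch): the replay code cannot
	 * update the superblock itself, so the MD_SB_CHANGE_PENDING set by
	 * r5l_write_super_and_discard_space() has to be cleared here,
	 * otherwise the stripes queued by the replay are never handled.
	 */
	clear_bit(MD_SB_CHANGE_PENDING, &log->rdev->mddev->sb_flags);

I'll leave the exact placement and locking to the real patch.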
The underlying filesystems are mountable/usable as well, although a
read-only fsck (ext4) and btrfs check do find some minor issues; it is
hard to say at this point what the exact culprit was.
In this particular case - IMHO - one issue remains: the assembly is
slower than a full resync (without a bitmap), which - outside of some
performance gains (writeback journal) and closing the write hole -
largely defeats the point of having such a resync policy in the first
place.
dmesg -H | grep r5c_recovery_flush_log
[ +13.550877] r5c_recovery_flush_log processing ctx->seq 860700000
[Jul 7 15:16] r5c_recovery_flush_log processing ctx->seq 860800000
[Jul 7 15:40] r5c_recovery_flush_log processing ctx->seq 860900000
...
[Jul 8 06:40] r5c_recovery_flush_log processing ctx->seq 866300000
[Jul 8 06:58] r5c_recovery_flush_log processing ctx->seq 866400000
[Jul 8 07:20] r5c_recovery_flush_log processing ctx->seq 866500000
Going by the timestamps above, the visible 100,000-step jumps in ctx->seq
each took roughly 18-25 minutes. During the periods when I was testing
your patches, the machine was basically idle the whole time - no CPU,
I/O, or wait pressure, nor anything else that could hamper it. The reads
from the journal device (SSDs) were averaging 1-4 MB/s.