Re: Assemblin journaled array fails

Song Liu <song@xxxxxxxxxx> · Thu, 28 May 2020 08:06:37 -0700

On Thu, May 28, 2020 at 3:18 AM Michal Soltys <msoltyspl@xxxxxxxxx> wrote:
>
> On 5/27/20 3:36 PM, Michal Soltys wrote:
> > On 5/27/20 1:37 AM, Song Liu wrote:
> >>
[...]

> To expand on the above (mdadm was started yesterday):
>
> - it went on for ~19+ hours - like previously
> - the last piece in trace output:
>
> xs22:/home/msl☠ tail -n 4 trace.out ; echo
> 09:08:12 40739   40739   mdadm           r5l_recovery_read_page 1852d8b0
> 09:08:12 40739   40739   mdadm           r5l_recovery_read_page 1852d8b8
> 09:08:12 40739   40739   mdadm           r5l_recovery_read_page 1852d8c0
> 09:08:12 40739   40739   mdadm           r5l_recovery_read_page 1852d8c8
>
> So it went through ~120gb of journal (the journal device is 256gb).

This (processing more than 10GB) should not happen. So there must be
something wrong...

> Non-assembling raid issue aside (assuming it's unrelated) - why does it
> take such enormous amount of time ? That's average of like 1.7mb/s

This is slow because we are processing 4kB at a time. But it is slower than
my expectation. What are the speeds measured at boot? Something like the
following in dmesg:

[    3.851064] raid6: avx2x4   gen() 17328 MB/s
[    3.877404] raid6: avx2x4   xor() 11033 MB/s
[    3.903410] raid6: avx2x2   gen() 14710 MB/s
[    3.929410] raid6: avx2x2   xor()  8921 MB/s
[    3.955408] raid6: avx2x1   gen() 12687 MB/s
[    3.981405] raid6: avx2x1   xor()  8414 MB/s
[    4.007409] raid6: sse2x4   gen()  9105 MB/s
[    4.033403] raid6: sse2x4   xor()  5724 MB/s
[    4.059408] raid6: sse2x2   gen()  7570 MB/s
[    4.085410] raid6: sse2x2   xor()  4880 MB/s
[    4.111409] raid6: sse2x1   gen()  6484 MB/s
[    4.137402] raid6: sse2x1   xor()  4412 MB/s

For the next step, could you please try the following the beginning of
--assemble
run?

   trace.py -M 10 'r::r5l_recovery_verify_data_checksum()(retval != 0)'

Thanks,
Song

PS: Some key functions here are inlined and thus not easily traceable. We will
probably need to patch the kernel for further debugging.

>
> Around 09:08 today, the recovery line popped in the dmesg as previously:
>
> [May27 12:58] md: md124 stopped.
> [  +0.006779] md/raid:md124: device sdf1 operational as raid disk 0
> [  +0.006929] md/raid:md124: device sdj1 operational as raid disk 3
> [  +0.006903] md/raid:md124: device sdi1 operational as raid disk 2
> [  +0.006914] md/raid:md124: device sdh1 operational as raid disk 1
> [  +0.007426] md/raid:md124: raid level 5 active with 4 out of 4
> devices, algorithm 2
> [May28 09:08] md/raid:md124: recovering 24667 data-only stripes and
> 50212 data-parity stripes