On Sat, Mar 20, 2021 at 9:22 PM Manuel Riel <manu@xxxxxxxxxxxxx> wrote:
>
> My impression is that the write-journal feature isn't fully stable yet, as was already reported in 2019 [1]. Vojtech and I are seeing the same errors as mentioned there.
>
> It doesn't matter whether the journal is on a block device or another RAID.
>
> [1]: https://www.spinics.net/lists/raid/msg62646.html
>
> > On Mar 20, 2021, at 9:12 AM, Manuel Riel <manu@xxxxxxxxxxxxx> wrote:
> >
> > On Mar 20, 2021, at 7:16 AM, Song Liu <song@xxxxxxxxxx> wrote:
> >>
> >> Sorry for being late on this issue.
> >>
> >> Manuel and Vojtech, are we confident that this issue only happens when we use
> >> another md array as the journal device?
> >>
> >> Thanks,
> >> Song
> >
> > Hi Song,
> >
> > thanks for getting back.
> >
> > Unfortunately it's still happening, even when using an NVMe partition directly. It just took a long three weeks to happen, so discard my patch. Here is how it went down yesterday:
> >
> > - the md4_raid6 process runs at 100% CPU utilization and all I/O to the array is blocked
> > - there is no disk activity on the physical drives
> > - a soft reboot doesn't work, as md4_raid6 blocks, so a hard reset is needed
> > - when booting to rescue mode, it tries to assemble the array and shows the same 100% CPU utilization; it also can't reboot
> > - when manually assembling the array *with* the journal drive, it reads a few GB from the journal device and then gets stuck at 100% CPU utilization again, without any disk activity
> >
> > The solution in the end was to avoid assembling the array on reboot, assemble it *without* the existing journal, and add an empty journal drive later. This led to some data loss and a full resync.

Thanks for the information. Quick question: does the kernel have the following change? It fixes an issue at recovery time. Since you see the issue during normal execution, it is probably something different.

Thanks,
Song

commit c9020e64cf33f2dd5b2a7295f2bfea787279218a
Author: Song Liu <songliubraving@xxxxxx>
Date:   9 months ago

    md/raid5-cache: clear MD_SB_CHANGE_PENDING before flushing stripes

    In recovery, if we process too much data, raid5-cache may set
    MD_SB_CHANGE_PENDING, which causes spinning in handle_stripe().
    Fix this issue by clearing the bit before flushing data only stripes.

    This issue was initially discussed in [1].

    [1] https://www.spinics.net/lists/raid/msg64409.html

    Signed-off-by: Song Liu <songliubraving@xxxxxx>
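
For reference, a rough sketch of the idea behind that commit (illustrative only, not the verbatim patch; the helper name below is made up and the exact placement in drivers/md/raid5-cache.c may differ):

/*
 * Sketch: before flushing the data-only stripes found during journal
 * recovery, clear MD_SB_CHANGE_PENDING so handle_stripe() does not spin
 * waiting for a superblock write that recovery cannot issue yet; restore
 * the bit afterwards if it had been set.
 */
static void r5c_flush_data_only_stripes_sketch(struct r5l_log *log,
					       struct r5l_recovery_ctx *ctx)
{
	struct mddev *mddev = log->rdev->mddev;
	bool cleared_pending =
		test_and_clear_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags);

	/* ... write out the recovered data-only stripes and wait ... */

	if (cleared_pending)
		set_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags);
}

Since the bit is only cleared around the recovery-time flush, this does not change how MD_SB_CHANGE_PENDING is handled during normal operation, which is why a hang in normal execution is likely a separate problem.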