On Sat, Mar 20, 2021 at 9:22 PM Manuel Riel <manu@xxxxxxxxxxxxx> wrote:
>
> My impression is that the write-journal feature isn't fully stable yet, as was already reported in 2019 [1]. Vojtech and I are seeing the same errors as mentioned there.
>
> It doesn't matter whether the journal is on a block device or another RAID.
>
> [1]: https://www.spinics.net/lists/raid/msg62646.html
>
> > On Mar 20, 2021, at 9:12 AM, Manuel Riel <manu@xxxxxxxxxxxxx> wrote:
> >
> > On Mar 20, 2021, at 7:16 AM, Song Liu <song@xxxxxxxxxx> wrote:
> >>
> >> Sorry for being late on this issue.
> >>
> >> Manuel and Vojtech, are we confident that this issue only happens when we use
> >> another md array as the journal device?
> >>
> >> Thanks,
> >> Song
> >
> > Hi Song,
> >
> > thanks for getting back.
> >
> > Unfortunately it's still happening, even when using an NVMe partition directly. It just took a long three weeks to happen, so discard my patch. Here is how it went down yesterday:
> >
> > - the md4_raid6 process runs at 100% CPU utilization and all I/O to the array is blocked
> > - there is no disk activity on the physical drives
> > - a soft reboot doesn't work, as md4_raid6 blocks, so a hard reset is needed
> > - when booting to rescue mode, it tries to assemble the array and shows the same 100% CPU utilization; it also can't reboot
> > - when manually assembling the array *with* the journal drive, it reads a few GB from the journal device and then gets stuck at 100% CPU utilization again, without any disk activity
> >
> > The solution in the end was to avoid assembling the array on reboot, assemble it *without* the existing journal, and add an empty journal drive later. This led to some data loss and a full resync.

Thanks for the information. Quick question: does the kernel have the following change? It fixes an issue at recovery time. Since you see the issue during normal execution, it is probably something different.

Thanks,
Song

commit c9020e64cf33f2dd5b2a7295f2bfea787279218a
Author: Song Liu <songliubraving@xxxxxx>
Date:   9 months ago

    md/raid5-cache: clear MD_SB_CHANGE_PENDING before flushing stripes

    In recovery, if we process too much data, raid5-cache may set
    MD_SB_CHANGE_PENDING, which causes spinning in handle_stripe().
    Fix this issue by clearing the bit before flushing data only stripes.

    This issue was initially discussed in [1].

    [1] https://www.spinics.net/lists/raid/msg64409.html

    Signed-off-by: Song Liu <songliubraving@xxxxxx>
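
For reference, a rough sketch of the idea behind that commit (illustrative only, not the verbatim patch; the helper name below is made up and the exact placement in drivers/md/raid5-cache.c may differ):

/*
 * Sketch: before flushing the data-only stripes found during journal
 * recovery, clear MD_SB_CHANGE_PENDING so handle_stripe() does not spin
 * waiting for a superblock write that recovery cannot issue yet; restore
 * the bit afterwards if it had been set.
 */
static void r5c_flush_data_only_stripes_sketch(struct r5l_log *log,
					       struct r5l_recovery_ctx *ctx)
{
	struct mddev *mddev = log->rdev->mddev;
	bool cleared_pending =
		test_and_clear_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags);

	/* ... write out the recovered data-only stripes and wait ... */

	if (cleared_pending)
		set_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags);
}

Since the bit is only cleared around the recovery-time flush, this does not change how MD_SB_CHANGE_PENDING is handled during normal operation, which is why a hang in normal execution is likely a separate problem.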