Re: [PATCH] md: warn about using another MD array as write journal

On Mar 20, 2021, at 7:16 AM, Song Liu <song@xxxxxxxxxx> wrote:
> 
> Sorry for being late on this issue.
> 
> Manuel and Vojtech, are we confident that this issue only happens when we use
> another md array as the journal device?
> 
> Thanks,
> Song

Hi Song,

thanks for getting back.

Unfortunately it's still happening, even when using an NVMe partition directly. It just took a long three weeks to recur, so please discard my patch. Here's how it went down yesterday:

- process md4_raid6 is running with 100% CPU utilization, all I/O to the array is blocked
- no disk activity on the physical drives
- soft reboot doesn't work, as md4_raid6 blocks, so hard reset is needed
- when booting into rescue mode, it tries to assemble the array and shows the same 100% CPU utilization; the rescue system can't reboot either.
- when manually assembling it *with* the journal drive, it will read a few GB from the journal device and then get stuck at 100% CPU utilization again without any disk activity.

The solution in the end was to avoid assembling the array on reboot, then assemble it *without* the existing journal and add an empty journal drive later. This led to some data loss and a full resync.
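For reference, the recovery steps above were roughly the following (device names are placeholders for my setup, not the actual ones; --force/--run and --add-journal are as documented in mdadm(8)):

```shell
# Assemble the array without the stuck journal device.
# --force accepts the array despite the missing journal;
# --run starts it even though it is degraded.
mdadm --assemble --force --run /dev/md4 /dev/sd[a-f]1

# Later, attach a fresh, empty journal partition
# (manage-mode --add-journal, available since mdadm 3.4).
mdadm --manage /dev/md4 --add-journal /dev/nvme0n1p2
```

After that, the array did a full resync as mentioned.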

I'm currently moving all data off this machine and will then repave it, to see if that changes anything.

My main OS is CentOS 8 and the rescue system was Debian; both showed the same issue, so it must be connected to the journal drive somehow.

My journal drive is a partition on an NVMe drive, ~180 GB in size.

Thanks for any pointers I could try next.

Manu
