Re: Linux RAID with btrfs stuck and consume 100 % CPU

On 29. 07. 20 23:48, Chris Murphy wrote:
> On Wed, Jul 29, 2020 at 3:06 PM Guoqing Jiang
> <guoqing.jiang@xxxxxxxxxxxxxxx> wrote:
>> On 7/22/20 10:47 PM, Vojtech Myslivec wrote:
>>> 1. What should be the cause of this problem?
>>
>> Just a quick glance based on the stacks which you attached, I guess it
>> could be a deadlock issue of raid5 cache super write.
>>
>> Maybe the commit 8e018c21da3f ("raid5-cache: fix a deadlock in superblock
>> write") didn't fix the problem completely.  Cc Song.
> 
> That references discards, and it made me take another look at mdadm -D,
> which shows a journal device:
> 
>        0     253        2        -      journal   /dev/dm-2
> 
> Vojtech, can you confirm this device is an SSD? There are a couple
> SSDs that show up in the dmesg if I recall correctly.

I tried to explain this in my first post. It's a logical volume in a
volume group on top of a RAID 1 of two SSDs.

My colleague replied with more details:

On 05. 08. 2020 Michal Moravec wrote:
>> On 29 Jul 2020, Chris Murphy wrote:
>> Vojtech, can you confirm this device is an SSD? There are a couple
>> SSDs that show up in the dmesg if I recall correctly.
>
> Yes. We have a pair (sdg, sdh) of INTEL D3-S4610 240 GB SSDs
> (SSDSC2KG240G8).
> We use them for the OS and the raid6 journal.
> They are configured as the RAID 1 array md0 with LVM on top of it.
> Logical volume vg0-journal_md1 (1G in size) is used as the journal device
> for the md1 array (where the problem with process md1_raid6 consuming
> 100% CPU and blocking btrfs operations is happening).


>> What is the default discard hinting for this SSD when it's used as
>> a journal device for mdadm?
>
> What do you mean by discard hinting?
> We have issue_discards = 1 set in /etc/lvm/lvm.conf.
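
For what it's worth, my understanding of lvm.conf is that issue_discards = 1 only makes the LVM tools send discards for PV space they free (lvremove, lvreduce and the like); it is not what decides whether discards that md sends through the journal LV reach the SSDs. What each layer accepts can be read from /sys/block/<dev>/queue/discard_max_bytes, where 0 means the device does not take discards (lsblk -D shows the same values). A minimal Python sketch; the device list is an assumption copied from the lsblk output in this mail:

```python
from pathlib import Path
from typing import Optional

# Assumed device names, taken from the lsblk output in this mail; adjust as needed.
DEVICES = ["md1", "dm-2", "md0", "sdg", "sdh"]


def discard_max_bytes(dev: str, sysfs: Path = Path("/sys/block")) -> Optional[int]:
    """Return queue/discard_max_bytes for dev, or None if the attribute is missing."""
    attr = sysfs / dev / "queue" / "discard_max_bytes"
    try:
        return int(attr.read_text().strip())
    except FileNotFoundError:
        return None


def describe(value: Optional[int]) -> str:
    """Human-readable summary of one discard_max_bytes value."""
    if value is None:
        return "not found"
    return "no discard support" if value == 0 else f"discard up to {value} bytes"


if __name__ == "__main__":
    for dev in DEVICES:
        print(f"{dev:6} {describe(discard_max_bytes(dev))}")
```

If every layer down to sdg/sdh reports a non-zero value, discards issued at the top can at least pass through; whether md actually issues them for the journal is a separate question.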


>> And what is the write behavior of the journal?
>
> That would be journal_mode set to write-through, right?
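
Rather than inferring it, the active mode can be read back: md exposes it as /sys/block/<array>/md/journal_mode, with the current choice shown in brackets (for example "[write-through] write-back"); it should read back empty on an array without a journal. A small Python sketch that only reads sysfs and scans arrays named md*:

```python
from pathlib import Path
from typing import Optional


def parse_bracketed(value: str) -> Optional[str]:
    """Return the [bracketed] choice from an md sysfs list, or None if absent."""
    for token in value.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None  # empty value, e.g. an array without a journal


def journal_mode(md: str = "md1") -> Optional[str]:
    """Read the active raid5/6 journal mode of the given md array."""
    return parse_bracketed((Path("/sys/block") / md / "md" / "journal_mode").read_text())


if __name__ == "__main__":
    # Only arrays whose personality provides journal_mode (raid4/5/6) are listed.
    for md_dir in sorted(Path("/sys/block").glob("md*")):
        if (md_dir / "md" / "journal_mode").exists():
            print(md_dir.name, journal_mode(md_dir.name))
```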


>> I'm not familiar with this feature at all, whether it's treated as a
>> raw block device for the journal or if the journal resides on a file
>> system.
>
> From the lsblk output I see no filesystem on vg0-journal_md1. It looks
> like a plain logical volume to me.

[my comment]: Yes, it's an LV block device; there is no filesystem on it.


>> So I get kinda curious what might happen long term if this is a very
>> busy file system, very busy raid5/6 journal on this SSD, without any
>> discard hints?
>> Is it possible the SSD runs out of ready-to-write erase blocks, and
>> the firmware has become super slow doing erasure/garbage collection
>> on demand?
>> And the journal is now having a hard time flushing?
>
> What kind of information could we gather to verify/reject any of these
> ideas?
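
One way to test the "SSD is slow to erase" theory would be to watch the write (and, on newer kernels, discard) latency of sdg and sdh while md1_raid6 is spinning. iostat -x reports this as w_await; the same number can be computed from two samples of /sys/block/<dev>/stat, whose field layout is described in the kernel's Documentation/block/stat.rst. A rough Python sketch, assuming sdg/sdh are the journal SSDs as described above:

```python
import time
from pathlib import Path
from typing import List, Optional

# 0-based field positions in /sys/block/<dev>/stat (Documentation/block/stat.rst).
WRITE_IOS, WRITE_TICKS, IO_TICKS = 4, 7, 9
DISCARD_IOS, DISCARD_TICKS = 11, 14  # only present on kernels >= 4.18


def parse_stat(text: str) -> List[int]:
    return [int(x) for x in text.split()]


def avg_latency_ms(before: List[int], after: List[int], ios: int, ticks: int) -> Optional[float]:
    """Average ms per completed request between two samples (like iostat w_await)."""
    if len(after) <= ticks:
        return None  # field not reported by this kernel
    done = after[ios] - before[ios]
    if done == 0:
        return None  # no requests completed in the interval
    return (after[ticks] - before[ticks]) / done


def utilization(before: List[int], after: List[int], interval_s: float) -> float:
    """Percent of the interval with I/O in flight (like iostat %util)."""
    return 100.0 * (after[IO_TICKS] - before[IO_TICKS]) / (interval_s * 1000.0)


def fmt(ms: Optional[float]) -> str:
    return "n/a" if ms is None else f"{ms:.1f} ms"


def sample(devs: List[str], interval_s: float = 5.0) -> None:
    paths = {d: Path("/sys/block") / d / "stat" for d in devs}
    first = {d: parse_stat(p.read_text()) for d, p in paths.items()}
    time.sleep(interval_s)
    for d, p in paths.items():
        second = parse_stat(p.read_text())
        w = avg_latency_ms(first[d], second, WRITE_IOS, WRITE_TICKS)
        dsc = avg_latency_ms(first[d], second, DISCARD_IOS, DISCARD_TICKS)
        util = utilization(first[d], second, interval_s)
        print(f"{d}: write {fmt(w)}, discard {fmt(dsc)}, util {util:.0f}%")


if __name__ == "__main__":
    devs = [d for d in ("sdg", "sdh") if (Path("/sys/block") / d / "stat").exists()]
    if devs:
        sample(devs)
```

Consistently high write or discard latency on the SSDs during a hang would support the garbage-collection theory; low latency and low utilization would point back at md itself.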


[my question]: Is the LVM configuration (above) enough? Sadly, there is not
much information about RAID 6 journaling on the kernel wiki. There is some
info in mdadm(8), but nothing about discard/TRIM operations.

NAME                  MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
sdg                     8:96   1 223,6G  0 disk  
├─sdg1                  8:97   1  37,3G  0 part  
│ └─md0                 9:0    0  37,2G  0 raid1 
│   ├─vg0-swap        253:0    0   3,7G  0 lvm   [SWAP]
│   ├─vg0-root        253:1    0  14,9G  0 lvm   /
│   └─vg0-journal_md1 253:2    0     1G  0 lvm   
│     └─md1             9:1    0  29,1T  0 raid6 /mnt/data
├─sdg2                  8:98   1     1K  0 part  
└─sdg5                  8:101  1 186,3G  0 part  
sdh                     8:112  1 223,6G  0 disk  
├─sdh1                  8:113  1  37,3G  0 part  
│ └─md0                 9:0    0  37,2G  0 raid1 
│   ├─vg0-swap        253:0    0   3,7G  0 lvm   [SWAP]
│   ├─vg0-root        253:1    0  14,9G  0 lvm   /
│   └─vg0-journal_md1 253:2    0     1G  0 lvm   
│     └─md1             9:1    0  29,1T  0 raid6 /mnt/data
├─sdh2                  8:114  1     1K  0 part  
└─sdh5                  8:117  1 186,3G  0 part  
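
The same stack can be walked from sysfs, which helps when checking per-layer settings along the journal path. Each block device lists the devices it is built on in /sys/class/block/<dev>/slaves. A small Python sketch; the slave lookup is passed in as a function so it can be tried on the layout above without the real hardware:

```python
import os
from typing import Callable, List

SYSFS_BLOCK = "/sys/class/block"


def sysfs_slaves(dev: str) -> List[str]:
    """Devices that `dev` is built on, per /sys/class/block/<dev>/slaves."""
    path = os.path.join(SYSFS_BLOCK, dev, "slaves")
    if not os.path.isdir(path):  # partitions have no slaves directory
        return []
    return sorted(os.listdir(path))


def walk(dev: str, slaves: Callable[[str], List[str]] = sysfs_slaves, depth: int = 0) -> List[str]:
    """Return an indented, depth-first listing of dev and everything beneath it."""
    lines = ["  " * depth + dev]
    for child in slaves(dev):
        lines.extend(walk(child, slaves, depth + 1))
    return lines


if __name__ == "__main__":
    if os.path.isdir(os.path.join(SYSFS_BLOCK, "md1")):
        print("\n".join(walk("md1")))
```

On the real system md1 also lists its RAID 6 data disks, and dm-2 can be mapped back to vg0-journal_md1 via /sys/class/block/dm-2/dm/name.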
