Re: Linux RAID with btrfs stuck and consume 100 % CPU

Hello,

it seems my last e-mail was filtered as I can't find it in the archives.
So I will resend it and include all attachments in one tarball.


On 26. 08. 20 20:07, Chris Murphy wrote:
> OK so from the attachments..
>
> cat /proc/<pid>/stack for md1_raid6
>
> [<0>] rq_qos_wait+0xfa/0x170
> [<0>] wbt_wait+0x98/0xe0
> [<0>] __rq_qos_throttle+0x23/0x30
> [<0>] blk_mq_make_request+0x12a/0x5d0
> [<0>] generic_make_request+0xcf/0x310
> [<0>] submit_bio+0x42/0x1c0
> [<0>] md_update_sb.part.71+0x3c0/0x8f0 [md_mod]
> [<0>] r5l_do_reclaim+0x32a/0x3b0 [raid456]
> [<0>] md_thread+0x94/0x150 [md_mod]
> [<0>] kthread+0x112/0x130
> [<0>] ret_from_fork+0x22/0x40
>
>
> Btrfs snapshot flushing might instigate the problem but it seems to me
> there's some kind of contention or blocking happening within md, and
> that's why everything stalls. But I can't tell why.
>
> Do you have any iostat output at the time of this problem? I'm
> wondering if md is waiting on disks. If not, try `iostat -dxm 5` and
> share a few minutes before and after the freeze/hang.

We detected the issue on Monday 31.09.2020 at 15:24. It must have happened
sometime between 15:22 and 15:24, as we monitor the state every 2 minutes.

We recorded the stacks of the blocked processes, the sysrq+w output and the
requested `iostat` output. Then, at 15:45, we manually "unstuck" the array
by accessing the md1 device with dd (reading a few random blocks).
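
For anyone who wants to capture the same data, the steps above can be
sketched roughly as follows. This is a minimal sketch, not the exact
commands we ran: the helper names, the 4096-byte block size, the single
read per call and the output file names are all assumptions. Only
/dev/md1, sysrq+w, /proc/<pid>/stack and dd come from this thread.

```shell
#!/bin/sh
# Sketch only. Helper names, block size and file names are assumptions.

# Print the PIDs of all processes currently in uninterruptible sleep
# (state D). Threads under /proc/<pid>/task are not inspected.
blocked_pids() {
    for status in /proc/[0-9]*/status; do
        # A process may exit between the glob and the read; skip it then.
        state=$(awk '/^State:/ {print $2}' "$status" 2>/dev/null) || continue
        if [ "$state" = "D" ]; then
            pid=${status#/proc/}
            echo "${pid%/status}"
        fi
    done
}

# Save the kernel stack of every blocked process into directory $1.
# Reading /proc/<pid>/stack needs root.
dump_blocked_stacks() {
    outdir=$1
    mkdir -p "$outdir"
    for pid in $(blocked_pids); do
        cat "/proc/$pid/stack" > "$outdir/stack.$pid" 2>/dev/null || true
    done
}

# Print a dd command that reads one block at a pseudo-random offset.
# $1 = device, $2 = device size in blocks of $BS bytes, $3 = random number
random_read_cmd() {
    echo "dd if=$1 of=/dev/null bs=${BS:-4096} count=1 skip=$(( $3 % $2 )) iflag=direct"
}
```

Run as root, `dump_blocked_stacks /tmp/md1-hang` followed by
`echo w > /proc/sysrq-trigger` (the result lands in the kernel log) and
`iostat -dxm 5` covers the recording part. The command printed by
`random_read_cmd /dev/md1 <blocks> <random number>` can then be reviewed
and executed by hand for the "unstuck" step.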

I hope the attached file names are self-explanatory.

Please let me know if we can do anything more to track down the issue or
if I forgot something.

Thanks a lot,
Vojtech and Michal



Description of the devices in iostat, just for recap:
- sda-sdf: 6 HDD disks
- sdg, sdh: 2 SSD disks

- md0: raid1 over sdg1 and sdh1 ("SSD RAID", Physical Volume for LVM)
- md1: our "problematic" raid6 over sda-sdf ("HDD RAID", btrfs
       formatted)

- Logical volumes over md0 Physical Volume (on SSD RAID)
    - dm-0: 4G  LV for SWAP
    - dm-1: 16G LV for root file system (ext4 formatted)
    - dm-2: 1G  LV for md1 journal
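
To cross-check this layout on a live system, something like the
following could be used. It is only a sketch: the helper names are
hypothetical, and it assumes the usual /proc/mdstat line format
("md1 : active raid6 sdf[5] sde[4] ...") with the array in the plain
"active" state.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical.

# Print the RAID level of array $1, reading /proc/mdstat-style text on
# stdin. Assumes the level is the 4th field ("mdX : active <level> ...").
md_level() {
    awk -v dev="$1" '$1 == dev && $2 == ":" { print $4 }'
}

# Print the member devices of array $1, one per line, stripping the
# "[N]" slot number and any trailing flag such as "(S)" or "(J)".
md_members() {
    awk -v dev="$1" '$1 == dev && $2 == ":" {
        for (i = 5; i <= NF; i++) { sub(/\[.*/, "", $i); print $i }
    }'
}
```

For example, `md_level md1 < /proc/mdstat` should report raid6 here, and
`md_members md0 < /proc/mdstat` should list sdg1 and sdh1.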

Attachment: mdraid-btrfs-issue.tgz
Description: application/compressed-tar