Re: Linux RAID with btrfs stuck and consume 100 % CPU

> On Jul 29, 2020, at 2:06 PM, Guoqing Jiang <guoqing.jiang@xxxxxxxxxxxxxxx> wrote:
> 
> Hi,
> 
> On 7/22/20 10:47 PM, Vojtech Myslivec wrote:
> >> 1. What could be the cause of this problem?
> 
> From a quick glance at the stacks you attached, I guess it could be
> a deadlock in the raid5-cache superblock write.
> 
> Maybe commit 8e018c21da3f ("raid5-cache: fix a deadlock in superblock
> write") didn't fix the problem completely. Cc'ing Song.
> 
> I am also curious why the md thread is not woken up when mddev_trylock()
> fails. You can give the change below a try, but I can't promise it helps...
> 
> --- a/drivers/md/raid5-cache.c
> +++ b/drivers/md/raid5-cache.c
> @@ -1337,8 +1337,10 @@ static void r5l_write_super_and_discard_space(struct r5l_log *log,
>          */
>         set_mask_bits(&mddev->sb_flags, 0,
>                 BIT(MD_SB_CHANGE_DEVS) | BIT(MD_SB_CHANGE_PENDING));
> -       if (!mddev_trylock(mddev))
> +       if (!mddev_trylock(mddev)) {
> +               md_wakeup_thread(mddev->thread);
>                 return;
> +       }
>         md_update_sb(mddev, 1);
>         mddev_unlock(mddev);
> 
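The patch above changes what happens when the superblock update cannot take the lock: instead of returning with MD_SB_CHANGE_PENDING set and nobody scheduled to act on it, it also wakes the md thread so the pending update gets picked up later. A minimal userspace analogy of that pattern follows, assuming POSIX threads. It is not the kernel code: `struct dev_state`, `write_super_or_wake()` and `worker_iteration()` are hypothetical names, the mutex stands in for the mddev lock, the semaphore for md_wakeup_thread(), and the atomic flag for MD_SB_CHANGE_PENDING.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the relevant parts of struct mddev. */
struct dev_state {
    pthread_mutex_t lock;       /* plays the role of the mddev lock */
    sem_t           wake;       /* plays the role of md_wakeup_thread() */
    atomic_bool     sb_pending; /* plays the role of MD_SB_CHANGE_PENDING */
    int             sb_writes;  /* superblock writes performed; guarded by lock */
};

static void dev_state_init(struct dev_state *d)
{
    int rc = pthread_mutex_init(&d->lock, NULL);
    assert(rc == 0);
    rc = sem_init(&d->wake, 0, 0); /* process-private, initial count 0 */
    assert(rc == 0);
    (void)rc;
    atomic_init(&d->sb_pending, false);
    d->sb_writes = 0;
}

/* Caller must hold d->lock. */
static void update_super_locked(struct dev_state *d)
{
    d->sb_writes++;
    atomic_store(&d->sb_pending, false);
}

/* Analogous to the patched function: mark the superblock dirty, write it
 * only if the lock is free right now, otherwise wake the worker instead
 * of silently returning. Returns true if the write was done inline. */
static bool write_super_or_wake(struct dev_state *d)
{
    atomic_store(&d->sb_pending, true);
    if (pthread_mutex_trylock(&d->lock) != 0) {
        /* The wakeup the patch adds, in spirit. A semaphore post is
         * counted, so it is not lost if the worker is not waiting yet. */
        sem_post(&d->wake);
        return false;
    }
    update_super_locked(d);
    pthread_mutex_unlock(&d->lock);
    return true;
}

/* One loop iteration of the worker (the md thread's role): block until
 * woken, then perform the update if it is still pending. */
static void worker_iteration(struct dev_state *d)
{
    sem_wait(&d->wake);
    pthread_mutex_lock(&d->lock);
    if (atomic_load(&d->sb_pending))
        update_super_locked(d);
    pthread_mutex_unlock(&d->lock);
}
```

Without the `sem_post()` line, a failed trylock would leave `sb_pending` set with the worker never woken to handle it, which is the kind of stall Guoqing is asking about; whether that is what actually happens in md is exactly the open question in this thread.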

Thanks Guoqing!

I am not sure whether we hit the mddev_trylock() failure. It looks like the
md1_raid6 thread is already running at 100% CPU.

A few questions: 

1. I see wbt_wait in the stack trace. Are we using writeback throttling (WBT) here?
2. Could you please capture /proc/<pid>/stack, where <pid> is the PID of
   md1_raid6? It would help to sample it several times.
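For question 1, `/sys/block/<dev>/queue/wbt_lat_usec` on each member device should show whether writeback throttling is active (0 means disabled). For question 2, a small helper like the one below could take the repeated samples. This is a hypothetical sketch: `sample_stack()` and `stack_has_symbol()` are illustrative names, reading /proc/<pid>/stack normally requires root, and the parser assumes frames are printed as `[<0>] symbol+offset/size`.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <time.h>

/* True if some frame in a /proc/<pid>/stack dump names exactly `sym`.
 * Assumes lines shaped like "[<0>] wbt_wait+0x1b0/0x2a0". */
static bool stack_has_symbol(const char *dump, const char *sym)
{
    assert(dump != NULL && sym != NULL);
    size_t n = strlen(sym);
    for (const char *line = dump; line != NULL && *line != '\0';) {
        const char *eol = strchr(line, '\n');
        const char *name = strstr(line, "] ");
        if (name != NULL && (eol == NULL || name < eol)) {
            name += 2; /* skip "] " to reach the symbol name */
            if (strncmp(name, sym, n) == 0 &&
                (name[n] == '+' || name[n] == '\n' || name[n] == '\0'))
                return true;
        }
        line = (eol != NULL) ? eol + 1 : NULL;
    }
    return false;
}

/* Reads /proc/<pid>/stack `samples` times, `interval_ms` apart, printing
 * each dump to `out` (may be NULL). Returns how many samples contained
 * `sym`, or -1 if the file could not be opened or read. */
static int sample_stack(pid_t pid, int samples, long interval_ms,
                        const char *sym, FILE *out)
{
    char path[64];
    char buf[16384];
    struct timespec gap = { interval_ms / 1000, (interval_ms % 1000) * 1000000L };
    int hits = 0;

    snprintf(path, sizeof path, "/proc/%ld/stack", (long)pid);
    for (int i = 0; i < samples; i++) {
        FILE *f = fopen(path, "r");
        if (f == NULL)
            return -1;
        size_t len = fread(buf, 1, sizeof buf - 1, f);
        bool failed = ferror(f) != 0;
        fclose(f);
        if (failed)
            return -1;
        buf[len] = '\0';
        if (out != NULL)
            fprintf(out, "--- sample %d ---\n%s", i + 1, buf);
        if (stack_has_symbol(buf, sym))
            hits++;
        if (i + 1 < samples)
            nanosleep(&gap, NULL); /* no sleep after the last sample */
    }
    return hits;
}
```

For example, with the PID taken from `pgrep md1_raid6`, `sample_stack(pid, 10, 500, "wbt_wait", stdout)` would print ten dumps half a second apart and return how many of them showed wbt_wait.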

Thanks,
Song
