> On Jul 29, 2020, at 2:06 PM, Guoqing Jiang <guoqing.jiang@xxxxxxxxxxxxxxx> wrote:
>
> Hi,
>
> On 7/22/20 10:47 PM, Vojtech Myslivec wrote:
>> 1. What should be the cause of this problem?
>
> Just a quick glance based on the stacks which you attached, I guess it could be
> a deadlock issue of raid5 cache super write.
>
> Maybe the commit 8e018c21da3f ("raid5-cache: fix a deadlock in superblock
> write") didn't fix the problem completely. Cc Song.
>
> And I am curious why md thread is not waked if mddev_trylock fails, you can
> give it a try but I can't promise it helps ...
>
> --- a/drivers/md/raid5-cache.c
> +++ b/drivers/md/raid5-cache.c
> @@ -1337,8 +1337,10 @@ static void r5l_write_super_and_discard_space(struct r5l_log *log,
> 	 */
> 	set_mask_bits(&mddev->sb_flags, 0,
> 		BIT(MD_SB_CHANGE_DEVS) | BIT(MD_SB_CHANGE_PENDING));
> -	if (!mddev_trylock(mddev))
> +	if (!mddev_trylock(mddev)) {
> +		md_wakeup_thread(mddev->thread);
> 		return;
> +	}
> 	md_update_sb(mddev, 1);
> 	mddev_unlock(mddev);
>

Thanks Guoqing!

I am not sure whether we hit the mddev_trylock() failure. Looks like the
md1_raid6 thread is already running at 100%.

A few questions:

1. I see wbt_wait in the stack trace. Are we using write back throttling here?
2. Could you please get the /proc/<pid>/stack for <pid> of md1_raid6? We may
   want to sample it multiple times.

Thanks,
Song
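
For the repeated sampling asked for in question 2, something along these lines may help. It is only a rough sketch, not a tested tool: the helper names, the sample count, and the interval are made up for illustration. It also assumes the thread shows up as "md1_raid6" in /proc/<pid>/comm. Reading /proc/<pid>/stack normally requires root.

```python
import os
import time


def find_pids_by_comm(name, proc_root="/proc"):
    """Return the PIDs whose <proc_root>/<pid>/comm equals `name` (hypothetical helper)."""
    pids = []
    try:
        entries = os.listdir(proc_root)
    except OSError:
        return pids  # no procfs here, nothing to report
    for entry in entries:
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        try:
            with open(os.path.join(proc_root, entry, "comm")) as f:
                if f.read().strip() == name:
                    pids.append(int(entry))
        except OSError:
            continue  # the process went away while we were scanning
    return sorted(pids)


def sample_stack(pid, count=10, interval=1.0, proc_root="/proc"):
    """Read <proc_root>/<pid>/stack `count` times, sleeping `interval` seconds between reads."""
    samples = []
    for i in range(count):
        with open(os.path.join(proc_root, str(pid), "stack")) as f:
            samples.append(f.read())
        if i + 1 < count:
            time.sleep(interval)
    return samples


if __name__ == "__main__":
    # Assumed thread name, taken from the mail above; adjust for other arrays.
    for pid in find_pids_by_comm("md1_raid6"):
        for n, stack in enumerate(sample_stack(pid), 1):
            print(f"--- pid {pid}, sample {n} ---")
            print(stack, end="")
```

If the samples all show the same frames, the thread is probably spinning in one place. If the frames vary, it is more likely making some progress.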