On Tue, Oct 10 2017, Xiao Ni wrote:

> On 10/09/2017 01:52 PM, NeilBrown wrote:
>> On Mon, Oct 09 2017, Xiao Ni wrote:
>>
>>> On 10/09/2017 12:57 PM, NeilBrown wrote:
>>>> It would if you had applied
>>>>    [PATCH 3/4] md: use mddev_suspend/resume instead of ->quiesce()
>>>>
>>>> Did you apply all 4 patches?
>>> Sorry, it's my mistake. I insmod the wrong module. I'll apply the four
>>> patches
>>> and do test again.
>>>> Thanks. It looks suspend_lo_store() is calling raid5_quiesce() directly
>>>> as you say - so a patch is missing.
>>> Yes, thanks for pointing about this.
>
> Hi Neil
>
> I applied the four patches and one patch "md: fix deadlock error in
> recent patch."
> There is a new stuck. It's stuck at suspend_hi_store this time. I add
> the calltrace
> as an attachment.
>
> I added some printk to print some information.
>
> [12695.993329] mddev suspend : 1
> [12695.996270] mddev ro : 0
> [12695.998790] mddev insync : 0
> [12696.001641] mddev active io: 1

You didn't tell me where (in the code) you printed this information.
That makes it hard to interpret.

If mddev->active_io is 1, then some thread must be in this range of
code:

	atomic_inc(&mddev->active_io);
	rcu_read_unlock();
	if (!mddev->pers->make_request(mddev, bio)) {
		atomic_dec(&mddev->active_io);
		wake_up(&mddev->sb_wait);
		goto check_suspended;
	}

	if (atomic_dec_and_test(&mddev->active_io) && mddev->suspended)
		wake_up(&mddev->sb_wait);

If that thread is blocked (which appears to be the case) it must be in
->make_request() because nothing else there blocks.

None of the threads you showed are in that code. But you didn't report
all the threads - only those which had printed warnings.

   echo t > /proc/sysrq-trigger

will produce the stack traces of *all* threads. That would be more
useful.
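The accounting in the range of code quoted above can be sketched in userspace C to show why a count of 1 points at ->make_request(). This is only an illustration under stated assumptions: struct toy_mddev, toy_submit(), toy_handler() and toy_seen_inside are hypothetical names, C11 atomics stand in for the kernel's atomic_t, the wake-up is reduced to a return value, and the retry via check_suspended is left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct mddev: only the two fields discussed. */
struct toy_mddev {
	atomic_int active_io;
	bool suspended;
};

/* Hypothetical handler type: returns true when the request was handled. */
typedef bool (*toy_make_request_fn)(struct toy_mddev *m);

/* Value of active_io as observed from inside the handler. */
static int toy_seen_inside;

/* Example handler: records the counter while the request is in flight. */
static bool toy_handler(struct toy_mddev *m)
{
	toy_seen_inside = atomic_load(&m->active_io);
	return true;
}

/*
 * Follows the shape of the quoted range: count in, call the handler,
 * count out. Returns true where the quoted code would call wake_up().
 * The retry through check_suspended is deliberately omitted.
 */
static bool toy_submit(struct toy_mddev *m, toy_make_request_fn fn)
{
	atomic_fetch_add(&m->active_io, 1);
	if (!fn(m)) {	/* the only call in this range that can block */
		atomic_fetch_sub(&m->active_io, 1);
		return true;	/* unconditional wake-up in the quoted code */
	}
	/* atomic_fetch_sub returns the old value: 1 means we reached zero. */
	return atomic_fetch_sub(&m->active_io, 1) == 1 && m->suspended;
}
```

With a single request in flight, the handler sees active_io == 1 and the counter returns to 0 only after the handler returns. A counter stuck at 1 therefore means some handler has not returned yet.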
>
> Can it be:
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index b6b7a28..55e9280 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -7777,7 +7777,7 @@ void md_check_recovery(struct mddev *mddev)
>          if (mddev->ro && !test_bit(MD_RECOVERY_NEEDED, &mddev->recovery))
>                  return;
>          if ( ! (
> -               (mddev->flags & ~ (1<<MD_CHANGE_PENDING)) ||
> +               (mddev->flags & (mddev->external == 1 && ~
> (1<<MD_CHANGE_PENDING))) ||

Please read that code again and see how it doesn't make any sense at
all.

Thanks,
NeilBrown
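One way to see the problem with the proposed expression is to evaluate it outside the kernel. In C, `&&` yields an int that is either 0 or 1, and `~(1 << n)` is nonzero, so the right-hand operand of `&` collapses to `external == 1`. The sketch below assumes a bit number of 2 for the pending flag purely for illustration (TOY_CHANGE_PENDING is a hypothetical name, not the kernel's definition) and uses `1UL` where the diff uses `1`.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit number, chosen only for this illustration. */
#define TOY_CHANGE_PENDING 2

/* Original test: is any flag bit other than the pending bit set? */
static bool original_test(unsigned long flags)
{
	return (flags & ~(1UL << TOY_CHANGE_PENDING)) != 0;
}

/*
 * Proposed test, written as in the diff. The && produces 0 or 1, so
 * this masks flags with 1 (external == 1) or with 0 (otherwise).
 */
static bool proposed_test(unsigned long flags, int external)
{
	return (flags & (external == 1 && ~(1UL << TOY_CHANGE_PENDING))) != 0;
}
```

Under these assumptions, `proposed_test(0x2, 1)` is false even though `original_test(0x2)` is true, and `proposed_test(flags, 0)` is false for every value of flags. The proposed form only ever looks at bit 0.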