Re: A hang bug of dm on s390x

Hi Ming,

Thank you for looking into this.

Let me loop in Alasdair, Mike and Zdenek for further comment on the LVM side.


Thanks,

Pingfan

On Thu, Feb 16, 2023 at 8:08 AM Ming Lei <ming.lei@xxxxxxxxxx> wrote:
>
> On Wed, Feb 15, 2023 at 07:23:40PM +0800, Pingfan Liu wrote:
> > Hi guys,
> >
> > I encountered a hang issue on an s390x system. The tested kernel is
> > not preemptible and boots with "nr_cpus=1".
> >
> > The test steps:
> >       umount /home
> >       lvremove /dev/rhel_s390x-kvm-011/home
> >       ## uncomment "snapshot_autoextend_threshold = 70" and
> >       "snapshot_autoextend_percent = 20" in /etc/lvm/lvm.conf
> >
> >       systemctl enable lvm2-monitor.service
> >       systemctl start lvm2-monitor.service
> >
> >       lvremove -y rhel_s390x-kvm-011/thinp
> >       lvcreate -L 10M -T rhel_s390x-kvm-011/thinp
> >       lvcreate -V 400M -T rhel_s390x-kvm-011/thinp -n src
> >       mkfs.ext4 /dev/rhel_s390x-kvm-011/src
> >       mount /dev/rhel_s390x-kvm-011/src /mnt
> >       for((i=0;i<4;i++)); do dd if=/dev/zero of=/mnt/test$i.img
> > bs=100M count=1; done
> >
> > And the system hangs with the console log [1]
> >
> > The related kernel config
> >
> >     CONFIG_PREEMPT_NONE_BUILD=y
> >     CONFIG_PREEMPT_NONE=y
> >     CONFIG_PREEMPT_COUNT=y
> >     CONFIG_SCHED_CORE=y
> >
> > It turns out that when hanging, the kernel is stuck in a dead loop
> > in the function dm_wq_work():
> >         while (!test_bit(DMF_BLOCK_IO_FOR_SUSPEND, &md->flags)) {
> >                 spin_lock_irq(&md->deferred_lock);
> >                 bio = bio_list_pop(&md->deferred);
> >                 spin_unlock_irq(&md->deferred_lock);
> >
> >                 if (!bio)
> >                         break;
> >                 thread_cpu = smp_processor_id();
> >                 submit_bio_noacct(bio);
> >         }
> > where dm_wq_work()->__submit_bio_noacct()->...->dm_handle_requeue()
> > keeps requeueing bios back onto md->deferred, so the condition
> > "if (!bio)" can never be met.
> >
> >
> > After applying the following patch, the issue is gone.
> >
> > diff --git a/drivers/md/dm.c b/drivers/md/dm.c
> > index e1ea3a7bd9d9..95c9cb07a42f 100644
> > --- a/drivers/md/dm.c
> > +++ b/drivers/md/dm.c
> > @@ -2567,6 +2567,7 @@ static void dm_wq_work(struct work_struct *work)
> >                         break;
> >
> >                 submit_bio_noacct(bio);
> > +               cond_resched();
> >         }
> >  }
> >
> > But I think it is not a proper solution. Without this patch, if I
> > remove nr_cpus=1 (the system has two CPUs), the issue cannot be
> > triggered. That is, with more than one CPU, the above loop can exit
> > via the condition "if (!bio)".
> >
> > Any ideas?
>
> I think the patch is correct.
>
> For a kernel built without CONFIG_PREEMPT, in the single-CPU case,
> if the dm target (such as dm-thin) needs another workqueue or kthread
> for handling IO, then the target side is blocked because dm_wq_work()
> holds the only CPU. Sooner or later the target runs out of resources
> to handle new IO from dm core and returns REQUEUE.
>
> Then dm_wq_work() becomes a dead loop.
>
>
> Thanks,
> Ming
>

--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://listman.redhat.com/mailman/listinfo/dm-devel



