Re: A hang bug of dm on s390x

Mike Snitzer <snitzer@xxxxxxxxxx> · Thu, 16 Feb 2023 12:29:33 -0500



[Top-posting but please don't...]

I've staged this fix for 6.3 inclusion and marked it for stable@:
https://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-6.3&id=0ca44fcef241768fd25ee763b3d203b9852f269b

Ming, I also staged this similar fix (I haven't reasoned through a
scenario where dm_wq_requeue_work would actually loop endlessly, but
it's good practice to include cond_resched() in such a workqueue
while loop):
https://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-6.3&id=f77692d65d54665d81815349cc727baa85e8b71d

Thanks,
Mike

On Thu, Feb 16 2023 at  3:30P -0500,
Pingfan Liu <piliu@xxxxxxxxxx> wrote:

> Hi Ming,
> 
> Thank you for looking into this.
> 
> Let me loop in Alasdair, Mike and Zdenek for further comment on the LVM side.
> 
> 
> Thanks,
> 
> Pingfan
> 
> On Thu, Feb 16, 2023 at 8:08 AM Ming Lei <ming.lei@xxxxxxxxxx> wrote:
> >
> > On Wed, Feb 15, 2023 at 07:23:40PM +0800, Pingfan Liu wrote:
> > > Hi guys,
> > >
> > > I encountered a hang on an s390x system. The tested kernel is
> > > not preemptible and was booted with "nr_cpus=1".
> > >
> > > The test steps:
> > >       umount /home
> > >       lvremove /dev/rhel_s390x-kvm-011/home
> > >       ## uncomment "snapshot_autoextend_threshold = 70" and
> > >       ## "snapshot_autoextend_percent = 20" in /etc/lvm/lvm.conf
> > >
> > >       systemctl enable lvm2-monitor.service
> > >       systemctl start lvm2-monitor.service
> > >
> > >       lvremove -y rhel_s390x-kvm-011/thinp
> > >       lvcreate -L 10M -T rhel_s390x-kvm-011/thinp
> > >       lvcreate -V 400M -T rhel_s390x-kvm-011/thinp -n src
> > >       mkfs.ext4 /dev/rhel_s390x-kvm-011/src
> > >       mount /dev/rhel_s390x-kvm-011/src /mnt
> > >       for((i=0;i<4;i++)); do dd if=/dev/zero of=/mnt/test$i.img bs=100M count=1; done
> > >
> > > And the system hangs with the console log [1]
> > >
> > > The related kernel config
> > >
> > >     CONFIG_PREEMPT_NONE_BUILD=y
> > >     CONFIG_PREEMPT_NONE=y
> > >     CONFIG_PREEMPT_COUNT=y
> > >     CONFIG_SCHED_CORE=y
> > >
> > > It turns out that, when it hangs, the kernel is stuck in an endless
> > > loop in the function dm_wq_work():
> > >         while (!test_bit(DMF_BLOCK_IO_FOR_SUSPEND, &md->flags)) {
> > >                 spin_lock_irq(&md->deferred_lock);
> > >                 bio = bio_list_pop(&md->deferred);
> > >                 spin_unlock_irq(&md->deferred_lock);
> > >
> > >                 if (!bio)
> > >                         break;
> > >                 thread_cpu = smp_processor_id();
> > >                 submit_bio_noacct(bio);
> > >         }
> > > where dm_wq_work()->__submit_bio_noacct()->...->dm_handle_requeue()
> > > keeps generating new bios, so the condition "if (!bio)" is never
> > > met.
> > >
> > >
> > > After applying the following patch, the issue is gone.
> > >
> > > diff --git a/drivers/md/dm.c b/drivers/md/dm.c
> > > index e1ea3a7bd9d9..95c9cb07a42f 100644
> > > --- a/drivers/md/dm.c
> > > +++ b/drivers/md/dm.c
> > > @@ -2567,6 +2567,7 @@ static void dm_wq_work(struct work_struct *work)
> > >                         break;
> > >
> > >                 submit_bio_noacct(bio);
> > > +               cond_resched();
> > >         }
> > >  }
> > >
> > > But I don't think it is a proper solution. Without this patch, if
> > > I remove nr_cpus=1 (the system has two CPUs), the issue cannot be
> > > triggered. That suggests that with more than one CPU, the above loop
> > > can exit through the condition "if (!bio)".
> > >
> > > Any ideas?
> >
> > I think the patch is correct.
> >
> > For a kernel built without CONFIG_PREEMPT on a single CPU core,
> > if the dm target (such as dm-thin) needs another wq or kthread for
> > handling IO, then the dm target side is blocked because dm_wq_work()
> > holds the only CPU. Sooner or later the dm target may have no
> > resources left to handle new IO from dm core, and it returns REQUEUE.
> >
> > Then dm_wq_work() becomes an endless loop.
> >
> >
> > Thanks,
> > Ming
> >
> 
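To make Ming's explanation above concrete, here is a rough user-space
toy model. It is not kernel code: every name in it (struct sim,
submit(), target_worker_run(), drain()) is hypothetical, and the
"slots" are only an assumption standing in for whatever resource the
target needs before it can accept a bio. The yield flag plays the role
of cond_resched() on a single non-preemptible CPU, where the target's
worker can only run if the loop gives up the CPU.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy state; none of this mirrors real dm structures. */
struct sim {
	int deferred;   /* bios waiting on the deferred list */
	int free_slots; /* resources the target needs to accept a bio */
	int busy_slots; /* resources waiting to be recycled by the worker */
};

/* Stand-in for the target's own wq/kthread: recycles busy resources. */
static void target_worker_run(struct sim *s)
{
	s->free_slots += s->busy_slots;
	s->busy_slots = 0;
}

/* Stand-in for submission: false means "requeue, no resources left". */
static bool submit(struct sim *s)
{
	if (s->free_slots == 0)
		return false;
	s->free_slots--;
	s->busy_slots++;
	return true;
}

/*
 * Stand-in for the dm_wq_work() loop. Returns the number of bios
 * popped before the list drained, or -1 if max_iters was reached
 * (the model's way of saying "this would spin forever").
 */
static int drain(struct sim *s, bool yield, int max_iters)
{
	assert(max_iters > 0);

	for (int i = 0; i < max_iters; i++) {
		if (s->deferred == 0)
			return i;

		s->deferred--;                /* pop one bio */
		if (!submit(s))
			s->deferred++;        /* requeued: back on the list */

		if (yield)
			target_worker_run(s); /* only chance for the worker to run */
	}
	return -1;
}
```

Starting from 4 deferred bios and 1 free slot, drain() without yield
accepts one bio and then requeues the next one on every iteration until
the cap is hit; with yield it drains the list after 4 pops. The real
interaction between dm core and dm-thin is of course more involved than
this sketch.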

--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://listman.redhat.com/mailman/listinfo/dm-devel