Re: A hang bug of dm on s390x

Zdenek Kabelac <zkabelac@xxxxxxxxxx> · Thu, 16 Feb 2023 13:39:52 +0100

On 16. 02. 23 at 9:30, Pingfan Liu wrote:
Hi Ming,

Thank you for looking into this.

Let me loop in Alasdair, Mike, and Zdenek for further comments on the LVM side.


Thanks,

Pingfan


Hi


From the lvm2 point of view, a couple of clarifications: to let a thin-pool
auto-extend, the user has to configure:


thin_pool_autoextend_threshold = 70

thin_pool_autoextend_percent = 20


If thin_pool_autoextend_threshold is left at its default value of 100,
the thin-pool is never extended.
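
As a rough illustration of how the two settings interact, here is a small sketch of a single extension step. It assumes the pool is extended by thin_pool_autoextend_percent of its current size once usage exceeds the threshold; next_pool_size is a hypothetical helper (not an lvm2 command), and rounding to extent boundaries is ignored. Note also that these thin_pool_* settings are separate from the snapshot_autoextend_* settings in lvm.conf.

```shell
# Hypothetical helper (not part of lvm2): pool size after one policy-driven
# extension step. Sizes are in MiB; extent rounding is ignored.
# Usage: next_pool_size SIZE USED THRESHOLD PERCENT
next_pool_size() {
    size=$1 used=$2 threshold=$3 percent=$4
    # A threshold of 100 means auto-extension is disabled.
    if [ "$threshold" -ge 100 ]; then
        echo "$size"
        return
    fi
    # Extend only when usage exceeds THRESHOLD percent of the current size.
    if [ $((used * 100)) -gt $((size * threshold)) ]; then
        echo $((size + size * percent / 100))
    else
        echo "$size"
    fi
}
```

With threshold 70 and percent 20, `next_pool_size 100 71 70 20` gives 120, while `next_pool_size 100 71 100 20` leaves the size at 100.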

By default, when the thin-pool kernel target runs out of space, it puts all
in-flight I/O operations on hold for 60s (configurable via a kernel
parameter); after that, all such operations are errored and the thin-pool
enters the out-of-space error state.

To reach this state immediately, use '--errorwhenfull=y' with the thin-pool
(lvcreate, lvconvert, lvchange). This avoids any delay if the user doesn't
want the thin-pool expanded and wants the error ASAP.
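
A minimal sketch of how one might inspect and change this behaviour. It assumes the kernel parameter mentioned above is the dm_thin_pool module parameter no_space_timeout, and uses the rhel_s390x-kvm-011/thinp pool from the test steps only as an example argument. The function names are made up for illustration; nothing is executed until a function is called.

```shell
# Assumed location of the queueing timeout (in seconds) for a full thin-pool.
NO_SPACE_TIMEOUT=${NO_SPACE_TIMEOUT:-/sys/module/dm_thin_pool/parameters/no_space_timeout}

# Print the current timeout, or a note if the parameter file is not readable
# (for example when the dm_thin_pool module is not loaded).
show_no_space_timeout() {
    if [ -r "$NO_SPACE_TIMEOUT" ]; then
        echo "no_space_timeout: $(cat "$NO_SPACE_TIMEOUT")s"
    else
        echo "no_space_timeout: unavailable"
    fi
}

# Switch an existing pool to error out immediately when full, then show the
# resulting setting. "$1" is VG/POOL, e.g. rhel_s390x-kvm-011/thinp.
error_when_full() {
    lvchange --errorwhenfull y "$1" &&
    lvs -o name,lv_when_full "$1"
}
```

Running `show_no_space_timeout` is read-only; `error_when_full rhel_s390x-kvm-011/thinp` needs root and modifies the pool.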

But all of this might be unrelated to the issue you are seeing on your hw.


Regards

Zdenek


On Thu, Feb 16, 2023 at 8:08 AM Ming Lei <ming.lei@xxxxxxxxxx> wrote:
On Wed, Feb 15, 2023 at 07:23:40PM +0800, Pingfan Liu wrote:
Hi guys,

I encountered a hang on an s390x system. The tested kernel is
not preemptible and was booted with "nr_cpus=1".

The test steps:
       umount /home
       lvremove /dev/rhel_s390x-kvm-011/home
       ## uncomment "snapshot_autoextend_threshold = 70" and
       ## "snapshot_autoextend_percent = 20" in /etc/lvm/lvm.conf

       systemctl enable lvm2-monitor.service
       systemctl start lvm2-monitor.service

       lvremove -y rhel_s390x-kvm-011/thinp
       lvcreate -L 10M -T rhel_s390x-kvm-011/thinp
       lvcreate -V 400M -T rhel_s390x-kvm-011/thinp -n src
       mkfs.ext4 /dev/rhel_s390x-kvm-011/src
       mount /dev/rhel_s390x-kvm-011/src /mnt
       for((i=0;i<4;i++)); do dd if=/dev/zero of=/mnt/test$i.img bs=100M count=1; done

And the system hangs with the console log [1]

The related kernel config

     CONFIG_PREEMPT_NONE_BUILD=y
     CONFIG_PREEMPT_NONE=y
     CONFIG_PREEMPT_COUNT=y
     CONFIG_SCHED_CORE=y

It turns out that when it hangs, the kernel is stuck in an endless loop
in the function dm_wq_work():
         while (!test_bit(DMF_BLOCK_IO_FOR_SUSPEND, &md->flags)) {
                 spin_lock_irq(&md->deferred_lock);
                 bio = bio_list_pop(&md->deferred);
                 spin_unlock_irq(&md->deferred_lock);

                 if (!bio)
                         break;
                 thread_cpu = smp_processor_id();
                 submit_bio_noacct(bio);
         }
where dm_wq_work()->__submit_bio_noacct()->...->dm_handle_requeue()
keeps generating new bios, so the condition "if (!bio)" can never be
met.


After applying the following patch, the issue is gone.

diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index e1ea3a7bd9d9..95c9cb07a42f 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -2567,6 +2567,7 @@ static void dm_wq_work(struct work_struct *work)
                         break;

                 submit_bio_noacct(bio);
+               cond_resched();
         }
  }

But I don't think it is a proper solution. Without this patch, if
nr_cpus=1 is removed (the system has two CPUs), the issue cannot be
triggered. That suggests that with more than one CPU, the above loop
can exit via the condition "if (!bio)".

Any ideas?

I think the patch is correct.

For a kernel built without CONFIG_PREEMPT on a single CPU core,
if the dm target (such as dm-thin) needs another wq or kthread to
handle I/O, the dm target side is blocked because dm_wq_work()
holds the only CPU. Sooner or later, the dm target may have no
resources left to handle new I/O from dm core and returns REQUEUE.

Then dm_wq_work() becomes an endless loop.
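
The argument above can be sketched as a toy model. This is not kernel code: drain_deferred is a made-up function, "free" stands in for whatever resources the target's own worker would release, and the iteration cap stands in for "forever". The only point is that without a yield in the loop, the target's worker never runs and every bio is requeued.

```shell
# Toy model of the dm_wq_work() loop on a single, non-preemptible CPU.
# Usage: drain_deferred NR_BIOS YIELD
#   YIELD=1 models calling cond_resched() after each submit, YIELD=0 does not.
# Prints the number of loop iterations needed to empty the deferred list,
# or "stuck" if the cap is reached.
drain_deferred() {
    deferred=$1 yield=$2
    free=0          # resources the target currently has for new I/O
    iterations=0
    cap=1000        # stand-in for looping forever
    while [ "$deferred" -gt 0 ]; do
        iterations=$((iterations + 1))
        if [ "$iterations" -gt "$cap" ]; then
            echo stuck
            return
        fi
        deferred=$((deferred - 1))          # pop one bio from the deferred list
        if [ "$free" -gt 0 ]; then
            free=$((free - 1))              # target accepts the bio
        else
            deferred=$((deferred + 1))      # target returns REQUEUE
        fi
        if [ "$yield" -eq 1 ]; then
            free=$((free + 1))              # target's worker gets the CPU
        fi
    done
    echo "$iterations"
}
```

In this model `drain_deferred 3 1` finishes after 4 iterations, while `drain_deferred 3 0` prints "stuck".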


Thanks,
Ming

