A dm hang bug on s390x

Hi guys,

I encountered a hang issue on an s390x system. The tested kernel is
non-preemptible and booted with "nr_cpus=1".

The test steps:
      umount /home
      lvremove /dev/rhel_s390x-kvm-011/home
      ## uncomment "snapshot_autoextend_threshold = 70" and
      ## "snapshot_autoextend_percent = 20" in /etc/lvm/lvm.conf

      systemctl enable lvm2-monitor.service
      systemctl start lvm2-monitor.service

      lvremove -y rhel_s390x-kvm-011/thinp
      lvcreate -L 10M -T rhel_s390x-kvm-011/thinp
      lvcreate -V 400M -T rhel_s390x-kvm-011/thinp -n src
      mkfs.ext4 /dev/rhel_s390x-kvm-011/src
      mount /dev/rhel_s390x-kvm-011/src /mnt
      for((i=0;i<4;i++)); do dd if=/dev/zero of=/mnt/test$i.img bs=100M count=1; done

And the system hangs, with the console log shown at [1].

The related kernel config:

    CONFIG_PREEMPT_NONE_BUILD=y
    CONFIG_PREEMPT_NONE=y
    CONFIG_PREEMPT_COUNT=y
    CONFIG_SCHED_CORE=y

It turns out that, when hanging, the kernel is stuck in the following
dead loop in dm_wq_work():
        while (!test_bit(DMF_BLOCK_IO_FOR_SUSPEND, &md->flags)) {
                spin_lock_irq(&md->deferred_lock);
                bio = bio_list_pop(&md->deferred);
                spin_unlock_irq(&md->deferred_lock);

                if (!bio)
                        break;
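                /* note: the next line is my local debug instrumentation, not in mainline */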
                thread_cpu = smp_processor_id();
                submit_bio_noacct(bio);
        }
where dm_wq_work()->__submit_bio_noacct()->...->dm_handle_requeue()
keeps generating new bios, so the condition "if (!bio)" can never be
met.
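
For context, here is my simplified reading of that requeue path (a
sketch paraphrasing dm_handle_requeue() in drivers/md/dm.c, not the
verbatim upstream code): with the pool in out-of-data-space (queue IO)
mode, a bio that completes with BLK_STS_DM_REQUEUE is pushed back onto
md->deferred and the kdmflush work is re-armed, so dm_wq_work()
immediately pops the same bio again:

#include <linux/bio.h>
#include <linux/spinlock.h>
#include <linux/workqueue.h>
#include "dm-core.h"

/* Sketch only: field and function names follow drivers/md/dm.c,
 * but the body is simplified. */
static void dm_requeue_original_bio_sketch(struct mapped_device *md,
                                           struct bio *bio)
{
        unsigned long flags;

        /* put the original bio back at the head of the deferred list */
        spin_lock_irqsave(&md->deferred_lock, flags);
        bio_list_add_head(&md->deferred, bio);
        spin_unlock_irqrestore(&md->deferred_lock, flags);

        /* re-arm kdmflush; with nr_cpus=1 this is the same worker that
         * is still spinning in dm_wq_work() */
        queue_work(md->wq, &md->work);
}

If that reading is right, every bio popped by the loop is re-added
before the next bio_list_pop(), which matches the dead loop above.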


After applying the following patch, the issue is gone.

diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index e1ea3a7bd9d9..95c9cb07a42f 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -2567,6 +2567,7 @@ static void dm_wq_work(struct work_struct *work)
                        break;

                submit_bio_noacct(bio);
+               cond_resched();
        }
 }

But I don't think this is a proper solution. Also, without this patch,
if "nr_cpus=1" is removed (the system has two CPUs), the issue cannot
be triggered. That suggests that with more than one CPU, the loop above
can exit via the "if (!bio)" condition, presumably because whatever
resolves the out-of-data-space condition gets a chance to run on the
other CPU.

Any ideas?

Thanks,

Pingfan

[1]: the console log when hanging

 [ 2062.321473] device-mapper: thin: 253:4: reached low water mark for data device: sending event.
 [ 2062.353217] dm-3: detected capacity change from 122880 to 147456
 [-- MARK -- Wed Dec 14 08:45:00 2022]
 [ 2062.376690] device-mapper: thin: 253:4: switching pool to out-of-data-space (queue IO) mode
 [ 2122.393998] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
 [ 2122.394011] (detected by 0, t=6002 jiffies, g=36205, q=624 ncpus=1)
 [ 2122.394014] rcu: All QSes seen, last rcu_sched kthread activity 6002 (4295149593-4295143591), jiffies_till_next_fqs=1, root ->qsmask 0x0
 [ 2122.394017] rcu: rcu_sched kthread starved for 6002 jiffies! g36205 f0x2 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=0
 [ 2122.394019] rcu: Unless rcu_sched kthread gets sufficient CPU time, OOM is now expected behavior.
 [ 2122.394020] rcu: RCU grace-period kthread stack dump:
 [ 2122.394022] task:rcu_sched       state:R  running task     stack:    0 pid:   15 ppid:     2 flags:0x00000000
 [ 2122.394027] Call Trace:
 [ 2122.394030]  [<00000001001fd5b0>] __schedule+0x300/0x6c0
 [ 2122.394040]  [<00000001001fd9d2>] schedule+0x62/0xf0
 [ 2122.394042]  [<000000010020360c>] schedule_timeout+0x8c/0x170
 [ 2122.394045]  [<00000000ff9e219e>] rcu_gp_fqs_loop+0x30e/0x3d0
 [ 2122.394053]  [<00000000ff9e3ee2>] rcu_gp_kthread+0x132/0x1b0
 [ 2122.394054]  [<00000000ff980578>] kthread+0x108/0x110
 [ 2122.394058]  [<00000000ff9035cc>] __ret_from_fork+0x3c/0x60
 [ 2122.394061]  [<000000010020455a>] ret_from_fork+0xa/0x30
 [ 2122.394064] rcu: Stack dump where RCU GP kthread last ran:
 [ 2122.394064] Task dump for CPU 0:
 [ 2122.394066] task:kworker/0:2     state:R  running task     stack:    0 pid:16943 ppid:     2 flags:0x00000004
 [ 2122.394070] Workqueue: kdmflush/253:6 dm_wq_work [dm_mod]
 [ 2122.394100] Call Trace:
 [ 2122.394100]  [<00000000ff98bb98>] sched_show_task.part.0+0xf8/0x130
 [ 2122.394106]  [<00000001001efc9a>] rcu_check_gp_kthread_starvation+0x172/0x190
 [ 2122.394111]  [<00000000ff9e5afe>] print_other_cpu_stall+0x2de/0x370
 [ 2122.394113]  [<00000000ff9e5d60>] check_cpu_stall+0x1d0/0x270
 [ 2122.394114]  [<00000000ff9e6152>] rcu_sched_clock_irq+0x82/0x230
 [ 2122.394117]  [<00000000ff9f808a>] update_process_times+0xba/0xf0
 [ 2122.394121]  [<00000000ffa0adfa>] tick_sched_handle+0x4a/0x70
 [ 2122.394124]  [<00000000ffa0b2fe>] tick_sched_timer+0x5e/0xc0
 [ 2122.394126]  [<00000000ff9f8cb6>] __hrtimer_run_queues+0x136/0x290
 [ 2122.394128]  [<00000000ff9f9f80>] hrtimer_interrupt+0x150/0x2d0
 [ 2122.394130]  [<00000000ff90ce36>] do_IRQ+0x56/0x70
 [ 2122.394133]  [<00000000ff90d216>] do_irq_async+0x56/0xb0
 [ 2122.394135]  [<00000001001f6786>] do_ext_irq+0x96/0x160
 [ 2122.394138]  [<00000001002047bc>] ext_int_handler+0xdc/0x110
 [ 2122.394140]  [<000003ff8005cf48>] dm_split_and_process_bio+0x28/0x4d0 [dm_mod]
 [ 2122.394152] ([<000003ff8005d2ec>] dm_split_and_process_bio+0x3cc/0x4d0 [dm_mod])
 [ 2122.394162]  [<000003ff8005dca8>] dm_submit_bio+0x68/0x110 [dm_mod]
 [ 2122.394173]  [<00000000ffda0698>] __submit_bio+0x78/0x190
 [ 2122.394178]  [<00000000ffda081c>] __submit_bio_noacct+0x6c/0x1e0
 [ 2122.394180]  [<000003ff8005c2ac>] dm_wq_work+0x5c/0xc0 [dm_mod]
 [ 2122.394190]  [<00000000ff9771e6>] process_one_work+0x216/0x4a0
 [ 2122.394196]  [<00000000ff9779a4>] worker_thread+0x64/0x4a0
 [ 2122.394198]  [<00000000ff980578>] kthread+0x108/0x110
 [ 2122.394199]  [<00000000ff9035cc>] __ret_from_fork+0x3c/0x60
 [ 2122.394201]  [<000000010020455a>] ret_from_fork+0xa/0x30
 [ 2302.444001] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
 [ 2302.444015] (detected by 0, t=24007 jiffies, g=36205, q=624 ncpus=1)
 [ 2302.444018] rcu: All QSes seen, last rcu_sched kthread activity 24007 (4295167598-4295143591), jiffies_till_next_fqs=1, root ->qsmask 0x0
 [ 2302.444021] rcu: rcu_sched kthread starved for 24007 jiffies! g36205 f0x2 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=0
 [ 2302.444024] rcu: Unless rcu_sched kthread gets sufficient CPU time, OOM is now expected behavior.
 [ 2302.444025] rcu: RCU grace-period kthread stack dump:
 [ 2302.444027] task:rcu_sched       state:R  running task     stack:    0 pid:   15 ppid:     2 flags:0x00000000
 [-- MARK -- Wed Dec 14 08:50:00 2022]
 [ 2302.444204] Call Trace:
 [ 2302.444207]  [<00000001001fd5b0>] __schedule+0x300/0x6c0
 [ 2302.444216]  [<00000001001fd9d2>] schedule+0x62/0xf0
 [ 2302.444218]  [<000000010020360c>] schedule_timeout+0x8c/0x170
 [ 2302.444221]  [<00000000ff9e219e>] rcu_gp_fqs_loop+0x30e/0x3d0
 [ 2302.444227]  [<00000000ff9e3ee2>] rcu_gp_kthread+0x132/0x1b0
 [ 2302.444229]  [<00000000ff980578>] kthread+0x108/0x110
 [ 2302.444232]  [<00000000ff9035cc>] __ret_from_fork+0x3c/0x60
 [ 2302.444235]  [<000000010020455a>] ret_from_fork+0xa/0x30
 [ 2302.444237] rcu: Stack dump where RCU GP kthread last ran:
 [ 2302.444238] Task dump for CPU 0:
 [ 2302.444239] task:kworker/0:2     state:R  running task     stack:    0 pid:16943 ppid:     2 flags:0x00000004
 [ 2302.444244] Workqueue: kdmflush/253:6 dm_wq_work [dm_mod]
 [ 2302.444270] Call Trace:
 [ 2302.444270]  [<00000000ff98bb98>] sched_show_task.part.0+0xf8/0x130
 [ 2302.444275]  [<00000001001efc9a>] rcu_check_gp_kthread_starvation+0x172/0x190
 [ 2302.444280]  [<00000000ff9e5afe>] print_other_cpu_stall+0x2de/0x370
 [ 2302.444282]  [<00000000ff9e5d60>] check_cpu_stall+0x1d0/0x270
 [ 2302.444284]  [<00000000ff9e6152>] rcu_sched_clock_irq+0x82/0x230
 [ 2302.444286]  [<00000000ff9f808a>] update_process_times+0xba/0xf0
 [ 2302.444290]  [<00000000ffa0adfa>] tick_sched_handle+0x4a/0x70
 [ 2302.444292]  [<00000000ffa0b2fe>] tick_sched_timer+0x5e/0xc0
 [ 2302.444294]  [<00000000ff9f8cb6>] __hrtimer_run_queues+0x136/0x290
 [ 2302.444296]  [<00000000ff9f9f80>] hrtimer_interrupt+0x150/0x2d0
 [ 2302.444298]  [<00000000ff90ce36>] do_IRQ+0x56/0x70
 [ 2302.444301]  [<00000000ff90d216>] do_irq_async+0x56/0xb0
 [ 2302.444303]  [<00000001001f6786>] do_ext_irq+0x96/0x160
 [ 2302.444306]  [<00000001002047bc>] ext_int_handler+0xdc/0x110
 [ 2302.444308]  [<00000000ffbe2a1e>] kmem_cache_alloc+0x15e/0x530
 [ 2302.444316]  [<00000000ffb3e780>] mempool_alloc+0x60/0x210
 [ 2302.444319]  [<00000000ffd9ac4e>] bio_alloc_bioset+0x1ae/0x410
 [ 2302.444324]  [<00000000ffd9af8c>] bio_alloc_clone+0x3c/0x90
 [ 2302.444326]  [<000003ff8005cf9a>] dm_split_and_process_bio+0x7a/0x4d0 [dm_mod]
 [ 2302.444337]  [<000003ff8005dca8>] dm_submit_bio+0x68/0x110 [dm_mod]
 [ 2302.444347]  [<00000000ffda0698>] __submit_bio+0x78/0x190
 [ 2302.444350]  [<00000000ffda081c>] __submit_bio_noacct+0x6c/0x1e0
 [ 2302.444353]  [<000003ff8005c2ac>] dm_wq_work+0x5c/0xc0 [dm_mod]
 [ 2302.444363]  [<00000000ff9771e6>] process_one_work+0x216/0x4a0
 [ 2302.444367]  [<00000000ff9779a4>] worker_thread+0x64/0x4a0
 [ 2302.444369]  [<00000000ff980578>] kthread+0x108/0x110
 [ 2302.444371]  [<00000000ff9035cc>] __ret_from_fork+0x3c/0x60
 [ 2302.444373]  [<000000010020455a>] ret_from_fork+0xa/0x30




