Re: [PATCH RFC V2 0/4] Fix regression bugs

On Wed, Feb 21, 2024 at 1:45 PM Benjamin Marzinski <bmarzins@xxxxxxxxxx> wrote:
>
> On Tue, Feb 20, 2024 at 11:30:55PM +0800, Xiao Ni wrote:
> > Hi all,
> >
> > Sorry, I know this patch set conflicts with Yu Kuai's patch set, but
> > I have to send it out. We are now facing some deadlock regression
> > problems, so it's better to figure out the root cause and fix them.
> > Kuai's patch set looks too complicated to me, and as we discussed in
> > the emails, it breaks some rules. It's not good to fix a problem by
> > breaking the original logic; if we really need to break some logic,
> > it's better to use a distinct patch set that describes why we need to.
> >
> > This patch set is based on Linus's tree, the 6.8-rc5 tag. If it is
> > accepted, we need to revert Kuai's patches that have been merged
> > into Song's tree (md-6.8-20240216 tag). This patch set has four
> > patches. The first two resolve the deadlock problems and cover most
> > of the deadlocks; the third fixes an active_io counter bug; the
> > fourth fixes the raid5 reshape deadlock problem.
>
> With this patchset on top of the v6.8-rc5 kernel I can still see a hang
> tearing down the devices at the end of lvconvert-raid-reshape.sh if I
> run it repeatedly. I haven't dug into this enough to be certain, but it
> appears that when this hangs, make_stripe_request() is returning
> STRIPE_SCHEDULE_AND_RETRY because of
>
> ahead_of_reshape(mddev, logical_sector, conf->reshape_safe)
>
> so it never runs stripe_across_reshape() from your last patch.
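>
> For reference, the path I am describing looks roughly like this
> (paraphrased from memory, not the verbatim raid5.c code, so treat it
> only as a sketch):
>
>	if (unlikely(conf->reshape_progress != MaxSector)) {
>		spin_lock_irq(&conf->device_lock);
>		if (ahead_of_reshape(mddev, logical_sector,
>				     conf->reshape_progress)) {
>			/* Region not reshaped yet: use the old geometry. */
>			previous = 1;
>		} else if (ahead_of_reshape(mddev, logical_sector,
>					    conf->reshape_safe)) {
>			/*
>			 * Region already reshaped but not yet recorded as
>			 * safe: defer the bio and retry it later, so the
>			 * checks further down (including
>			 * stripe_across_reshape() from the patch) are never
>			 * reached for this bio.
>			 */
>			spin_unlock_irq(&conf->device_lock);
>			return STRIPE_SCHEDULE_AND_RETRY;
>		}
>		spin_unlock_irq(&conf->device_lock);
>	}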
>
> It hangs with the following hung-task backtrace:
>
> [ 4569.331345] sysrq: Show Blocked State
> [ 4569.332640] task:mdX_resync      state:D stack:0     pid:155469 tgid:155469 ppid:2      flags:0x00004000
> [ 4569.335367] Call Trace:
> [ 4569.336122]  <TASK>
> [ 4569.336758]  __schedule+0x3ec/0x15c0
> [ 4569.337789]  ? __schedule+0x3f4/0x15c0
> [ 4569.338433]  ? __wake_up_klogd.part.0+0x3c/0x60
> [ 4569.339186]  schedule+0x32/0xd0
> [ 4569.339709]  md_do_sync+0xede/0x11c0
> [ 4569.340324]  ? __pfx_autoremove_wake_function+0x10/0x10
> [ 4569.341183]  ? __pfx_md_thread+0x10/0x10
> [ 4569.341831]  md_thread+0xab/0x190
> [ 4569.342397]  kthread+0xe5/0x120
> [ 4569.342933]  ? __pfx_kthread+0x10/0x10
> [ 4569.343554]  ret_from_fork+0x31/0x50
> [ 4569.344152]  ? __pfx_kthread+0x10/0x10
> [ 4569.344761]  ret_from_fork_asm+0x1b/0x30
> [ 4569.345193]  </TASK>
> [ 4569.345403] task:dmsetup         state:D stack:0     pid:156091 tgid:156091 ppid:155933 flags:0x00004002
> [ 4569.346300] Call Trace:
> [ 4569.346538]  <TASK>
> [ 4569.346746]  __schedule+0x3ec/0x15c0
> [ 4569.347097]  ? __schedule+0x3f4/0x15c0
> [ 4569.347440]  ? sysvec_call_function_single+0xe/0x90
> [ 4569.347905]  ? asm_sysvec_call_function_single+0x1a/0x20
> [ 4569.348401]  ? __pfx_dev_remove+0x10/0x10
> [ 4569.348779]  schedule+0x32/0xd0
> [ 4569.349079]  stop_sync_thread+0x136/0x1d0
> [ 4569.349465]  ? __pfx_autoremove_wake_function+0x10/0x10
> [ 4569.349965]  __md_stop_writes+0x15/0xe0
> [ 4569.350341]  md_stop_writes+0x29/0x40
> [ 4569.350698]  raid_postsuspend+0x53/0x60 [dm_raid]
> [ 4569.351159]  dm_table_postsuspend_targets+0x3d/0x60
> [ 4569.351627]  __dm_destroy+0x1c5/0x1e0
> [ 4569.351984]  dev_remove+0x11d/0x190
> [ 4569.352328]  ctl_ioctl+0x30e/0x5e0
> [ 4569.352659]  dm_ctl_ioctl+0xe/0x20
> [ 4569.352992]  __x64_sys_ioctl+0x94/0xd0
> [ 4569.353352]  do_syscall_64+0x86/0x170
> [ 4569.353703]  ? dm_ctl_ioctl+0xe/0x20
> [ 4569.354059]  ? syscall_exit_to_user_mode+0x89/0x230
> [ 4569.354517]  ? do_syscall_64+0x96/0x170
> [ 4569.354891]  ? exc_page_fault+0x7f/0x180
> [ 4569.355258]  entry_SYSCALL_64_after_hwframe+0x6e/0x76
> [ 4569.355744] RIP: 0033:0x7f49e5dbc13d
> [ 4569.356113] RSP: 002b:00007ffc365585f0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
> [ 4569.356804] RAX: ffffffffffffffda RBX: 000055638c4932c0 RCX: 00007f49e5dbc13d
> [ 4569.357488] RDX: 000055638c493af0 RSI: 00000000c138fd04 RDI: 0000000000000003
> [ 4569.358140] RBP: 00007ffc36558640 R08: 00007f49e5fbc690 R09: 00007ffc365584a8
> [ 4569.358783] R10: 00007f49e5fbb97d R11: 0000000000000246 R12: 00007f49e5fbb97d
> [ 4569.359442] R13: 000055638c493ba0 R14: 00007f49e5fbb97d R15: 00007f49e5fbb97d
> [ 4569.360090]  </TASK>

Hi Ben,

I can reproduce this with 6.6 too, so it's not a regression caused by
the change to stop the sync thread asynchronously. I'm trying to debug
it and find the root cause. On 6.8 with my patch set, the logs show it
is stuck at:

wait_event(mddev->recovery_wait, !atomic_read(&mddev->recovery_active));

But raid5's conf->active_stripes is 0, so I'm still looking into why
this can happen.
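
For reference, the wait/wake pairing involved is roughly this
(paraphrased from md.c, not the verbatim code): md_do_sync() accounts
the sync I/O it issues in mddev->recovery_active and sleeps on
mddev->recovery_wait until all of it completes, and md_done_sync() is
what drops the count and wakes the waiter:

	/* In md_do_sync(): sync_request() issues sync I/O and its size is
	 * added to recovery_active; md_do_sync() later waits for it all to
	 * finish. */
	atomic_add(sectors, &mddev->recovery_active);
	...
	wait_event(mddev->recovery_wait,
		   !atomic_read(&mddev->recovery_active));

	/* md_done_sync(): called when a unit of sync I/O completes. If the
	 * completions never arrive, recovery_active stays non-zero and
	 * md_do_sync() sleeps forever in the wait_event() above. */
	void md_done_sync(struct mddev *mddev, int blocks, int ok)
	{
		atomic_sub(blocks, &mddev->recovery_active);
		wake_up(&mddev->recovery_wait);
		...
	}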

Best Regards
Xiao
>
>
> >
> > I have run the lvm2 regression tests. There are 4 failed cases:
> > shell/dmsetup-integrity-keys.sh
> > shell/lvresize-fs-crypt.sh
> > shell/pvck-dump.sh
> > shell/select-report.sh
> >
> > Xiao Ni (4):
> >   Clear MD_RECOVERY_WAIT when stopping dmraid
> >   Set MD_RECOVERY_FROZEN before stop sync thread
> >   md: Missing decrease active_io for flush io
> >   Don't check crossing reshape when reshape hasn't started
> >
> >  drivers/md/dm-raid.c |  2 ++
> >  drivers/md/md.c      |  8 +++++++-
> >  drivers/md/raid5.c   | 22 ++++++++++------------
> >  3 files changed, 19 insertions(+), 13 deletions(-)
> >
> > --
> > 2.32.0 (Apple Git-132)
>





