Re: [PATCH 2/6] md: Revert "md: Make sure md_do_sync() will set MD_RECOVERY_DONE"

On Thu, Feb 29, 2024 at 7:50 AM Xiao Ni <xni@xxxxxxxxxx> wrote:
>
> This reverts commit 82ec0ae59d02e89164b24c0cc8e4e50de78b5fd6.
>
> The root cause is that MD_RECOVERY_WAIT isn't cleared when stopping raid.
> The following patch 'Clear MD_RECOVERY_WAIT when stopping dmraid' fixes
> this problem.
>
> Signed-off-by: Xiao Ni <xni@xxxxxxxxxx>

I think we still need commit 82ec0ae59d02e89164b24c0cc8e4e50de78b5fd6,
or some variation of it. Without it, we can hit the deadlock shown in
the hung-task report below. The test VM has two RAID arrays: a raid5
with a journal and a raid1.

I pushed the other patches in the set to the md-6.9-for-hch branch for
further testing.

Thanks,
Song


[  250.347646] INFO: task systemd-udevd:546 blocked for more than 122 seconds.
[  250.348443]       Not tainted 6.8.0-rc3+ #479
[  250.348912] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  250.349741] task:systemd-udevd   state:D stack:27136 pid:546 tgid:546   ppid:525    flags:0x00000000
[  250.350740] Call Trace:
[  250.351043]  <TASK>
[  250.351310]  __schedule+0x862/0x19b0
[  250.351770]  ? __pfx___schedule+0x10/0x10
[  250.352222]  ? lock_release+0x250/0x690
[  250.352657]  ? __pfx_lock_release+0x10/0x10
[  250.353128]  ? mark_held_locks+0x62/0x90
[  250.353604]  schedule+0x77/0x200
[  250.353976]  md_handle_request+0x1fe/0x650
[  250.354459]  ? __pfx_md_handle_request+0x10/0x10
[  250.354957]  ? bio_split_to_limits+0x131/0x150
[  250.355456]  ? __pfx_autoremove_wake_function+0x10/0x10
[  250.356031]  ? lock_is_held_type+0xda/0x130
[  250.356515]  __submit_bio+0x99/0xe0
[  250.356910]  submit_bio_noacct_nocheck+0x25a/0x570
[  250.357510]  ? __pfx_submit_bio_noacct_nocheck+0x10/0x10
[  250.358080]  ? __might_resched+0x274/0x350
[  250.358546]  ? submit_bio_noacct+0x1b7/0x6c0
[  250.359067]  mpage_readahead+0x25b/0x300
[  250.359507]  ? __pfx_mpage_readahead+0x10/0x10
[  250.359986]  ? __pfx___lock_acquire+0x10/0x10
[  250.360524]  ? __pfx_blkdev_get_block+0x10/0x10
[  250.361046]  ? __pfx_lock_release+0x10/0x10
[  250.361602]  ? __pfx___filemap_add_folio+0x10/0x10
[  250.362250]  ? lock_is_held_type+0xda/0x130
[  250.362785]  read_pages+0xfd/0x650
[  250.363173]  ? __pfx_read_pages+0x10/0x10
[  250.363685]  page_cache_ra_unbounded+0x1df/0x2d0
[  250.364228]  force_page_cache_ra+0x11e/0x150
[  250.364716]  filemap_get_pages+0x6f1/0xbb0
[  250.365218]  ? __pfx_filemap_get_pages+0x10/0x10
[  250.365735]  ? lock_is_held_type+0xda/0x130
[  250.366266]  filemap_read+0x216/0x6a0
[  250.366679]  ? __pfx_mark_lock+0x10/0x10
[  250.367132]  ? __pfx_ptep_set_access_flags+0x10/0x10
[  250.367765]  ? __pfx_filemap_read+0x10/0x10
[  250.368234]  ? __lock_acquire+0x959/0x3540
[  250.368756]  blkdev_read_iter+0xc0/0x230
[  250.369200]  vfs_read+0x38c/0x540
[  250.369581]  ? __pfx_vfs_read+0x10/0x10
[  250.370038]  ? __fget_light+0x96/0xd0
[  250.370469]  ksys_read+0xcb/0x170
[  250.370839]  ? __pfx_ksys_read+0x10/0x10
[  250.371320]  do_syscall_64+0x7a/0x1a0
[  250.371735]  entry_SYSCALL_64_after_hwframe+0x6e/0x76
[  250.372367] RIP: 0033:0x7fcb590118b2
[  250.372865] RSP: 002b:00007ffcdd5f9c18 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[  250.373840] RAX: ffffffffffffffda RBX: 0000555885985010 RCX: 00007fcb590118b2
[  250.374641] RDX: 0000000000000040 RSI: 0000555885985038 RDI: 0000000000000011
[  250.375437] RBP: 000055588599fd40 R08: 0000555885985010 R09: 000055588596c010
[  250.376222] R10: 00007fcb58fbfbc0 R11: 0000000000000246 R12: 00000000804f0000
[  250.376974] R13: 0000000000000040 R14: 000055588599fd90 R15: 0000555885985028
[  250.377811]  </TASK>
[  250.378073] INFO: task mdadm:562 blocked for more than 122 seconds.
[  250.378753]       Not tainted 6.8.0-rc3+ #479
[  250.379237] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  250.380055] task:mdadm           state:D stack:25872 pid:562 tgid:562   ppid:543    flags:0x00004000
[  250.381071] Call Trace:
[  250.381369]  <TASK>
[  250.381625]  __schedule+0x862/0x19b0
[  250.382054]  ? __pfx___schedule+0x10/0x10
[  250.382502]  ? lock_release+0x250/0x690
[  250.382943]  ? __pfx_lock_release+0x10/0x10
[  250.383407]  ? mark_held_locks+0x24/0x90
[  250.383851]  ? lockdep_hardirqs_on+0x7d/0x100
[  250.384345]  ? preempt_count_sub+0x18/0xd0
[  250.384806]  ? _raw_spin_unlock_irqrestore+0x3f/0x60
[  250.385358]  schedule+0x77/0x200
[  250.385718]  md_ioctl+0x1750/0x1d60
[  250.386114]  ? __pfx_md_ioctl+0x10/0x10
[  250.386535]  ? _raw_spin_unlock_irqrestore+0x34/0x60
[  250.387063]  ? lockdep_hardirqs_on+0x7d/0x100
[  250.387567]  ? preempt_count_sub+0x18/0xd0
[  250.388024]  ? populate_seccomp_data+0x184/0x220
[  250.388522]  ? __pfx_autoremove_wake_function+0x10/0x10
[  250.389083]  ? __seccomp_filter+0x102/0x760
[  250.389553]  blkdev_ioctl+0x1f1/0x3c0
[  250.389956]  ? __pfx_blkdev_ioctl+0x10/0x10
[  250.390441]  __x64_sys_ioctl+0xc6/0x100
[  250.390880]  do_syscall_64+0x7a/0x1a0
[  250.391313]  entry_SYSCALL_64_after_hwframe+0x6e/0x76
[  250.391877] RIP: 0033:0x7fd88eef362b
[  250.392290] RSP: 002b:00007fff8c298438 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[  250.393098] RAX: ffffffffffffffda RBX: 000055e1b77a2300 RCX: 00007fd88eef362b
[  250.393896] RDX: 00007fff8c2985a8 RSI: 0000000040140921 RDI: 0000000000000004
[  250.394664] RBP: 0000000000000005 R08: 000000000000001e R09: 00007fff8c298197
[  250.395457] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
[  250.396223] R13: 000055e1b77a4c70 R14: 00007fff8c2984f8 R15: 000055e1b77a46d0
[  250.397050]  </TASK>
[  250.397357]
[  250.397357] Showing all locks held in the system:
[  250.398092] 1 lock held by khungtaskd/211:
[  250.398535]  #0: ffffffff87f6fea0 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x4d/0x230
[  250.399613] 1 lock held by systemd-journal/499:
[  250.400124] 1 lock held by systemd-udevd/546:
[  250.400616]  #0: ffff88801461d178 (mapping.invalidate_lock){.+.+}-{3:3}, at: page_cache_ra_unbounded+0xa4/0x2d0
[  250.401701]
[  250.401882] =============================================
[  250.401882]
[  250.402618] Kernel panic - not syncing: hung_task: blocked tasks
[  250.403294] CPU: 2 PID: 211 Comm: khungtaskd Not tainted 6.8.0-rc3+ #479
[  250.404046] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[  250.405264] Call Trace:
[  250.405537]  <TASK>
[  250.405776]  dump_stack_lvl+0x4a/0x80
[  250.406185]  panic+0x41c/0x460
[  250.406592]  ? __pfx_panic+0x10/0x10
[  250.407167]  ? lock_release+0x205/0x690
[  250.407713]  ? preempt_count_sub+0x18/0xd0
[  250.408273]  watchdog+0x9af/0x9b0
[  250.408673]  ? __pfx_watchdog+0x10/0x10
[  250.409097]  kthread+0x1b1/0x1f0
[  250.409476]  ? kthread+0xf6/0x1f0
[  250.409849]  ? __pfx_kthread+0x10/0x10
[  250.410276]  ret_from_fork+0x31/0x60
[  250.410704]  ? __pfx_kthread+0x10/0x10
[  250.411123]  ret_from_fork_asm+0x1b/0x30
[  250.411604]  </TASK>
[  250.412330] Kernel Offset: disabled
[  250.412802] ---[ end Kernel panic - not syncing: hung_task: blocked tasks ]---
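
For readers skimming the report above, the useful information is which
tasks are hung and where each one went to sleep. The following is a
minimal sketch, not a kernel or mdadm tool, that pulls that summary out
of a hung-task log. The names `blocked_tasks` and `SAMPLE` are made up
for illustration, and the sketch assumes the log format shown in this
message (timestamped lines, unreliable frames prefixed with "? ").

```python
import re

# "[  250.347646] INFO: task systemd-udevd:546 blocked for more than 122 seconds."
TASK_RE = re.compile(r"INFO: task (\S+):(\d+) blocked for more than (\d+) seconds")
# A reliable frame such as "[  250.353976]  md_handle_request+0x1fe/0x650".
# Frames prefixed with "? " do not match and are therefore skipped.
FRAME_RE = re.compile(r"^\[\s*\d+\.\d+\]\s+([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+")

# Scheduler entry points; the caller of these is where the task sleeps.
SCHED_FRAMES = {"__schedule", "schedule"}


def blocked_tasks(log_text):
    """Return (comm, pid, seconds, first reliable frame above schedule())."""
    results = []
    current = None
    for line in log_text.splitlines():
        task = TASK_RE.search(line)
        if task:
            current = [task.group(1), int(task.group(2)), int(task.group(3)), None]
            results.append(current)
            continue
        if current is None or current[3] is not None:
            continue  # no task seen yet, or its sleeping frame is already known
        frame = FRAME_RE.match(line)
        if frame and frame.group(1) not in SCHED_FRAMES:
            current[3] = frame.group(1)
    return [tuple(r) for r in results]


# Abbreviated excerpt of the report in this message.
SAMPLE = """\
[  250.347646] INFO: task systemd-udevd:546 blocked for more than 122 seconds.
[  250.350740] Call Trace:
[  250.351310]  __schedule+0x862/0x19b0
[  250.351770]  ? __pfx___schedule+0x10/0x10
[  250.353604]  schedule+0x77/0x200
[  250.353976]  md_handle_request+0x1fe/0x650
[  250.356515]  __submit_bio+0x99/0xe0
[  250.378073] INFO: task mdadm:562 blocked for more than 122 seconds.
[  250.381071] Call Trace:
[  250.381625]  __schedule+0x862/0x19b0
[  250.385358]  schedule+0x77/0x200
[  250.385718]  md_ioctl+0x1750/0x1d60
[  250.389553]  blkdev_ioctl+0x1f1/0x3c0
"""

if __name__ == "__main__":
    for comm, pid, secs, frame in blocked_tasks(SAMPLE):
        print(f"{comm}:{pid} blocked >{secs}s in {frame}")
```

On the excerpt, this reports systemd-udevd sleeping in
md_handle_request() and mdadm sleeping in md_ioctl(), which are the two
waiters in the report. It only summarizes the trace; it does not say
which condition each task is waiting for.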




