Re: [PATCH md-6.12 v14 1/1] md: generate CHANGE uevents for md device

On Thu, 5 Sep 2024 15:42:00 +0800
Yu Kuai <yukuai1@xxxxxxxxxxxxxxx> wrote:

> Hi,
> 
> > On 2024/09/05 5:51, Song Liu wrote:
> > On Mon, Sep 2, 2024 at 1:38 AM Kinga Stefaniuk
> > <kinga.stefaniuk@xxxxxxxxx> wrote:  
> >>
> >> In mdadm commit 49b69533e8 ("mdmonitor: check if udev has finished
> >> events processing") mdmonitor was taught to wait for udev to finish
> >> processing, and later, in commit 9935cf0f64f3 ("Mdmonitor: Improve
> >> udev event handling"), polling for MD events on the /proc/mdstat
> >> file was deprecated, because relying on udev events is more
> >> reliable and less bug prone (we are no longer competing with udev).
> >>
> >> After those changes we are still observing missing mdmonitor events
> >> in some scenarios; SpareEvent in particular is likely to be missed.
> >> With this patch MD generates more CHANGE uevents and wakes up
> >> mdmonitor more frequently, giving it the chance to notice events.
> >> MD already has md_new_events() for triggering events, and this
> >> patch extends that function to also generate udev CHANGE uevents.
> >> The uevents cannot be emitted directly, because the function is
> >> called in interrupt context, so a dedicated workqueue is created;
> >> uevents are not time critical, so deferring them to a workqueue is
> >> safe. The change is limited to the CHANGE event, as there is no
> >> need to generate other uevents for now. With this change, mdmonitor
> >> events are much less likely to be missed; our internal test suite
> >> confirms that mdmonitor reliability is (again) improved.
> >> Also start using the irq-safe methods on all_mddevs_lock, because
> >> it can now be reached from interrupt context.
> >>
> >> Signed-off-by: Mateusz Grzonka <mateusz.grzonka@xxxxxxxxx>
> >> Signed-off-by: Kinga Stefaniuk <kinga.stefaniuk@xxxxxxxxx>  
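
For context, a minimal sketch of the deferral described above (not the
patch itself): md_new_events() can be reached from interrupt context,
where kobject_uevent() must not be called directly, so the CHANGE
uevent is handed off to process context through a work item. The
md_uevent_wq/md_uevent_work names and the exact hook into
md_new_events() are assumptions for illustration only:

	#include <linux/kobject.h>
	#include <linux/workqueue.h>

	/* Hypothetical queue, created with alloc_workqueue() at init. */
	static struct workqueue_struct *md_uevent_wq;

	struct md_uevent_work {
		struct work_struct work;
		struct kobject *kobj;	/* the array's gendisk device kobject */
	};

	static void md_uevent_fn(struct work_struct *ws)
	{
		struct md_uevent_work *w =
			container_of(ws, struct md_uevent_work, work);

		/* Process context here, so emitting the uevent is safe. */
		kobject_uevent(w->kobj, KOBJ_CHANGE);
		kfree(w);
	}

	void md_new_events(struct mddev *mddev)
	{
		struct md_uevent_work *w;

		/* ... existing event accounting and wakeups elided ... */

		/* GFP_ATOMIC: this path may run in interrupt context. */
		w = kmalloc(sizeof(*w), GFP_ATOMIC);
		if (!w)
			return;	/* dropping one uevent is tolerable */

		INIT_WORK(&w->work, md_uevent_fn);
		w->kobj = &disk_to_dev(mddev->gendisk)->kobj;
		queue_work(md_uevent_wq, &w->work);
	}

The same interrupt-context reachability is what motivates switching
all_mddevs_lock to the irq-safe primitives, e.g.:

	unsigned long flags;

	spin_lock_irqsave(&all_mddevs_lock, flags);
	/* ... walk the all_mddevs list ... */
	spin_unlock_irqrestore(&all_mddevs_lock, flags);
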
> > 
> > I am seeing new failures in the mdadm tests, for example in test
> > 01replace. Please run these tests and fix the issues.
> 
> I just tested this myself in my VM. I didn't see 01replace fail;
> however, test 13imsm-r0_r5_3d-grow-r0_r5_4d started to hang:
> 
> [16098.862049] INFO: task systemd-udevd:57927 blocked for more than 368 seconds.
> [16098.863049]       Not tainted 6.11.0-rc1-00078-g761e5afb6ddb-dirty #362
> [16098.863802] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [16098.865773]
> [16098.865773] Showing all locks held in the system:
> [16098.866702] 1 lock held by khungtaskd/31:
> [16098.867233]  #0: ffffffff8a789b40 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x46/0x320
> [16098.868589] 1 lock held by systemd-journal/203:
> [16098.869276] 1 lock held by systemd-udevd/57927:
> [16098.869966]  #0: ffff8881a61fa1a8 (mapping.invalidate_lock#2){++++}-{3:3}, at: page_cache_ra_unbounded+0x73/0x2d0
> [16098.871477] 4 locks held by mdadm/58163:
> [16098.872099]  #0: ffff88817d4b4400 (sb_writers#5){.+.+}-{0:0}, at: vfs_write+0x32d/0x470
> [16098.873303]  #1: ffff888193dcd688 (&of->mutex#2){+.+.}-{3:3}, at: kernfs_fop_write_iter+0x143/0x280
> [16098.874620]  #2: ffff8881323cb010 (kn->active#98){.+.+}-{0:0}, at: kernfs_fop_write_iter+0x153/0x280
> [16098.876005]  #3: ffff888193d4a0a8 (&mddev->suspend_mutex){+.+.}-{3:3}, at: mddev_suspend+0x59/0x380 [md_mod]
> 
> [root@fedora ~]# cat /proc/57927/stack
> [<0>] wait_woken+0xa4/0xd0
> [<0>] raid5_make_request+0x994/0x2080 [raid456]
> [<0>] md_handle_request+0x17a/0x4b0 [md_mod]
> [<0>] md_submit_bio+0x7c/0x130 [md_mod]
> [<0>] __submit_bio+0x12b/0x190
> [<0>] submit_bio_noacct_nocheck+0x22b/0x6a0
> [<0>] submit_bio_noacct+0x259/0xac0
> [<0>] submit_bio+0x58/0x1d0
> [<0>] mpage_readahead+0x195/0x280
> [<0>] blkdev_readahead+0x1d/0x30
> [<0>] read_pages+0x6e/0x550
> [<0>] page_cache_ra_unbounded+0x1c6/0x2d0
> [<0>] do_page_cache_ra+0x4f/0x80
> [<0>] force_page_cache_ra+0x78/0xc0
> [<0>] page_cache_sync_ra+0x60/0x460
> [<0>] filemap_get_pages+0x13f/0xba0
> [<0>] filemap_read+0x122/0x590
> [<0>] blkdev_read_iter+0x7a/0x210
> [<0>] vfs_read+0x27f/0x400
> [<0>] ksys_read+0x85/0x180
> [<0>] __x64_sys_read+0x21/0x30
> [<0>] x64_sys_call+0x45e7/0x4600
> [<0>] do_syscall_64+0xd5/0x230
> [<0>] entry_SYSCALL_64_after_hwframe+0x76/0x7e
> 
> Does user space need to change as well?
> 
> Thanks,
> Kuai
> 
> > 
> > Thanks,
> > Song
> > 

Hi,

Thanks for your review. I rebased my patch onto the md-6.12 branch and
hit the same symptoms as Kuai. I need to investigate this and will come
back with my findings or a new patch version. There may be a problem
with the tests themselves: I can only reproduce the hang when running
the full suite; running the tests one by one, I don't see it.

Thanks,
Kinga
