Re: [PATCH md-6.12 v14 1/1] md: generate CHANGE uevents for md device

Hi,

On 2024/09/05 5:51, Song Liu wrote:
On Mon, Sep 2, 2024 at 1:38 AM Kinga Stefaniuk
<kinga.stefaniuk@xxxxxxxxx> wrote:

In mdadm commit 49b69533e8 ("mdmonitor: check if udev has finished
events processing") mdmonitor was taught to wait for udev to finish
processing events, and later, in commit 9935cf0f64f3 ("Mdmonitor: Improve
udev event handling"), polling /proc/mdstat for MD events was
deprecated because relying on udev events is more reliable and less
bug-prone (we are not competing with udev).

After those changes we still observe missing mdmonitor events in
some scenarios; SpareEvent in particular is likely to be missed. With
this patch MD generates more CHANGE uevents and wakes up mdmonitor
more often, giving it a chance to notice events.
MD has the md_new_event() function to signal events, and this patch
extends it to also generate udev CHANGE uevents. This cannot be done
directly because the function may be called in interrupt context, so a
dedicated workqueue is created. Uevents are less time-critical, so it
is safe to defer them to a workqueue. Only CHANGE uevents are
generated, as there is no need for other uevent types for now.
With this change, mdmonitor events are less likely to be missed. Our
internal test suite confirms that mdmonitor reliability is (again)
improved.
Also start using the irq-safe locking variants for all_mddevs_lock,
because the lock can now be taken from interrupt context.

Signed-off-by: Mateusz Grzonka <mateusz.grzonka@xxxxxxxxx>
Signed-off-by: Kinga Stefaniuk <kinga.stefaniuk@xxxxxxxxx>

I am seeing new failures from mdadm tests, for example, test 01replace.
Please run these tests and fix the issues.

I just tested this myself in my VM and didn't see 01replace fail;
however, test 13imsm-r0_r5_3d-grow-r0_r5_4d started to hang:

[16098.862049] INFO: task systemd-udevd:57927 blocked for more than 368 seconds.
[16098.863049]       Not tainted 6.11.0-rc1-00078-g761e5afb6ddb-dirty #362
[16098.863802] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[16098.865773]
[16098.865773] Showing all locks held in the system:
[16098.866702] 1 lock held by khungtaskd/31:
[16098.867233] #0: ffffffff8a789b40 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x46/0x320
[16098.868589] 1 lock held by systemd-journal/203:
[16098.869276] 1 lock held by systemd-udevd/57927:
[16098.869966] #0: ffff8881a61fa1a8 (mapping.invalidate_lock#2){++++}-{3:3}, at: page_cache_ra_unbounded+0x73/0x2d0
[16098.871477] 4 locks held by mdadm/58163:
[16098.872099] #0: ffff88817d4b4400 (sb_writers#5){.+.+}-{0:0}, at: vfs_write+0x32d/0x470
[16098.873303] #1: ffff888193dcd688 (&of->mutex#2){+.+.}-{3:3}, at: kernfs_fop_write_iter+0x143/0x280
[16098.874620] #2: ffff8881323cb010 (kn->active#98){.+.+}-{0:0}, at: kernfs_fop_write_iter+0x153/0x280
[16098.876005] #3: ffff888193d4a0a8 (&mddev->suspend_mutex){+.+.}-{3:3}, at: mddev_suspend+0x59/0x380 [md_mod]

[root@fedora ~]# cat /proc/57927/stack
[<0>] wait_woken+0xa4/0xd0
[<0>] raid5_make_request+0x994/0x2080 [raid456]
[<0>] md_handle_request+0x17a/0x4b0 [md_mod]
[<0>] md_submit_bio+0x7c/0x130 [md_mod]
[<0>] __submit_bio+0x12b/0x190
[<0>] submit_bio_noacct_nocheck+0x22b/0x6a0
[<0>] submit_bio_noacct+0x259/0xac0
[<0>] submit_bio+0x58/0x1d0
[<0>] mpage_readahead+0x195/0x280
[<0>] blkdev_readahead+0x1d/0x30
[<0>] read_pages+0x6e/0x550
[<0>] page_cache_ra_unbounded+0x1c6/0x2d0
[<0>] do_page_cache_ra+0x4f/0x80
[<0>] force_page_cache_ra+0x78/0xc0
[<0>] page_cache_sync_ra+0x60/0x460
[<0>] filemap_get_pages+0x13f/0xba0
[<0>] filemap_read+0x122/0x590
[<0>] blkdev_read_iter+0x7a/0x210
[<0>] vfs_read+0x27f/0x400
[<0>] ksys_read+0x85/0x180
[<0>] __x64_sys_read+0x21/0x30
[<0>] x64_sys_call+0x45e7/0x4600
[<0>] do_syscall_64+0xd5/0x230
[<0>] entry_SYSCALL_64_after_hwframe+0x76/0x7e

Does user space need to change as well?

Thanks,
Kuai


Thanks,
Song







