On Mon, Jul 8, 2024 at 8:34 AM Bharata B Rao <bharata@xxxxxxx> wrote:
>
> Hi Yu Zhao,
>
> Thanks for your patches. See below...
>
> On 07-Jul-24 4:12 AM, Yu Zhao wrote:
> > Hi Bharata,
> >
> > On Wed, Jul 3, 2024 at 9:11 AM Bharata B Rao <bharata@xxxxxxx> wrote:
> >>
> <snip>
> >>
> >> Some experiments tried
> >> ======================
> >> 1) When MGLRU was enabled, many soft lockups were observed; no hard
> >> lockups were seen during a 48-hour run. Below is one such soft lockup.
> >
> > This is not really an MGLRU issue -- can you please try one of the
> > attached patches? It (truncate.patch) should help with or without
> > MGLRU.
>
> With truncate.patch and the default LRU scheme, a few hard lockups are seen.

Thanks.

In your original report, you said:

  Most of the time the two contended locks are the lruvec and
  inode->i_lock spinlocks.
  ...
  Oftentimes, the perf output at the time of the problem shows heavy
  contention on the lruvec spin lock. Similar contention is also
  observed with the inode i_lock (in the clear_shadow_entry path).

Based on this new report, does it mean the i_lock is not as contended
for the same path (truncation) you tested? If so, I'll post
truncate.patch with your Reported-by and Tested-by tags, unless you
have objections.

The two paths below were contended on the LRU lock, but they already
batch their operations, so I don't know what else we can do surgically
to improve them.
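(As a rough illustration of what "batch their operations" means here:
both the folio_mark_accessed() path, via folio_batch_move_lru(), and the
truncation path, via __folio_batch_release()/folios_put_refs(), drain a
batch of folios under the lruvec lock instead of taking the lock once
per folio. The user-space sketch below shows only that pattern; it is
not the kernel code, the names are made up, and the batch size of 15 is
only assumed to mirror the kernel's folio_batch.)

/*
 * Minimal user-space sketch of the batching pattern discussed above;
 * NOT kernel code.  The lock is taken once per batch of items rather
 * than once per item, which is roughly what the two contended paths
 * already do with the lruvec lock.
 */
#include <pthread.h>
#include <stdio.h>

#define BATCH_SIZE 15			/* assumed to mirror folio_batch */

struct item { int id; };

struct batch {
	unsigned int nr;
	struct item *items[BATCH_SIZE];
};

static pthread_spinlock_t lru_lock;	/* stand-in for the lruvec lock */

/* Drain the whole batch under a single lock acquisition. */
static void batch_move(struct batch *b)
{
	pthread_spin_lock(&lru_lock);
	for (unsigned int i = 0; i < b->nr; i++) {
		/* ...move b->items[i] between LRU lists here... */
	}
	pthread_spin_unlock(&lru_lock);
	b->nr = 0;
}

/* Callers queue items without the lock; it is taken only when the batch fills. */
static void batch_add(struct batch *b, struct item *it)
{
	b->items[b->nr++] = it;
	if (b->nr == BATCH_SIZE)
		batch_move(b);
}

int main(void)
{
	struct item items[100];
	struct batch b = { .nr = 0 };

	pthread_spin_init(&lru_lock, PTHREAD_PROCESS_PRIVATE);

	for (int i = 0; i < 100; i++) {
		items[i].id = i;
		batch_add(&b, &items[i]);
	}
	batch_move(&b);			/* flush the final partial batch */

	printf("100 items, ~%d lock acquisitions\n", 100 / BATCH_SIZE + 1);
	pthread_spin_destroy(&lru_lock);
	return 0;
}

So the per-folio cost of the lock is already amortized; any further
relief would presumably have to come from needing the lock less often in
the first place, which is what mglru.patch (mentioned further down) aims
at for the rotation case.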
> First one is this:
>
> watchdog: Watchdog detected hard LOCKUP on cpu 487
> CPU: 487 PID: 11525 Comm: fio Not tainted 6.10.0-rc3 #27
> RIP: 0010:native_queued_spin_lock_slowpath+0x81/0x300
> Call Trace:
> <NMI>
> ? show_regs+0x69/0x80
> ? watchdog_hardlockup_check+0x1b4/0x3a0
> <SNIP>
> ? native_queued_spin_lock_slowpath+0x81/0x300
> </NMI>
> <TASK>
> ? __pfx_folio_activate_fn+0x10/0x10
> _raw_spin_lock_irqsave+0x5b/0x70
> folio_lruvec_lock_irqsave+0x62/0x90
> folio_batch_move_lru+0x9d/0x160
> folio_activate+0x95/0xe0
> folio_mark_accessed+0x11f/0x160
> filemap_read+0x343/0x3d0
> <SNIP>
> blkdev_read_iter+0x6f/0x140
> vfs_read+0x25b/0x340
> ksys_read+0x67/0xf0
> __x64_sys_read+0x19/0x20
> x64_sys_call+0x1771/0x20d0
>
> This is the next one:
>
> watchdog: Watchdog detected hard LOCKUP on cpu 219
> CPU: 219 PID: 2584763 Comm: fs_racer_dir_cr Not tainted 6.10.0-rc3 #27
> RIP: 0010:native_queued_spin_lock_slowpath+0x2b4/0x300
> Call Trace:
> <NMI>
> ? show_regs+0x69/0x80
> ? watchdog_hardlockup_check+0x1b4/0x3a0
> <SNIP>
> ? native_queued_spin_lock_slowpath+0x2b4/0x300
> </NMI>
> <TASK>
> _raw_spin_lock_irqsave+0x5b/0x70
> folio_lruvec_lock_irqsave+0x62/0x90
> __page_cache_release+0x89/0x2f0
> folios_put_refs+0x92/0x230
> __folio_batch_release+0x74/0x90
> truncate_inode_pages_range+0x16f/0x520
> truncate_pagecache+0x49/0x70
> ext4_setattr+0x326/0xaa0
> notify_change+0x353/0x500
> do_truncate+0x83/0xe0
> path_openat+0xd9e/0x1090
> do_filp_open+0xaa/0x150
> do_sys_openat2+0x9b/0xd0
> __x64_sys_openat+0x55/0x90
> x64_sys_call+0xe55/0x20d0
> do_syscall_64+0x7e/0x130
> entry_SYSCALL_64_after_hwframe+0x76/0x7e
>
> When this happens, the all-CPU backtrace shows a CPU being in
> isolate_lru_folios().
>
> >
> >> kernel: watchdog: BUG: soft lockup - CPU#29 stuck for 11s! [fio:2701649]
> >> kernel: CPU: 29 PID: 2701649 Comm: fio Tainted: G L
> >> 6.10.0-rc3-mglru-irqstrc #24
> >> kernel: RIP: 0010:native_queued_spin_lock_slowpath+0x2b4/0x300
> >> kernel: Call Trace:
> >> kernel: <IRQ>
> >> kernel: ? show_regs+0x69/0x80
> >> kernel: ? watchdog_timer_fn+0x223/0x2b0
> >> kernel: ? __pfx_watchdog_timer_fn+0x10/0x10
> >> <SNIP>
> >> kernel: </IRQ>
> >> kernel: <TASK>
> >> kernel: ? asm_sysvec_apic_timer_interrupt+0x1b/0x20
> >> kernel: ? native_queued_spin_lock_slowpath+0x2b4/0x300
> >> kernel: _raw_spin_lock+0x38/0x50
> >> kernel: clear_shadow_entry+0x3d/0x100
> >> kernel: ? __pfx_workingset_update_node+0x10/0x10
> >> kernel: mapping_try_invalidate+0x117/0x1d0
> >> kernel: invalidate_mapping_pages+0x10/0x20
> >> kernel: invalidate_bdev+0x3c/0x50
> >> kernel: blkdev_common_ioctl+0x5f7/0xa90
> >> kernel: blkdev_ioctl+0x109/0x270
> >> kernel: x64_sys_call+0x1215/0x20d0
> >> kernel: do_syscall_64+0x7e/0x130
> >>
> >> This happens to be contending on the inode i_lock spinlock.
> >>
> >> The preemptirqsoff trace below points to preemption being disabled for
> >> more than 10s, and the lock in the picture is the lruvec spinlock.
> >
> > Also, could you please try the other patch (mglru.patch)? It should
> > help reduce unnecessary rotations from deactivate_file_folio(), which
> > in turn should reduce the contention on the LRU lock for MGLRU.
>
> Testing is currently in progress with mglru.patch and MGLRU enabled.
> Will get back with the results.

Thank you.