(was regression: mm: vmscan: - size XL irqoff time increase v6.10+) While examining a service latency regression beginning at v6.10, it was discovered that isolate_lru_folios() runtimes could cause IRQ holdoff spanning several ticks. Worse measured was 78ms. Given it's called with IRQs disabled, cap its runtme to around a ms. With this patch, and one to mitigate move_expired_inodes() induced wakeup latency hits up 66ms, the load that repeatably produced this: (6.12.0-rc5 running osc kernel package build + ram sized bonnie) T: 0 ( 4541) P:99 I:1000 C: 773124 Min: 1 Act: 6 Avg: 3 Max: 53381 T: 1 ( 4542) P:99 I:1500 C: 515381 Min: 1 Act: 7 Avg: 3 Max: 60119 T: 2 ( 4543) P:99 I:2000 C: 386465 Min: 1 Act: 5 Avg: 4 Max: 67469 T: 3 ( 4544) P:99 I:2500 C: 309182 Min: 1 Act: 7 Avg: 4 Max: 65534 T: 4 ( 4545) P:99 I:3000 C: 257682 Min: 1 Act: 6 Avg: 4 Max: 57776 T: 5 ( 4546) P:99 I:3500 C: 220874 Min: 1 Act: 4 Avg: 4 Max: 64737 T: 6 ( 4547) P:99 I:4000 C: 193289 Min: 1 Act: 6 Avg: 4 Max: 52567 T: 7 ( 4548) P:99 I:4500 C: 171799 Min: 1 Act: 6 Avg: 5 Max: 52606 latency went to this: T: 0 ( 6078) P:99 I:1000 C:1031230 Min: 1 Act: 7 Avg: 4 Max: 4449 T: 1 ( 6079) P:99 I:1500 C: 687505 Min: 1 Act: 5 Avg: 4 Max: 2075 T: 2 ( 6080) P:99 I:2000 C: 515627 Min: 1 Act: 334 Avg: 5 Max: 2987 T: 3 ( 6081) P:99 I:2500 C: 412503 Min: 1 Act: 8 Avg: 5 Max: 2425 T: 4 ( 6082) P:99 I:3000 C: 343752 Min: 1 Act: 11 Avg: 5 Max: 1466 T: 5 ( 6083) P:99 I:3500 C: 294643 Min: 1 Act: 282 Avg: 5 Max: 3844 T: 6 ( 6084) P:99 I:4000 C: 257813 Min: 1 Act: 7 Avg: 5 Max: 3291 T: 7 ( 6085) P:99 I:4500 C: 229167 Min: 1 Act: 9 Avg: 5 Max: 3516 Signed-off-by: Mike Galbraith <efault@xxxxxx> --- mm/vmscan.c | 19 +++++++++++++++++++ 1 file changed, 19 insertions(+) --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -63,6 +63,9 @@ #include <linux/swapops.h> #include <linux/balloon_compaction.h> #include <linux/sched/sysctl.h> +#ifndef CONFIG_PREEMPT_RT +#include <linux/sched/clock.h> +#endif #include "internal.h" #include "swap.h" @@ -1657,6 +1660,10 @@ static unsigned long isolate_lru_folios( unsigned long skipped = 0; unsigned long scan, total_scan, nr_pages; LIST_HEAD(folios_skipped); +#ifndef CONFIG_PREEMPT_RT + u64 then = local_clock(); + int batch = 0; +#endif total_scan = 0; scan = 0; @@ -1709,6 +1716,18 @@ static unsigned long isolate_lru_folios( move_to = dst; move: list_move(&folio->lru, move_to); + +#ifndef CONFIG_PREEMPT_RT + /* + * IRQs are disabled, cap holdoff at a millisecond or so. + */ + batch += nr_pages; + if (batch < 1024) + continue; + if (local_clock() - then >= NSEC_PER_MSEC) + break; + batch = 0; +#endif } /*
# tracer: irqsoff # # irqsoff latency trace v1.1.5 on 6.12.0.g4236f913-master # -------------------------------------------------------------------- # latency: 77766 us, #12/12, CPU#5 | (M:unknown VP:0, KP:0, SP:0 HP:0 #P:8) # ----------------- # | task: kswapd0-90 (uid:0 nice:0 policy:0 rt_prio:0) # ----------------- # => started at: _raw_spin_lock_irq+0x16/0x40 # => ended at: _raw_spin_unlock_irq+0x10/0x30 # # # _------=> CPU# # / _-----=> irqs-off/BH-disabled # | / _----=> need-resched # || / _---=> hardirq/softirq # ||| / _--=> preempt-depth # |||| / _-=> migrate-disable # ||||| / delay # cmd pid |||||| time | caller # \ / |||||| \ | / kswapd0-90 5d.... 0us : _raw_spin_lock_irq+0x16/0x40 kswapd0-90 5d.... 0us : preempt_count_add+0x0/0x50 <-_raw_spin_lock_irq+0x20/0x40 kswapd0-90 5d..1. 0us*: isolate_lru_folios+0x0/0x3e0 <-shrink_inactive_list+0x10f/0x440 kswapd0-90 5d..1. 77765us : __rcu_read_lock+0x0/0x20 <-isolate_lru_folios+0x160/0x3e0 kswapd0-90 5d..1. 77765us : __rcu_read_unlock+0x0/0x180 <-isolate_lru_folios+0x199/0x3e0 kswapd0-90 5d..1. 77766us : __mod_node_page_state+0x0/0x70 <-isolate_lru_folios+0x2ee/0x3e0 kswapd0-90 5d..1. 77766us : __mod_zone_page_state+0x0/0x60 <-isolate_lru_folios+0x2fc/0x3e0 kswapd0-90 5d..1. 77766us : __mod_node_page_state+0x0/0x70 <-shrink_inactive_list+0x12a/0x440 kswapd0-90 5d..1. 77766us : _raw_spin_unlock_irq+0x0/0x30 <-shrink_inactive_list+0x16a/0x440 kswapd0-90 5d..1. 77766us : _raw_spin_unlock_irq+0x10/0x30 kswapd0-90 5d..1. 77767us : tracer_hardirqs_on+0xcb/0xe0 <-_raw_spin_unlock_irq+0x10/0x30 kswapd0-90 5d..1. 77774us : <stack trace> => shrink_inactive_list+0x16a/0x440 => shrink_lruvec+0x2fc/0x470 => shrink_node+0x1d6/0x4a0 => balance_pgdat+0x280/0x5c0 => kswapd+0x10f/0x350 => kthread+0xfa/0x130 => ret_from_fork+0x2d/0x50 => ret_from_fork_asm+0x11/0x20