Hello,

I would like to report an unpleasant behavior of multi-gen LRU, with strange swap in/out usage, on my Dell 7525 two-socket AMD 74F3 system (16 NUMA domains). The symptoms of my issue are:

/A/ if multi-gen LRU is enabled:

1/ [kswapd3] is consuming 100% CPU:

    top - 15:03:11 up 34 days,  1:51,  2 users,  load average: 23.34, 18.26, 15.01
    Tasks: 1226 total,   2 running, 1224 sleeping,   0 stopped,   0 zombie
    %Cpu(s): 12.5 us,  4.7 sy,  0.0 ni, 82.1 id,  0.0 wa,  0.4 hi,  0.4 si,  0.0 st
    MiB Mem : 1047265.+total,  28382.7 free, 1021308.+used,    767.6 buff/cache
    MiB Swap:   8192.0 total,   8187.7 free,      4.2 used.  25956.7 avail Mem
    ...
        765 root      20   0       0      0      0 R  98.3   0.0  34969:04 kswapd3
    ...

2/ swap space usage is low, about ~4MB of 8GB, with swap on zram (observed with a swap disk as well, where it caused IO latency issues due to some kind of locking)

3/ swap in/out is huge and symmetrical, ~12MB/s in and ~12MB/s out

/B/ if multi-gen LRU is disabled:

1/ [kswapd3] is consuming 3%-10% CPU:

    top - 15:02:49 up 34 days,  1:51,  2 users,  load average: 23.05, 17.77, 14.77
    Tasks: 1226 total,   1 running, 1225 sleeping,   0 stopped,   0 zombie
    %Cpu(s): 14.7 us,  2.8 sy,  0.0 ni, 81.8 id,  0.0 wa,  0.4 hi,  0.4 si,  0.0 st
    MiB Mem : 1047265.+total,  28378.5 free, 1021313.+used,    767.3 buff/cache
    MiB Swap:   8192.0 total,   8189.0 free,      3.0 used.  25952.4 avail Mem
    ...
        765 root      20   0       0      0      0 S   3.6   0.0  34966:46 [kswapd3]
    ...

2/ swap space usage is low (4MB)

3/ swap in/out is huge and symmetrical, ~500kB/s in and ~500kB/s out

Both situations are wrong, as they use swap in/out extensively; however, the multi-gen LRU situation is 10 times worse.
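For anyone wanting to reproduce the A/B comparison: multi-gen LRU can be toggled at runtime via its sysfs switch (documented in Documentation/admin-guide/mm/multigen_lru.rst). A minimal sketch, assuming a kernel built with CONFIG_LRU_GEN; the decode_lru_gen helper is illustrative, not an existing tool:

```shell
#!/bin/sh
# Toggle multi-gen LRU and decode its 'enabled' bitmask.
# Bit meanings per Documentation/admin-guide/mm/multigen_lru.rst:
#   0x0001  main switch
#   0x0002  clear the accessed bit in leaf page table entries during walks
#   0x0004  clear the accessed bit in non-leaf entries as well

# Illustrative helper (not an existing tool): pretty-print the bitmask.
decode_lru_gen() {
    v=$(( $1 )); out=""
    [ $(( v & 1 )) -ne 0 ] && out="$out main_switch"
    [ $(( v & 2 )) -ne 0 ] && out="$out clear_leaf_accessed"
    [ $(( v & 4 )) -ne 0 ] && out="$out clear_nonleaf_accessed"
    printf '%s\n' "${out# }"
}

# On a live system (paths assume CONFIG_LRU_GEN=y):
#   cat /sys/kernel/mm/lru_gen/enabled        # e.g. 0x0007
#   echo n > /sys/kernel/mm/lru_gen/enabled   # disable -> case /B/
#   echo y > /sys/kernel/mm/lru_gen/enabled   # enable  -> case /A/

decode_lru_gen 0x0007   # prints: main_switch clear_leaf_accessed clear_nonleaf_accessed
```

Running `vmstat 1` alongside (the si/so columns) then shows the swap-traffic difference between the two cases.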
The perf record of case /A/:

    100.00%  0.00%  kswapd3  [kernel.kallsyms]  [k] kswapd
    - kswapd
       - 99.88% balance_pgdat
          - 99.84% shrink_node
             - 99.78% shrink_many
                - 61.66% shrink_one
                   - 55.32% try_to_shrink_lruvec
                      - 49.80% try_to_inc_max_seq.constprop.0
                         - 49.53% walk_mm
                            - 49.46% walk_page_range
                               - 49.32% __walk_page_range
                                  - walk_pgd_range
                                     - walk_p4d_range
                                        - walk_pud_range
                                           - 49.02% walk_pmd_range
                                              - 45.94% get_next_vma
                                                 - 30.08% mas_find
                                                    - 29.33% mas_walk
                                                         26.83% mtree_range_walk
                                                   2.86% should_skip_vma
                                                   0.58% mas_next_slot
                                                1.25% walk_pmd_range_locked.isra.0
                      - 5.46% evict_folios
                         - 3.41% shrink_folio_list
                            - 1.15% pageout
                               - swap_writepage
                                  - 1.12% swap_writepage_bdev_sync
                                     - 1.01% submit_bio_wait
                                        - 1.00% __submit_bio_noacct
                                           - __submit_bio
                                              - zram_bio_write
                                                 - 0.96% zram_write_page
                                                    - 0.82% lzorle_compress
                                                       - lzogeneric1x_1_compress
                                                            0.73% lzo1x_1_do_compress
                              0.68% __remove_mapping
                         - 1.02% isolate_folios
                            - scan_folios
                                 0.65% isolate_folio.isra.0
                           0.55% move_folios_to_lru
                   - 5.43% lruvec_is_sizable
                   - 0.93% get_swappiness
                        mem_cgroup_get_nr_swap_pages
                - 32.07% lru_gen_rotate_memcg
                   - 3.23% _raw_spin_lock_irqsave
                        2.32% native_queued_spin_lock_slowpath
                     1.91% get_random_u8
                   - 0.94% _raw_spin_unlock_irqrestore
                      - asm_sysvec_apic_timer_interrupt
                         - sysvec_apic_timer_interrupt
                            - 0.69% __sysvec_apic_timer_interrupt
                               - hrtimer_interrupt
                                  - 0.65% __hrtimer_run_queues
                                     - 0.63% tick_sched_timer
                                        - 0.62% tick_sched_handle
                                           - update_process_times
                                                0.51% scheduler_tick

The perf record of case /B/:

    100.00%  0.00%  kswapd3  [kernel.kallsyms]  [k] kswapd
    - kswapd
       - 99.66% balance_pgdat
          - 90.96% shrink_node
             - 75.69% shrink_node_memcgs
                - 25.73% shrink_lruvec
                   - 18.74% get_scan_count
                        2.76% mem_cgroup_get_nr_swap_pages
                   - 2.50% blk_finish_plug
                      - __blk_flush_plug
                           blk_mq_flush_plug_list
                     1.02% shrink_inactive_list
                     1.01% inactive_is_low
                - 17.33% shrink_slab_memcg
                   - 4.02% do_shrink_slab
                      - 1.57% nfs4_xattr_entry_count
                         - list_lru_count_one
                              0.56% __rcu_read_unlock
                      - 0.79% super_cache_count
                           list_lru_count_one
                      - 0.68% nfs4_xattr_cache_count
                         - list_lru_count_one
                              xa_load
                     3.12% _find_next_bit
                     1.87% __radix_tree_lookup
                     0.67% up_read
                     0.67% down_read_trylock
                - 16.34% mem_cgroup_iter
                     0.57% __rcu_read_lock
                     0.54% __rcu_read_unlock
                - 9.36% shrink_slab
                   - do_shrink_slab
                      - 2.37% super_cache_count
                           1.04% list_lru_count_one
                        2.14% count_shadow_nodes
                        1.71% kfree_rcu_shrink_count
                     1.24% vmpressure
             - 15.27% prepare_scan_count
                - 15.04% do_flush_stats
                   - 14.93% cgroup_rstat_flush
                      - cgroup_rstat_flush_locked
                           13.20% mem_cgroup_css_rstat_flush
                           0.78% __blkcg_rstat_flush.isra.0
          - 5.87% shrink_active_list
               2.16% __count_memcg_events
               1.64% _raw_spin_lock_irq
               0.94% isolate_lru_folios
            2.24% mem_cgroup_iter

Could I ask for any suggestions on how to avoid this kswapd utilization pattern? There is free RAM in each NUMA node for the few MB used in swap:

    NUMA stats:
    NUMA nodes:     0     1     2     3     4     5     6     7     8     9    10    11    12    13    14    15
    MemTotal:   65048 65486 65486 65486 65486 65486 65486 65469 65486 65486 65486 65486 65486 65486 65486 65424
    MemFree:      468   601  1200   302   548  1879  2321  2478  1967  2239  1453  2417  2623  2833  2530  2269

Neither the swap in/out usage nor the CPU utilization by multi-gen LRU makes sense to me.

Many thanks and best regards,
--
Jaroslav Pulchart
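For completeness, the swap in/out rates quoted above can be cross-checked independently of top by sampling the kernel's pswpin/pswpout counters in /proc/vmstat. A minimal sketch, assuming 4 KiB pages; the swap_rate_kb helper is illustrative, not an existing tool:

```shell
#!/bin/sh
# Cross-check swap traffic via the pswpin/pswpout counters in /proc/vmstat.
# Both counters are cumulative page counts; a 4 KiB page size is assumed.

# Illustrative helper: page-count delta over an interval -> KB/s.
swap_rate_kb() {   # usage: swap_rate_kb OLD_PAGES NEW_PAGES INTERVAL_S
    echo $(( ($2 - $1) * 4 / $3 ))
}

# On a live system:
#   old_in=$(awk '/^pswpin /  {print $2}' /proc/vmstat)
#   old_out=$(awk '/^pswpout / {print $2}' /proc/vmstat)
#   sleep 10
#   new_in=$(awk '/^pswpin /  {print $2}' /proc/vmstat)
#   new_out=$(awk '/^pswpout / {print $2}' /proc/vmstat)
#   echo "swap-in:  $(swap_rate_kb "$old_in"  "$new_in"  10) KB/s"
#   echo "swap-out: $(swap_rate_kb "$old_out" "$new_out" 10) KB/s"

swap_rate_kb 0 30720 10   # prints: 12288  (i.e. ~12 MB/s, the case /A/ rate)
```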