Hi all,

We are observing some regressions in workingset_refault on our newly upgraded 5.15.19 nodes with zram as swap. This manifests in several ways:

1) Regression of workingset_refault duration observed in flamegraphs

We regularly collect flamegraphs for services running on the node. Since the upgrade to 5.15.19, workingset_refault occupies a more significant part of the service flamegraph (13%), with the following call trace:

workingset_refault+0x128
add_to_page_cache_lru+0x9f
page_cache_ra_unbounded+0x154
force_page_cache_ra+0xe2
filemap_get_pages+0xe9
filemap_read+0xa4
xfs_file_buffered_read+0x98
xfs_file_read_iter+0x6a
new_sync_read+0x118
vfs_read+0xf2
__x64_sys_pread64+0x89
do_syscall_64+0x3b
entry_SYSCALL_64_after_hwframe+0x44

2) Regression of performance-sensitive userspace code

We have some performance-sensitive code running in userspace whose runtime is measured with CLOCK_THREAD_CPUTIME_ID. It looks roughly like:

now = clock_gettime(CLOCK_THREAD_CPUTIME_ID)
func()
elapsed = clock_gettime(CLOCK_THREAD_CPUTIME_ID) - now

Since the 5.15 upgrade, we observe long `elapsed` values in the 4-10ms range much more frequently than before. This went away after we disabled swap for the service using the `memory.swap.max=0` memcg configuration.

The new thing in the 5.15 workingset_refault path seems to be the introduction of mem_cgroup_flush_stats() by commit 1f828223b7991a228bc2aef837b78737946d44b2 ("memcg: flush lruvec stats in the refault"). Given that mem_cgroup_flush_stats() can take quite a long time for us on the standard systemd cgroupv2 hierarchy (root / system.slice / workload.service):

sudo /usr/share/bcc/tools/funcslower -m 10 -t mem_cgroup_flush_stats
Tracing function calls slower than 10 ms... Ctrl+C to quit.
TIME       COMM          PID      LAT(ms)  RVAL  FUNC
0.000000   <redacted>    804776     11.50   200   mem_cgroup_flush_stats
0.343383   <redacted>    647496     10.58   200   mem_cgroup_flush_stats
0.604309   <redacted>    804776     10.50   200   mem_cgroup_flush_stats
1.230416   <redacted>    803293     10.01   200   mem_cgroup_flush_stats
1.248442   <redacted>    646400     11.02   200   mem_cgroup_flush_stats

could it be possible that workingset_refault in some unfortunate case takes much longer than before, such that it increases the time observed by CLOCK_THREAD_CPUTIME_ID from userspace, or the overall duration of workingset_refault observed by perf?
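
For completeness, here is a minimal compilable sketch of the measurement pattern above; func() is just a placeholder for the real performance-sensitive work, not our actual service code:

#define _POSIX_C_SOURCE 199309L
#include <stdio.h>
#include <time.h>

/* placeholder for the actual performance-sensitive work */
static void func(void)
{
}

int main(void)
{
    struct timespec start, end;
    long long elapsed_ns;

    /* CLOCK_THREAD_CPUTIME_ID counts CPU time consumed by this
     * thread only, so time spent blocked should not be included */
    clock_gettime(CLOCK_THREAD_CPUTIME_ID, &start);
    func();
    clock_gettime(CLOCK_THREAD_CPUTIME_ID, &end);

    elapsed_ns = (end.tv_sec - start.tv_sec) * 1000000000LL +
                 (end.tv_nsec - start.tv_nsec);

    /* outliers in the 4-10ms range are what we see on 5.15 */
    printf("elapsed: %lld ns\n", elapsed_ns);
    return 0;
}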