On Thu, Jun 27, 2024 at 2:21 AM Jesper Dangaard Brouer <hawk@xxxxxxxxxx> wrote:
>
>
> On 27/06/2024 00.07, Yosry Ahmed wrote:
> > On Wed, Jun 26, 2024 at 2:35 PM Jesper Dangaard Brouer <hawk@xxxxxxxxxx> wrote:
> >>
> >> On 26/06/2024 00.59, Yosry Ahmed wrote:
> >>> On Tue, Jun 25, 2024 at 3:35 PM Christoph Lameter (Ampere) <cl@xxxxxxxxx> wrote:
> >>>>
> >>>> On Tue, 25 Jun 2024, Yosry Ahmed wrote:
> >>>>
> [...]
> >>
> >> I implemented a variant using completions as Yosry asked for:
> >>
> >> [V3] https://lore.kernel.org/all/171943668946.1638606.1320095353103578332.stgit@firesoul/
> >
> > Thanks! I will take a look at this a little bit later. I am wondering
> > if you could verify whether that solution fixes the problem with kswapd
> > flushing?
>
> I will deploy V3 on some production metals and report back in that thread.
>
> For this patch V2, I already have some results that show it solves the
> kswapd lock contention. Attaching a grafana screenshot comparing two
> machines without/with this V2 patch: the green line (16m1253) is without
> the patch, and the yellow line (16m1254) is with the patched kernel.
> These machines have 12 NUMA nodes and thus 12 kswapd threads, and CPU
> time is summed across these threads.

Thanks for the data! Looking forward to seeing whether v3 also fixes the
problem. I think it should, especially with the timeout, but let's see :)

> Zooming in with perf record, we can also see the lock contention is gone.
>  - sudo perf record -g -p $(pgrep -d, kswapd) -F 499 sleep 60
>  - sudo perf report --no-children --call-graph graph,0.01,callee
>    --sort symbol
>
>
> On a machine (16m1254) with this V2 patch:
>
> Samples: 7K of event 'cycles:P', Event count (approx.): 61228473670
>   Overhead  Symbol
> +    8.28%  [k] mem_cgroup_css_rstat_flush
> +    6.69%  [k] xfs_perag_get_tag
> +    6.51%  [k] radix_tree_next_chunk
> +    5.09%  [k] queued_spin_lock_slowpath
> +    3.94%  [k] srso_alias_safe_ret
> +    3.62%  [k] srso_alias_return_thunk
> +    3.11%  [k] super_cache_count
> +    2.96%  [k] mem_cgroup_iter
> +    2.95%  [k] down_read_trylock
> +    2.48%  [k] shrink_lruvec
> +    2.12%  [k] isolate_lru_folios
> +    1.76%  [k] dentry_lru_isolate
> +    1.74%  [k] radix_tree_gang_lookup_tag
>
>
> On a machine (16m1253) without the patch:
>
> Samples: 65K of event 'cycles:P', Event count (approx.): 492125554022
>   Overhead  Symbol
> +   55.84%  [k] queued_spin_lock_slowpath
>    - 55.80% queued_spin_lock_slowpath
>       + 53.10% __cgroup_rstat_lock
>       +  2.63% evict
> +    7.06%  [k] mem_cgroup_css_rstat_flush
> +    2.07%  [k] page_vma_mapped_walk
> +    1.76%  [k] invalid_folio_referenced_vma
> +    1.72%  [k] srso_alias_safe_ret
> +    1.37%  [k] shrink_lruvec
> +    1.23%  [k] srso_alias_return_thunk
> +    1.17%  [k] down_read_trylock
> +    0.98%  [k] perf_adjust_freq_unthr_context
> +    0.97%  [k] super_cache_count
>
> I think this (clearly) shows that the patch works and eliminates kswapd
> lock contention.
>
> --Jesper