On Thu, Oct 5, 2023 at 2:06 AM Michal Koutný <mkoutny@xxxxxxxx> wrote: > > On Wed, Oct 04, 2023 at 02:36:19PM -0400, Johannes Weiner <hannes@xxxxxxxxxxx> wrote: > > Yes, it's because of node resolution which event counters generally > > don't have. Some of the refault events influence node-local reclaim > > decisions, see mm/vmscan.c::snapshot_refaults(). > > > > There are a few other event counters in the stat array that people > > thought would be useful to have split out in > > /sys/devices/system/node/nodeN/vmstat to understand numa behavior > > better. > > > > It's a bit messy. > > > > Some events would be useful to move to 'stats' for the numa awareness, > > such as the allocator stats and reclaim activity. > > > > Some events would be useful to move to 'stats' for the numa awareness, > > but don't have the zone resolution required by them, such as > > kswapd/kcompactd wakeups. > > Thanks for the enlightenment. > > > Some events aren't numa specific, such as oom kills, drop_pagecache. > > These are oddballs indeed. As with the normalization patchset these are > counted as PAGE_SIZE^W 1 error but they should rather be an infinite > error (to warrant a flush). > > So my feedback to this series is: > - patch 1/2 -- creating two classes of units is consequence of unclarity > between state and events (as in event=Δstate/Δt) and resolution > (global vs per-node), so the better approach would be to tidy this up, I am not really sure what you mean here. I understand that this series fixes the unit normalization for state but leaves events out of it. Looking at the event items tracked by memcg in memcg_vm_event_stat it looks to me that most of them correspond roughly to a page's worth of updates (all but the THP_* events). We don't track things like OOM_KILL and DROP_PAGECACHE per memcg as far as I can tell. Do you mean that we should add something similar to memcg_page_state_unit() for events as well to get all of them right? If yes, I think that should be easy to add, it would only special case THP_* events. Alternatively, I can add a comment above the call to memcg_rstat_updated() in __count_memcg_events() explaining why we don't normalize the event count for now. > - patch 2/2 -- it could use the single unit class that exists, > it'll bound the error of printed numbers afterall (and can be changed > later depending on how it affects internal consumers). This will mean that WORKINGSET_* state will become more stale. We will need 4096 as many updates as today to get a flush. These are used by internal flushers (reclaim), and are exposed to userspace. I am not sure we want to do that. > > My 0.02€, > Michal