On Thu, Oct 5, 2023 at 9:30 AM Michal Koutný <mkoutny@xxxxxxxx> wrote: > > On Thu, Oct 05, 2023 at 02:31:03AM -0700, Yosry Ahmed <yosryahmed@xxxxxxxxxx> wrote: > > I am not really sure what you mean here. > > My "vision" is to treat WORKINGSET_ entries as events. > That would mean implementing per-node tracking for vm_event_item > (costlier?). > That would mean node_stat_item and vm_event_item being effectively > equal, so they could be merged in one. > That would be situation to come up with new classification based on use > cases (e.g. precision/timeliness requirements, state vs change > semantics). > > (Do not take this as blocker of the patch 1/2, I rather used the > opportunity to discuss a greater possible cleanup.) Yeah ideally we can clean this up separately. I would be careful about userspace exposure though. It seems like CONFIG_VM_EVENT_COUNTERS is used to control tracking events and displaying them in vmstat, so moving items between node_stat_item and vm_event_item (or merging them) won't be easy. > > > We don't track things like OOM_KILL and DROP_PAGECACHE per memcg as > > far as I can tell. > > Ah, good. (I forgot only subset of entries is relevant for memcgs.) > > > This will mean that WORKINGSET_* state will become more stale. We will > > need 4096 as many updates as today to get a flush. These are used by > > internal flushers (reclaim), and are exposed to userspace. I am not > > sure we want to do that. > > snapshot_refaults() doesn't seem to follow after flush > and > workigset_refault()'s flush doesn't seem to preceed readers > > Is the flush misplaced or have I overlooked something? > (If the former, it seems to work good enough even with the current > flushing heuristics :-)) We flush in prepare_scan_count() before reading WORKINGSET_ACTIVATE_* state. That flush also implicitly precedes every call to snapshot_refaults(), which is unclear and not so robust, but we are effectively flushing before snapshot_refaults() too. > > > Michal