Hi Johannes,

we have been debugging an issue reported against our 4.12-based kernel, where a DB-based workload would start thrashing badly at some point, making the system unusable. This didn't happen when replacing the kernel with an older 4.4-based one (and keeping everything else the same). Unfortunately we don't have the reproducer in-house and the conditions might also be configuration specific (rootfs is on NFS), but we provided vmstat monitoring instructions and later tracing, and from the data we got we found that the workload at some point fills almost the whole memory with anonymous pages (namely shmem), pushing almost the whole page cache out, and filling part of the swap. The 4.4-based kernel then recovers quickly without excessive anon swapping, which suggests the shmem pages stop being frequently accessed. However, the 4.12-based kernel is unable to recover and grow the page cache back (both active and inactive) and keeps thrashing on it.

We have considered the large upstream changes between 4.4 and 4.12, which include memcg awareness (but there's a single memcg and disabling memcg makes no difference) and node-based reclaim (there's no disproportionately sized zone). Then we suspected 4.12 commit 2a2e48854d70 ("mm: vmscan: fix IO/refault regression in cache workingset transition") and how it affects inactive_list_is_low() when called from shrink_list() - the theory was that we decide to shrink the file active list too much (by setting inactive_ratio=0) due to refault detection, which in turn means we shrink file pages too much. This was confirmed by removing the inactive_ratio=0 part, after which the 4.12-based kernel stopped thrashing with the workload.

Then we investigated what leads to the main condition of the logic - "lruvec->refaults != refaults" - by adding some more tracing to inactive_list_is_low() and snapshot_refaults().
We suspected bad interactions due to multiple direct reclaimers, but what I mostly see is the following pattern of kswapd activity:

- kswapd finishes balancing and makes a snapshot of lruvec->refaults
- after a while (can be up to a few seconds) kswapd is woken up again, and the number of refaults has meanwhile changed by some relatively small number (tens or hundreds) since the snapshot, so the condition "lruvec->refaults != refaults" becomes true.
- inactive_list_is_low() keeps being called as part of kswapd operation, and the condition is always true because the snapshot didn't change. During that time, the refaults counter is either unchanged or changes only by a few refaults. Thus, the whole kswapd activity on the file lru is focused on the active lru.

Since the intention of commit 2a2e48854d70 is to detect workingset transitions, it seems to me it's not working well in this case, as there's no such transition - the workload just cannot keep its page cache working set in memory, because it's excessively reclaimed instead of anonymous memory. The '!=' condition is perhaps too coarse and static, and doesn't reflect how many refaults there were or whether refaults keep happening during kswapd operation - a single refault between two kswapd runs can affect the whole second run.

I wonder if there shouldn't be at least some kind of decay - when the condition triggers, update the snapshot to a value between the old snapshot and the current value, so that if refaults do not keep occurring, the condition will stop being true after some number of calls? What do you think?

I should also mention that we don't have the relatively recent commit 2c012a4ad1a2 ("mm: vmscan: scan anonymous pages on file refaults") in the 4.12-based kernel.
It could in theory also make the problem go away, as the "excessively true" condition would now also be considered when inactive_list_is_low() is called from get_scan_count() (in v5.4; I know there were big reorganizations in the last merge window), and perhaps change some SCAN_FILE outcomes to SCAN_FRACT. But I think it would be better to do something with the root cause first.
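
To make the decay idea a bit more concrete, here is a rough userspace sketch of what I have in mind. It is not kernel code and not a patch: struct lruvec_model and all three helpers are hypothetical names made up for illustration, the halving step is just one possible choice of "a value between the old snapshot and current value", and I assume the refault counter never goes below the snapshot.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model, NOT the kernel's struct lruvec. */
struct lruvec_model {
	unsigned long refaults;	/* snapshot taken at the end of the last reclaim run */
};

/* Behaviour as described above: any difference from the snapshot counts. */
static bool refault_seen_static(const struct lruvec_model *lv, unsigned long now)
{
	return lv->refaults != now;
}

/*
 * Decay variant (assumption: halving is only one possible decay step).
 * When the condition triggers, move the snapshot halfway towards the
 * current counter value, rounding up so it always makes progress.
 */
static bool refault_seen_decay(struct lruvec_model *lv, unsigned long now)
{
	assert(now >= lv->refaults);	/* model assumption: counter only grows */

	if (lv->refaults == now)
		return false;

	lv->refaults += (now - lv->refaults + 1) / 2;
	return true;
}

/*
 * Count how many consecutive calls still report a refault while the
 * counter stays at 'now', i.e. while no new refaults happen.
 */
static unsigned int calls_until_quiet(struct lruvec_model *lv,
				      unsigned long now,
				      unsigned int max_calls)
{
	unsigned int n = 0;

	while (n < max_calls && refault_seen_decay(lv, now))
		n++;
	return n;
}
```

In this model, a snapshot that is 100 refaults behind stops triggering after a handful of calls if no further refaults arrive, whereas the static check stays true until the next snapshot is taken. If refaults do keep happening, the counter keeps moving ahead of the snapshot and the condition stays true, which I think is what we want for a real workingset transition.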