On Thu, Nov 30, 2023 at 11:56:42AM -0500, Johannes Weiner wrote:
> On Thu, Nov 30, 2023 at 04:57:41PM +0100, Michal Hocko wrote:
> > On Thu 30-11-23 07:36:53, Dan Schatzberg wrote:
> > [...]
> > > In contrast, I argue in favor of a swappiness setting not as a way to implement
> > > custom reclaim algorithms but rather to bias the balance of anon vs file due to
> > > differences of proactive vs reactive reclaim. In this context, swappiness is the
> > > existing interface for controlling this balance and this patch simply allows for
> > > it to be configured differently for proactive vs reactive reclaim.
> >
> > I do agree that swappiness is a better interface than explicit anon/file
> > but the problem with swappiness is that it is more of a hint for the reclaim
> > rather than a real control. Just look at get_scan_count and its history.
> > Not only its range has been extended also the extent when it is actually
> > used has been changing all the time and I think it is not a stretch to
> > assume that trend to continue.
>
> Right, we did tweak the edge behavior of e.g. swappiness=0. And we
> extended the range to express "anon is cheaper than file", which
> wasn't possible before, to support the compressed memory case.
>
> However, its meaning and impact has been remarkably stable over the
> years: it allows userspace to specify the relative cost of paging IO
> between file and anon pages. This comment is from 2.6.28:
>
>         /*
>          * With swappiness at 100, anonymous and file have the same priority.
>          * This scanning priority is essentially the inverse of IO cost.
>          */
>         anon_prio = sc->swappiness;
>         file_prio = 200 - sc->swappiness;
>
> And this is it today:
>
>         /*
>          * Calculate the pressure balance between anon and file pages.
>          *
>          * The amount of pressure we put on each LRU is inversely
>          * proportional to the cost of reclaiming each list, as
>          * determined by the share of pages that are refaulting, times
>          * the relative IO cost of bringing back a swapped out
>          * anonymous page vs reloading a filesystem page (swappiness).
>          *
>          * Although we limit that influence to ensure no list gets
>          * left behind completely: at least a third of the pressure is
>          * applied, before swappiness.
>          *
>          * With swappiness at 100, anon and file have equal IO cost.
>          */
>         total_cost = sc->anon_cost + sc->file_cost;
>         anon_cost = total_cost + sc->anon_cost;
>         file_cost = total_cost + sc->file_cost;
>         total_cost = anon_cost + file_cost;
>
>         ap = swappiness * (total_cost + 1);
>         ap /= anon_cost + 1;
>
>         fp = (200 - swappiness) * (total_cost + 1);
>         fp /= file_cost + 1;
>
> So swappiness still means the same it did 15 years ago. We haven't
> changed the default swappiness setting, and we haven't broken any
> existing swappiness configurations through VM changes in that time.
>
> There are a few scenarios where swappiness doesn't apply:
>
> - No swap. Oh well, that seems reasonable.
>
> - Priority=0. This applies to near-OOM situations where the MM system
>   tries to save itself. This isn't a range in which proactive
>   reclaimers (should) operate.
>
> - sc->file_is_tiny. This doesn't apply to cgroup reclaim and thus
>   proactive reclaim.
>
> - sc->cache_trim_mode. This implements clean cache dropbehind, and
>   applies in the presence of large, non-refaulting inactive cache. The
>   assumption there is that this data is reclaimable without involving
>   IO to evict, and without the expectation of refault IO in the
>   future. Without IO involvement, the relative IO cost isn't a
>   factor. This will back off when refaults are observed, and the IO
>   cost setting is then taken into account again as expected.
>
>   If you consider swappiness to mean "reclaim what I ask you to", then
>   this would override that, yes. But in the definition of relative IO
>   cost, this decision making is permissible.
>
>   Note that this applies to the global swappiness setting as well, and
>   nobody has complained about it.
>
> So I wouldn't say it's merely a reclaim hint. It controls a very
> concrete and influential factor in VM decision making. And since the
> global swappiness is long-established ABI, I don't expect its meaning
> to change significantly any time soon.

Are you saying that the edge-case behavior of the global swappiness and
of the user-provided swappiness passed through memory.reclaim should
remain the same?
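
For reference, the arithmetic quoted above can be tried out in userspace.
The following is only a rough sketch, not kernel code: struct pressure and
scan_pressure() are hypothetical names made up for illustration, the
inputs are plain integers rather than fields of a scan_control, and none
of the edge cases listed above (no swap, priority 0, file_is_tiny,
cache_trim_mode) are modeled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the two results; not a kernel type. */
struct pressure {
	uint64_t ap;	/* pressure applied to the anon LRU */
	uint64_t fp;	/* pressure applied to the file LRU */
};

/*
 * Mirrors the arithmetic in the quoted get_scan_count() snippet.
 * Assumption: swappiness is in the range 0..200.
 */
static struct pressure scan_pressure(uint64_t swappiness,
				     uint64_t sc_anon_cost,
				     uint64_t sc_file_cost)
{
	struct pressure p;
	uint64_t total_cost, anon_cost, file_cost;

	assert(swappiness <= 200);

	/*
	 * Adding the total to each side bounds the cost ratio, so
	 * neither list is left behind completely.
	 */
	total_cost = sc_anon_cost + sc_file_cost;
	anon_cost = total_cost + sc_anon_cost;
	file_cost = total_cost + sc_file_cost;
	total_cost = anon_cost + file_cost;

	p.ap = swappiness * (total_cost + 1);
	p.ap /= anon_cost + 1;

	p.fp = (200 - swappiness) * (total_cost + 1);
	p.fp /= file_cost + 1;

	return p;
}
```

With no recorded cost on either side, scan_pressure(100, 0, 0) gives
ap == fp == 100 and scan_pressure(60, 0, 0) gives ap = 60, fp = 140. With
all of the cost on the anon side, scan_pressure(100, 100, 0) gives
ap = 149 and fp = 298, so anon still receives about a third of the
pressure before swappiness skews it further.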