On Fri, 17 Jul 2020, Chris Down wrote:

> > With the proposed anon_reclaimable, do you have any reliability concerns?
> > This would be the amount of lazy freeable memory and memory that can be
> > uncharged if compound pages from the deferred split queue are split under
> > memory pressure.  It seems to be a very precise value (as slab_reclaimable
> > already in memory.stat is), so I'm not sure why there is a reliability
> > concern.  Maybe you can elaborate?
>
> Ability to reclaim a page is largely about context at the time of reclaim.
> For example, if you are running at the edge of swap, a metric that truly
> describes "reclaimable memory" will contain vastly different numbers from
> one second to the next as cluster and page availability increases and
> decreases.  We may also have to do things like look for youngness at
> reclaim time, so I'm not convinced metrics like this make sense in the
> general case.

...

> Again, I'm curious why this can't be solved by artificial workingset
> pressurisation and monitoring.  Generally, the most reliable reclaim
> metrics come from operating reclaim itself.

Perhaps this is best discussed in the context I gave in the earlier thread:
imagine a thp-backed heap of 64MB and then a malloc implementation doing
MADV_DONTNEED over all but one page in every one of these pageblocks.

On a 4.3 kernel, for example, memory.current for the heap segment is now
(64MB / 2MB) * 4KB = 128KB because we have synchronous splitting and
uncharging of the underlying hugepage.

On a 4.15 kernel, for example, memory.current is still 64MB because the
underlying hugepages are still charged to the memcg due to deferred split
queues.

For any application that monitors this, pressurization is not going to help:
the memory will be reclaimed under memcg pressure but we aren't facing that
pressure yet.  Userspace could identify this as a memory leak unless we
describe what anon memory is actually reclaimable in this context (including
on systems without swap).

For any entity that uses this information to infer if new work can be
scheduled in this memcg (the reason MemAvailable exists in /proc/meminfo at
the system level), this is now dramatically skewed.  At worst, on a swapless
system, this memory is seen from userspace as unreclaimable because it's
charged anon.

Do you have other suggestions for how userspace can understand what anon is
reclaimable in this context before encountering memory pressure?  If so, it
may be a great alternative to this: I haven't been able to think of such a
way other than an anon_reclaimable stat.
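
If it helps to make the scenario above concrete, a minimal userspace sketch
of what such a malloc implementation effectively does is below.  The arena
size, constants, and the use of MADV_HUGEPAGE are illustrative assumptions,
not taken from any particular allocator: back a 64MB region with THP, fault
it in, then MADV_DONTNEED everything but the first 4KB base page of each 2MB
hugepage.  Comparing memory.current for the memcg before and after the loop
is what shows the divergence between synchronous splitting and deferred
split queues described above.

/*
 * Sketch only (assumed constants: 64MB arena, 2MB THP, 4KB base pages).
 * Fault in a THP-backed region, then MADV_DONTNEED all but one base page
 * of every hugepage.  On kernels with deferred split queues, the partially
 * unmapped hugepages remain charged until the split queue is shrunk under
 * memory pressure.
 */
#define _GNU_SOURCE
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define ARENA_SIZE	(64UL << 20)	/* 64MB heap segment */
#define HPAGE_SIZE	(2UL << 20)	/* 2MB transparent hugepage */
#define BASE_PAGE_SIZE	(4UL << 10)	/* 4KB base page */

int main(void)
{
	unsigned long off;
	char *arena;

	arena = mmap(NULL, ARENA_SIZE, PROT_READ | PROT_WRITE,
		     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (arena == MAP_FAILED)
		return 1;

	/* Ask for THP backing and fault the whole arena in. */
	madvise(arena, ARENA_SIZE, MADV_HUGEPAGE);
	memset(arena, 1, ARENA_SIZE);

	/* Drop all but the first base page of each hugepage. */
	for (off = 0; off < ARENA_SIZE; off += HPAGE_SIZE)
		madvise(arena + off + BASE_PAGE_SIZE,
			HPAGE_SIZE - BASE_PAGE_SIZE, MADV_DONTNEED);

	/* Hold the mapping so memory.current can be inspected now. */
	pause();
	return 0;
}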