Re: Memcg stat for available memory

(Adding more people who might be interested in this)


On Sun, Jun 28, 2020 at 3:15 PM David Rientjes <rientjes@xxxxxxxxxx> wrote:
>
> Hi everybody,
>
> I'd like to discuss the feasibility of a stat similar to
> si_mem_available() but at memcg scope which would specify how much memory
> can be charged without I/O.
>
> The si_mem_available() stat is based on heuristics so this does not
> provide an exact quantity that is actually available at any given time,
> but can otherwise provide userspace with some guidance on the amount of
> reclaimable memory.  See the description in
> Documentation/filesystems/proc.rst and its implementation.
>
>  [ Naturally, userspace would need to understand both the amount of memory
>    that is available for allocation and for charging, separately, on an
>    overcommitted system.  I assume this is trivial.  (Why don't we provide
>    MemAvailable in per-node meminfo?) ]
>
> For such a stat at memcg scope, we can ignore totalreserves and
> watermarks.  We already have ~precise (modulo MEMCG_CHARGE_BATCH) data for
> both file pages and slab_reclaimable.
>
> We can infer lazily free memory by doing
>
>         file - (active_file + inactive_file)
>
> (This is necessary because lazy free memory is anon but on the inactive
>  file lru and we can't infer lazy freeable memory through pglazyfree -
>  pglazyfreed, they are event counters.)
>
> We can also infer the number of underlying compound pages that are on
> deferred split queues but have yet to be split with active_anon - anon (or
> is this a bug? :)
>
> So it *seems* like userspace can make a si_mem_available()-like
> calculation ("avail") by doing
>
>         free = memory.high - memory.current
>         lazyfree = file - (active_file + inactive_file)
>         deferred = active_anon - anon
>
>         avail = free + lazyfree + deferred +
>                 (active_file + inactive_file + slab_reclaimable) / 2
>
> For userspace interested in knowing how much memory it can charge without
> incurring I/O (and assuming it has knowledge of available memory on an
> overcommitted system), it seems like:
>
>  (a) it can derive the above avail amount that is at least similar to
>      MemAvailable,
>
>  (b) it can assume that all reclaim is considered equal so anything more
>      than memory.high - memory.current is disruptive enough that it's a
>      better heuristic than the above, or
>
>  (c) the kernel provide an "avail" stat in memory.stat based on the above
>      and can evolve as the kernel implementation changes (how lazy free
>      memory impacts anon vs file lru stats, how deferred split memory is
>      handled, any future extensions for "easily reclaimable memory") that
>      userspace can count on to the same degree it can count on
>      MemAvailable.
>
> Any thoughts?


I think we need to answer two questions:

1) What's the use-case?
2) Why is it not good enough for user space to calculate MemAvailable itself?

The use case I have in mind is a latency-sensitive distributed
caching service that would rather shrink its cache than suffer the
stalls incurred by hitting the limit. Such applications can monitor
their MemAvailable and adjust their caching footprint.
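
For illustration, here is a rough Python sketch of how such a service
might compute the "avail" estimate in user space today. It follows the
formula quoted above as-is, including its signs, so it is only as good
as that heuristic. The function names are hypothetical, the caller is
assumed to have already read memory.high and memory.current as
integers (a memory.high of "max" is not handled), and the stat keys are
assumed to appear in memory.stat under the names used in the formula.

```python
def parse_memory_stat(text):
    """Parse 'key value' lines (the memory.stat layout) into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[parts[0]] = int(parts[1])
    return stats


def memcg_avail(stats, high, current):
    """Estimate chargeable-without-I/O memory using the quoted formula.

    stats   : dict from parse_memory_stat()
    high    : value of memory.high in bytes (assumed numeric, not "max")
    current : value of memory.current in bytes
    """
    free = high - current
    # Both inferences below are copied from the quoted proposal unchanged.
    lazyfree = stats["file"] - (stats["active_file"] + stats["inactive_file"])
    deferred = stats["active_anon"] - stats["anon"]
    # As in si_mem_available(), count only half of the reclaimable caches.
    cache = (stats["active_file"] + stats["inactive_file"]
             + stats["slab_reclaimable"]) // 2
    return free + lazyfree + deferred + cache
```

A caching service could poll this periodically and shrink its cache
when the result drops below some threshold of its own choosing. Note
that this is exactly the kind of code that has to change whenever the
kernel changes how these counters relate to each other.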

For the second, I think the answer is that we want to hide the
kernel's internal implementation details from user space. The deferred
split queues are an internal detail that we don't want exposed to the
user. Similarly, how lazyfree is implemented (i.e. anon pages on the
file LRU) should not be exposed to users.

Shakeel



