Re: [RFC] memory cgroup: my thoughts on memsw

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



(2014/09/16 4:14), Johannes Weiner wrote:
Hi Vladimir,

On Thu, Sep 04, 2014 at 06:30:55PM +0400, Vladimir Davydov wrote:
To sum it up, the current mem + memsw configuration scheme doesn't allow
us to limit swap usage if we want to partition the system dynamically
using soft limits. Actually, it also looks rather confusing to me. We
have mem limit and mem+swap limit. I bet that from the first glance, an
average admin will think it's possible to limit swap usage by setting
the limits so that the difference between memory.memsw.limit and
memory.limit equals the maximal swap usage, but (surprise!) it isn't
really so. It holds if there's no global memory pressure, but otherwise
swap usage is only limited by memory.memsw.limit! IMHO, it isn't
something obvious.

Agreed, memory+swap accounting & limiting is broken.

  - Anon memory is handled by the user application, while file caches are
    all on the kernel. That means the application will *definitely* die
    w/o anon memory. W/o file caches it usually can survive, but the more
    caches it has the better it feels.

  - Anon memory is not that easy to reclaim. Swap out is a really slow
    process, because data are usually read/written w/o any specific
    order. Dropping file caches is much easier. Typically we have lots of
    clean pages there.

  - Swap space is limited. And today, it's OK to have TBs of RAM and only
    several GBs of swap. Customers simply don't want to waste their disk
    space on that.

Finally, my understanding (may be crazy!) how the things should be
configured. Just like now, there should be mem_cgroup->res accounting
and limiting total user memory (cache+anon) usage for processes inside
cgroups. This is where there's nothing to do. However, mem_cgroup->memsw
should be reworked to account *only* memory that may be swapped out plus
memory that has been swapped out (i.e. swap usage).

But anon pages are not a resource, they are a swap space liability.
Think of virtual memory vs. physical pages - the use of one does not
necessarily result in the use of the other.  Without memory pressure,
anonymous pages do not consume swap space.

What we *should* be accounting and limiting here is the actual finite
resource: swap space.  Whenever we try to swap a page, its owner
should be charged for the swap space - or the swapout be rejected.

For hard limit reclaim, the semantics of a swap space limit would be
fairly obvious, because it's clear who the offender is.

However, in an overcommitted machine, the amount of swap space used by
a particular group depends just as much on the behavior of the other
groups in the system, so the per-group swap limit should be enforced
even during global reclaim to feed back pressure on whoever is causing
the swapout.  If reclaim fails, the global OOM killer triggers, which
should then off the group with the biggest soft limit excess.

As far as implementation goes, it should be doable to try-charge from
add_to_swap() and keep the uncharging in swap_entry_free().

We'll also have to extend the global OOM killer to be memcg-aware, but
we've been meaning to do that anyway.


When we introduced memsw limitation, we tried to avoid affecting global memory reclaim.
Then, we did memory+swap limitation.

Now, global memory reclaim is memcg-aware. So, I think swap-limitation rather than
anon+swap may be a choice. The change will reduce res_counter access. Hmm, it will be
desireble to move anon pages to Unevictable if memcg's swap slot is 0.

Anyway, I think softlimit should be re-implemented, 1st. It will be starting point.

Thanks,
-Kame






--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>




[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux]     [Linux OMAP]     [Linux MIPS]     [ECOS]     [Asterisk Internet PBX]     [Linux API]