Re: [PATCH 1/7] mm: memcontrol: charge swap to cgroup2

Kamezawa Hiroyuki <kamezawa.hiroyu@xxxxxxxxxxxxxx> · Tue, 15 Dec 2015 18:29:59 +0900

On 2015/12/15 17:30, Vladimir Davydov wrote:
On Tue, Dec 15, 2015 at 12:12:40PM +0900, Kamezawa Hiroyuki wrote:
On 2015/12/15 0:30, Michal Hocko wrote:
On Thu 10-12-15 14:39:14, Vladimir Davydov wrote:
In the legacy hierarchy we charge memsw, which is dubious, because:

  - memsw.limit must be >= memory.limit, so it is impossible to limit
    swap usage less than memory usage. Taking into account the fact that
    the primary limiting mechanism in the unified hierarchy is
    memory.high while memory.limit is either left unset or set to a very
    large value, moving memsw.limit knob to the unified hierarchy would
    effectively make it impossible to limit swap usage according to the
    user preference.

  - memsw.usage != memory.usage + swap.usage, because a page occupying
    both swap entry and a swap cache page is charged only once to memsw
    counter. As a result, it is possible to effectively eat up to
    memory.limit of memory pages *and* memsw.limit of swap entries, which
    looks unexpected.

That said, we should provide a different swap limiting mechanism for
cgroup2.
This patch adds mem_cgroup->swap counter, which charges the actual
number of swap entries used by a cgroup. It is only charged in the
unified hierarchy, while the legacy hierarchy memsw logic is left
intact.

I agree that the previous semantic was awkward. The problem I can see
with this approach is that once the swap limit is reached the anon
memory pressure might spill over to other and unrelated memcgs during
the global memory pressure. I guess this is what Kame referred to as
anon would become mlocked basically. This would be even more of an issue
with resource delegation to sub-hierarchies because nobody will prevent
setting the swap amount to a small value and use that as an anon memory
protection.

I guess this was the reason why this approach hasn't been chosen before

Yes. At that age, "never break global VM" was the policy. And "mlock" can be
used for attacking system.

If we are talking about "attacking system" from inside a container,
there are much easier and disruptive ways, e.g. running a fork-bomb or
creating pipes - such memory can't be reclaimed and global OOM killer
won't help.

You're right. We just wanted to avoid affecting global memory reclaim by
each cgroup settings.

Thanks,
-Kame

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>