Re: [RFC 0/5] add option to restore swap account to cgroupv1 mode

On Thu, 20 Mar 2025 at 03:38, Johannes Weiner <hannes@xxxxxxxxxxx> wrote:
>
> On Wed, Mar 19, 2025 at 02:41:43PM +0800, Jingxiang Zeng wrote:
> > From: Zeng Jingxiang <linuszeng@xxxxxxxxxxx>
> >
> > memsw accounting is a very useful knob for container memory
> > overcommitting: it's a great abstraction of the "expected total
> > memory usage" of a container, so containers can't allocate too
> > much memory by using SWAP, but are still able to SWAP out.
> >
> > For a simple example, with memsw.limit == memory.limit, containers
> > can't exceed their original memory limit even with SWAP enabled; they
> > get OOM killed just as they used to, but the host is now able to
> > offload cold pages.
> >
> > A similar ability seems absent in V2: with memory.swap.max == 0, the
> > host can't use SWAP to reclaim container memory at all. But with any
> > larger value, containers are able to overuse memory, causing delayed
> > OOM kills, thrashing, and a heavily unbalanced CPU/memory usage ratio,
> > especially with compressed SWAP backends.
> >
> > This patch set adds two interfaces to control the behavior of the
> > memory.swap.max/current in cgroupv2:
> >
> > CONFIG_MEMSW_ACCOUNT_ON_DFL
> > cgroup.memsw_account_on_dfl={0, 1}
> >
> > When one of the interfaces is enabled, memory.swap.current and
> > memory.swap.max represent the usage/limit of memory+swap.
> > When neither is enabled (the default behavior), memory.swap.current and
> > memory.swap.max represent the usage/limit of swap only.
>
> These should be new knobs, e.g. memory.memsw.current and memory.memsw.max.
>
> Overloading the existing swap knobs is confusing.
>
> And there doesn't seem to be a good reason to make the behavior
> either-or anyway. If memory.swap.max=max (default), it won't interfere
> with the memsw operation. And it's at least conceivable somebody might
> want to set both, memsw.max > swap.max, to get some flexibility while
> excluding the craziest edge cases.

Hi Johannes,

If both memsw.max and swap.max are provided in cgroup v2, there are a few
issues:
(1) As Shakeel Butt mentioned, memsw and swap currently share the same
page_counter, so we would need to provide a separate page_counter for memsw.
(2) Currently, the accounting for memsw and swap is mutually exclusive. For
example, during uncharging both go through __mem_cgroup_uncharge_swap(), and
that function updates only a single counter, selected by the static
do_memsw_account() check (see the sketch below).
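
To illustrate point (2), here is a simplified sketch of the current swap
uncharge path, paraphrased from mm/memcontrol.c (not the exact upstream
code; the swap_cgroup id lookup and stat updates are omitted, and
memcg_from_swap_entry() below is only a placeholder for that lookup):

/* Sketch only: exactly one of the two page_counters is uncharged,
 * chosen by do_memsw_account(), which is why memsw and swap
 * accounting are mutually exclusive today. */
static struct mem_cgroup *memcg_from_swap_entry(swp_entry_t entry); /* placeholder */

void __mem_cgroup_uncharge_swap(swp_entry_t entry, unsigned int nr_pages)
{
	struct mem_cgroup *memcg = memcg_from_swap_entry(entry);

	if (!memcg || mem_cgroup_is_root(memcg))
		return;

	if (do_memsw_account())
		/* cgroup v1 hierarchy: memory+swap counter */
		page_counter_uncharge(&memcg->memsw, nr_pages);
	else
		/* cgroup v2 hierarchy: swap-only counter */
		page_counter_uncharge(&memcg->swap, nr_pages);
}

Supporting memsw.max and swap.max at the same time would require charging
and uncharging both counters on paths like this instead of picking one.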

Given the issues above, this patch set follows the approach suggested by Roman
Gushchin [1], which switches to the cgroup v1 behavior through a configuration
option and is therefore easier to implement.

[1] https://lore.kernel.org/all/Zk-fQtFrj-2YDJOo@xxxxxxxxxxxxxxxxxxxxxxxxx/
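
For context, below is a minimal illustration of the cgroup v1 setup this
restores (the memsw.limit == memory.limit case from the cover letter). It is
only an example: the /sys/fs/cgroup/memory/demo path and the 1 GiB value are
assumptions, not part of this patch set.

#include <stdio.h>

/* Illustration only: cap memory and memory+swap at the same value so SWAP
 * can be used to offload cold pages but cannot grow the container's total
 * footprint. Assumes a cgroup v1 memory hierarchy with a "demo" cgroup. */
static int write_val(const char *path, const char *val)
{
	FILE *f = fopen(path, "w");

	if (!f)
		return -1;
	fputs(val, f);
	return fclose(f);
}

int main(void)
{
	const char *dir = "/sys/fs/cgroup/memory/demo";
	char path[256];

	/* memory.limit_in_bytes = 1 GiB */
	snprintf(path, sizeof(path), "%s/memory.limit_in_bytes", dir);
	write_val(path, "1073741824\n");

	/* memory.memsw.limit_in_bytes = 1 GiB: memory + swap capped together */
	snprintf(path, sizeof(path), "%s/memory.memsw.limit_in_bytes", dir);
	write_val(path, "1073741824\n");

	return 0;
}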



