On Thu, 20 Mar 2025 at 03:38, Johannes Weiner <hannes@xxxxxxxxxxx> wrote:
>
> On Wed, Mar 19, 2025 at 02:41:43PM +0800, Jingxiang Zeng wrote:
> > From: Zeng Jingxiang <linuszeng@xxxxxxxxxxx>
> >
> > memsw account is a very useful knob for container memory
> > overcommitting: It's a great abstraction of the "expected total
> > memory usage" of a container, so containers can't allocate too
> > much memory using SWAP, but still be able to SWAP out.
> >
> > For a simple example, with memsw.limit == memory.limit, containers
> > can't exceed their original memory limit, even with SWAP enabled, they
> > get OOM killed as how they used to, but the host is now able to
> > offload cold pages.
> >
> > Similar ability seems absent with V2: With memory.swap.max == 0, the
> > host can't use SWAP to reclaim container memory at all. But with a
> > value larger than that, containers are able to overuse memory, causing
> > delayed OOM kill, thrashing, CPU/Memory usage ratio could be heavily
> > out of balance, especially with compress SWAP backends.
> >
> > This patch set adds two interfaces to control the behavior of the
> > memory.swap.max/current in cgroupv2:
> >
> > CONFIG_MEMSW_ACCOUNT_ON_DFL
> > cgroup.memsw_account_on_dfl={0, 1}
> >
> > When one of the interfaces is enabled: memory.swap.current and
> > memory.swap.max represents the usage/limit of swap.
> > When neither is enabled (default behavior),memory.swap.current and
> > memory.swap.max represents the usage/limit of memory+swap.
>
> This should be new knobs, e.g. memory.memsw.current, memory.memsw.max.
> Overloading the existing swap knobs is confusing.
>
> And there doesn't seem to be a good reason to make the behavior
> either-or anyway. If memory.swap.max=max (default), it won't interfere
> with the memsw operation. And it's at least conceivable somebody might
> want to set both, memsw.max > swap.max, to get some flexibility while
> excluding the craziest edge cases.

Hi Johannes,

Supporting both memsw.max and swap.max in cgroupv2 raises the following issues:

(1) As Shakeel Butt mentioned, memsw and swap currently share a single page_counter, so memsw would need a separate page_counter of its own.

(2) The memsw and swap statistics are currently mutually exclusive. For example, on uncharge, memsw and swap both go through the same __mem_cgroup_uncharge_swap function, and that function selects only a single counter to update, based on the static do_memsw_account check.

For these reasons, this patch set follows the approach suggested by Roman Gushchin [1]: switching to cgroupv1 behavior through a configuration option, which is simpler to implement.

Link: https://lore.kernel.org/all/Zk-fQtFrj-2YDJOo@xxxxxxxxxxxxxxxxxxxxxxxxx/ [1]
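To make the two issues above concrete, here is a simplified user-space model. It is not the actual mm/memcontrol.c code: struct memcg_model, struct memcg_both, model_uncharge_swap() and both_uncharge_swap() are hypothetical names, and the page_counter here is a stripped-down stand-in with only usage and max. The first struct assumes the swap (v2) and memsw (v1) counters alias the same storage; the second sketches what letting memsw.max and swap.max coexist would require.

```c
#include <assert.h>
#include <stdbool.h>

/* Stripped-down stand-in for the kernel's struct page_counter. */
struct page_counter {
	long usage;
	long max;
};

/*
 * Hypothetical model of the current layout: the v2 swap counter and
 * the v1 memsw counter share storage, so only one can be meaningful
 * at a time.
 */
struct memcg_model {
	struct page_counter memory;
	union {
		struct page_counter swap;  /* cgroup v2 */
		struct page_counter memsw; /* cgroup v1 */
	};
};

/*
 * Models the single-counter choice on swap uncharge: a static
 * do_memsw_account-style switch picks exactly one counter.
 */
static void model_uncharge_swap(struct memcg_model *cg,
				bool do_memsw_account, long nr_pages)
{
	if (do_memsw_account)
		cg->memsw.usage -= nr_pages;
	else
		cg->swap.usage -= nr_pages;
}

/*
 * Hypothetical layout if memsw.max and swap.max coexisted on v2:
 * each limit needs its own counter, and a swap uncharge has to
 * update both of them.
 */
struct memcg_both {
	struct page_counter memory;
	struct page_counter swap;
	struct page_counter memsw;
};

static void both_uncharge_swap(struct memcg_both *cg, long nr_pages)
{
	cg->swap.usage -= nr_pages;
	cg->memsw.usage -= nr_pages;
}
```

In the first model, writing through .swap is visible through .memsw because the union gives both names the same storage. That is why a second, independent counter (and a two-counter uncharge path) would be needed before both knobs could be exposed at once.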