On Fri, Dec 15, 2023 at 12:22 AM Chris Li <chrisl@xxxxxxxxxx> wrote:
>
> Hi Fabian,
>
> On Thu, Dec 14, 2023 at 10:00 AM Fabian Deutsch <fdeutsch@xxxxxxxxxx> wrote:
>
> > Yep - for container use-cases.
> >
> > Now a few thoughts in this direction:
> > - With swap per cgroup you lose the big "statistical" benefit of having swap on a node level. Well, it depends on the size of the cgroup (e.g. system.slice is quite large).
>
> Just to clarify, by "node" you mean the "node" in the Kubernetes sense,
> which is the whole machine. In the Linux kernel MM context, "node"
> often refers to the NUMA memory node; that is not what you mean here,
> right?
Correct - I was referring to Kubernetes nodes, not NUMA nodes.
>
> > - With today's node-level swap, setting memory.swap.max=0 for all cgroups allows you to achieve a similar behavior (only opt-in cgroups will get swap).
> > - The above approach, however, will still have a shared swap backend for all cgroups.
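As a rough illustration of the opt-in setup described in the quoted points above, here is a minimal Python sketch. It assumes cgroup v2, where writing "0" to a cgroup's memory.swap.max disables swap for that cgroup and "max" leaves it unlimited. The function names (swap_limit_for, apply_swap_policy), the opt-in set and the cgroup paths are hypothetical, chosen only for illustration. Ancestors of an opt-in cgroup are left at "max" because a child's swap limit is bounded by its ancestors' limits.

```python
from pathlib import Path

SWAP_MAX = "memory.swap.max"  # cgroup v2 interface file


def swap_limit_for(cgroup: str, opt_in: set[str]) -> str:
    """Return the value to write for one cgroup (path relative to the root).

    A cgroup keeps "max" if it is opt-in, lives below an opt-in cgroup,
    or is an ancestor of one (so it does not cap its opt-in child).
    Everything else gets "0", i.e. no swap.
    """
    for allowed in opt_in:
        if (cgroup == allowed
                or cgroup.startswith(allowed + "/")
                or allowed.startswith(cgroup + "/")):
            return "max"
    return "0"


def apply_swap_policy(root: Path, opt_in: set[str]) -> dict[str, str]:
    """Write the chosen limit into every memory.swap.max found below root.

    Returns a mapping of relative cgroup path to the value written.
    The root cgroup itself is skipped (it has no memory.swap.max).
    """
    written = {}
    for limit_file in sorted(root.rglob(SWAP_MAX)):
        cgroup = limit_file.parent.relative_to(root).as_posix()
        if cgroup == ".":
            continue
        value = swap_limit_for(cgroup, opt_in)
        limit_file.write_text(value + "\n")
        written[cgroup] = value
    return written
```

Pointing root at the real cgroup mount (typically /sys/fs/cgroup) needs sufficient privileges, and the values would not persist for cgroups created later. Even with such a policy in place, all opt-in cgroups still share the same swap backend.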
>
> Yes, the "memory.swap.tiers" idea is trying to allow cgroups to select
> a subset of the swap backend in a specific order. It is still in the
> early stage of discussion. If you have any suggestions or feedback in
> that direction, I am looking forward to hearing them.
Interesting. There have been concerns about accidentally leaking confidential data when it gets written to a swap device.
The other, less discussed item was QoS for swap I/O traffic.
At first glance it seems like tiers could help with the second use case.
- fabian
>
> Chris
>