Re: [LSF/MM/BPF TOPIC] KSM Enhancements: Selective KSM

Sourav Panda <souravpanda@xxxxxxxxxx> · Sun, 2 Feb 2025 23:20:11 -0800

On Sun, Feb 2, 2025 at 6:54 PM David Rientjes <rientjes@xxxxxxxxxx> wrote:
On Fri, 31 Jan 2025, Sourav Panda wrote:

> Hi,
>
> KSM is a powerful tool for deduplicating memory, reducing usage by merging
> identical pages across processes. However, certain interface and
> implementation aspects prevent its deployment in our use case, wherein
> security and efficiency (CPU overhead due to background scanning) are of
> greater importance.
>
> We propose Selective KSM, a mechanism to control when the merging takes
> place and what pages can be merged together. We do this by partitioning the
> merge space by security domain and carrying out the merging as part of a
> synchronous syscall. Doing so, we ensure sensitive content is not merged
> with non-sensitive content.

Thanks for proposing this, Sourav, it sounds like a useful topic to
discuss.

Regarding the above, this looks analogous to doing synchronous
MADV_COLLAPSE in process context and not relying on khugepaged as the
sole mechanism for doing that collapse?  In your case, it's userspace
doing a merge in process context without relying on ksmd.

Is s/Selective/Userspace/ the way to think about it?

Yes, this is a good analogy. 

Does this require a fully cooperative guest for it to work properly?

A guest VM would have to be fully cooperative to achieve this (as per the
current proposal). Later on, we could think about implementing an advisor
(e.g., like the KSM advisors we have today for adapting some parameters)
for the sake of optimization.

> Our overall goal is to optimize memory utilization in a virtualized
> environment, wherein there exists significant duplication across guest
> instances (e.g., kernel). By giving the operator a better ability to group
> pages by security and similarity, Selective KSM improves security and
> efficiency.
>
> Other than virtualized environments, we also want Selective KSM to work
> well in containerized environments.
>

> An example API could look like this (alternatively, we can do it through
> sysfs without adding syscalls):
>
> // This feature shall be gated by a Kconfig option: "CONFIG_SELECTIVE_KSM"
>
> // Create a unique identifier known to userland.
> const char *ksm_name = "some_name";
>
> // ksm_open() creates and opens a new, or opens an existing, KSM partition
> // object.
> // flags is a bit mask to determine if the merging is sync, etc.
> //   KSM_SYNC:  Carry out synchronous merging (no background scanning).
> //   KSM_CREAT: Create a KSM partition object if it does not exist.
> //   KSM_EXCL:  If a KSM partition object with this name already exists
> //              and KSM_CREAT is also specified, return an error.
> // mode is used to handle permissions:
> //   O_RDONLY, O_WRONLY, O_RDWR, S_IRUSR, S_IWUSR, S_IXUSR
> // On success, returns a file descriptor (a nonnegative integer) and
> // creates the sysfs path:
> //   /sys/kernel/mm/ksm/partition/<ksm_name>/
> // On failure, it returns -1 and sets errno to indicate the error.
> int ksm_fd = ksm_open(ksm_name, flags, mode);
>
> // Destroy the name. The named object will be removed only after all open
> // references are closed. On success, ksm_unlink() returns 0.
> // On failure, it returns -1 and sets errno to indicate the error.
> ksm_unlink(ksm_name);
>
> // Trigger merge. Only valid if KSM_SYNC was set during ksm_open().
> ksm_merge(ksm_fd, pid, addr, size);
>
> // Trigger unmerge. Only valid if KSM_SYNC was set during ksm_open().
> ksm_unmerge(ksm_fd, pid, addr, size);
>

> With regards,
> Sourav Panda