On Fri, 31 Jan 2025, Sourav Panda wrote:

> Hi,
>
> KSM is a powerful tool for deduplicating memory, reducing usage by merging
> identical pages across processes. However, certain interface and
> implementation aspects prevent its deployment in our use case, where
> security and efficiency (CPU overhead due to background scanning) are of
> greater importance.
>
> We propose Selective KSM, a mechanism to control when the merging takes
> place and what pages can be merged together. We do this by partitioning the
> merge-space as per security domains and carrying out the merging as part of a
> synchronous syscall. Doing so, we ensure sensitive content is not merged
> with non-sensitive content.
>

Thanks for proposing this, Sourav, it sounds like a useful topic to discuss.

Regarding the above, this looks analogous to doing a synchronous
MADV_COLLAPSE in process context rather than relying on khugepaged as the
sole mechanism for doing that collapse? In your case, it's userspace doing a
merge in process context without relying on ksmd.

Is s/Selective/Userspace/ the way to think about it?

Does this require a fully cooperative guest for it to work properly?

> Our overall goal is to optimize the memory utilization in a virtualized
> environment, wherein there exists significant duplication across guest
> instances (e.g., kernel). With the better ability of the operator to group
> pages as per security and similarity, Selective KSM improves security and
> efficiency.
>
> Other than virtualized environments, we also want Selective KSM to work
> well in containerized environments.
>
> An example API could look like this (alternatively, we can do it through
> sysfs without adding syscalls):
>
> // This feature shall be gated by a Kconfig option: "CONFIG_SELECTIVE_KSM"
>
> // Create a unique identifier known to userland.
> char *ksm_name = "some_name";
>
> // ksm_open() creates and opens a new, or opens an existing, ksm partition
> // obj.
> // flags is a bit mask to determine if the merging is sync, etc.
> //   KSM_SYNC:  Carry out synchronous merging (no background scanning).
> //   KSM_CREAT: Creates a KSM partition obj if it does not exist.
> //   KSM_EXCL:  If a KSM partition obj with name already exists and
> //              KSM_CREAT is also specified, return err.
> // mode is used to handle permissions:
> //   O_RDONLY, O_WRONLY, O_RDWR, S_IRUSR, S_IWUSR, S_IXUSR
> // On success, returns a file descriptor (a nonnegative integer) and
> // creates the sysfs path:
> //   /sys/kernel/mm/ksm/partition/<ksm_name>/
> // On failure, it returns -1 and sets errno to indicate the error.
> int ksm_fd = ksm_open(ksm_name, flags, mode);
>
> // Destroy the name. The named object will be removed only after all open
> // references are closed. On success, ksm_unlink() returns 0.
> // On failure, it returns -1 and sets errno to indicate the error.
> ksm_unlink(ksm_name);
>
> // Trigger merge. Only valid if KSM_SYNC is set during ksm_open().
> ksm_merge(ksm_fd, pid, addr, size);
>
> // Trigger unmerge. Only valid if KSM_SYNC is set during ksm_open().
> ksm_unmerge(ksm_fd, pid, addr, size);
>
> With regards,
> Sourav Panda
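
To make the intended call sequence concrete for discussion, here is a minimal userspace mock in C of the lifecycle quoted above. Everything in it is a stand-in: none of these syscalls exist today, the flag values, the partition name "tenant_a", the fixed-size partition table, and the errno choices (EEXIST, ENOENT, EBADF, EINVAL, ENOSPC) are all assumptions, and no pages are actually merged. It only models the naming and flag rules as I read them from the proposal.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical values: the proposal names these flags but not their values. */
#define KSM_SYNC  0x1
#define KSM_CREAT 0x2
#define KSM_EXCL  0x4

#define MOCK_MAX_PARTITIONS 8

/* Userspace stand-in for a kernel-side KSM partition object. */
struct mock_partition {
    char name[32];  /* empty once the name has been unlinked */
    int  in_use;
    int  flags;
};

static struct mock_partition partitions[MOCK_MAX_PARTITIONS];

/* Mock ksm_open(): find or create a named partition; the slot index plays the fd. */
static int ksm_open(const char *name, int flags, mode_t mode)
{
    int free_slot = -1;

    (void)mode; /* permission checks are out of scope for this mock */
    if (name[0] == '\0' || strlen(name) >= sizeof(partitions[0].name)) {
        errno = EINVAL;
        return -1;
    }
    for (int i = 0; i < MOCK_MAX_PARTITIONS; i++) {
        if (!partitions[i].in_use) {
            if (free_slot < 0)
                free_slot = i;
            continue;
        }
        if (strcmp(partitions[i].name, name) == 0) {
            if ((flags & KSM_CREAT) && (flags & KSM_EXCL)) {
                errno = EEXIST; /* assumed errno for the KSM_EXCL case */
                return -1;
            }
            return i;
        }
    }
    if (!(flags & KSM_CREAT) || free_slot < 0) {
        errno = (flags & KSM_CREAT) ? ENOSPC : ENOENT;
        return -1;
    }
    strcpy(partitions[free_slot].name, name); /* length checked above */
    partitions[free_slot].in_use = 1;
    partitions[free_slot].flags = flags;
    return free_slot;
}

/* Mock ksm_unlink(): drop the name; already-open "fds" stay usable. */
static int ksm_unlink(const char *name)
{
    for (int i = 0; i < MOCK_MAX_PARTITIONS; i++) {
        if (partitions[i].in_use && name[0] != '\0' &&
            strcmp(partitions[i].name, name) == 0) {
            partitions[i].name[0] = '\0';
            return 0;
        }
    }
    errno = ENOENT;
    return -1;
}

/* Shared check: the fd must be valid and must have been opened with KSM_SYNC. */
static int mock_require_sync(int fd)
{
    if (fd < 0 || fd >= MOCK_MAX_PARTITIONS || !partitions[fd].in_use) {
        errno = EBADF;
        return -1;
    }
    if (!(partitions[fd].flags & KSM_SYNC)) {
        errno = EINVAL;
        return -1;
    }
    return 0;
}

/* Mock ksm_merge()/ksm_unmerge(): validation only, no pages are touched. */
static int ksm_merge(int fd, pid_t pid, void *addr, size_t size)
{
    (void)pid; (void)addr; (void)size;
    return mock_require_sync(fd);
}

static int ksm_unmerge(int fd, pid_t pid, void *addr, size_t size)
{
    (void)pid; (void)addr; (void)size;
    return mock_require_sync(fd);
}

/* Walks the quoted lifecycle; returns 0 if each step behaves as described. */
static int selective_ksm_demo(void)
{
    static char buf[4096];
    int fd = ksm_open("tenant_a", KSM_CREAT | KSM_SYNC, 0600);

    if (fd < 0)
        return -1;
    /* A second exclusive create of the same name is expected to fail. */
    if (ksm_open("tenant_a", KSM_CREAT | KSM_EXCL, 0600) != -1 || errno != EEXIST)
        return -1;
    if (ksm_merge(fd, getpid(), buf, sizeof(buf)) != 0)
        return -1;
    if (ksm_unmerge(fd, getpid(), buf, sizeof(buf)) != 0)
        return -1;
    return ksm_unlink("tenant_a");
}
```

Calling selective_ksm_demo() from a main() exercises open, exclusive-open failure, merge, unmerge, and unlink in order. The mock has no close or reference counting, so an unlinked partition simply keeps its slot; a real implementation would presumably free the object once the last reference is dropped, as the proposal describes.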