Hello. On úterý 10. května 2022 14:22:42 CEST cgel.zte@xxxxxxxxx wrote: > From: xu xin <xu.xin16@xxxxxxxxxx> > > To use KSM, we have to explicitly call madvise() in application code, > which means installed apps on OS needs to be uninstall and source code > needs to be modified. It is inconvenient. > > In order to change this situation, We add a new proc file ksm_force > under /proc/<pid>/ to support turning on/off KSM scanning of a > process's mm dynamically. > > If ksm_force is set to 1, force all anonymous and 'qualified' VMAs > of this mm to be involved in KSM scanning without explicitly calling > madvise to mark VMA as MADV_MERGEABLE. But It is effective only when > the klob of /sys/kernel/mm/ksm/run is set as 1. > > If ksm_force is set to 0, cancel the feature of ksm_force of this > process (fallback to the default state) and unmerge those merged pages > belonging to VMAs which is not madvised as MADV_MERGEABLE of this process, > but still leave MADV_MERGEABLE areas merged. To my best knowledge, last time a forcible KSM was discussed (see threads [1], [2], [3] and probably others) it was concluded that a) procfs was a horrible interface for things like this one; and b) process_madvise() syscall was among the best suggested places to implement this (which would require a more tricky handling from userspace, but still). So, what changed since that discussion? P.S. For now I do it via dedicated syscall, but I'm not trying to upstream this approach. [1] https://lore.kernel.org/lkml/2a66abd8-4103-f11b-06d1-07762667eee6@xxxxxxx/ [2] https://lore.kernel.org/all/20190515145151.GG16651@xxxxxxxxxxxxxx/T/#u [3] https://lore.kernel.org/lkml/20190516172452.GA2106@avx2/ [4] https://gitlab.com/post-factum/pf-kernel/-/commits/ksm-5.17/ > Signed-off-by: xu xin <xu.xin16@xxxxxxxxxx> > Reviewed-by: Yang Yang <yang.yang29@xxxxxxxxxx> > Reviewed-by: Ran Xiaokai <ran.xiaokai@xxxxxxxxxx> > Reviewed-by: wangyong <wang.yong12@xxxxxxxxxx> > Reviewed-by: Yunkai Zhang <zhang.yunkai@xxxxxxxxxx> > Suggested-by: Matthew Wilcox <willy@xxxxxxxxxxxxx> > --- > v6: > - modify the way of "return" > - remove unnecessary words in Documentation/admin-guide/mm/ksm.rst > - add additional notes to "set 0 to ksm_force" in Documentation/../ksm.rst > and Documentation/../proc.rst > v5: > - fix typos in Documentation/filesystem/proc.rst > v4: > - fix typos in commit log > - add interface descriptions under Documentation/ > v3: > - fix compile error of mm/ksm.c > v2: > - fix a spelling error in commit log. > - remove a redundant condition check in ksm_force_write(). > --- > Documentation/admin-guide/mm/ksm.rst | 19 +++++- > Documentation/filesystems/proc.rst | 17 +++++ > fs/proc/base.c | 93 ++++++++++++++++++++++++++++ > include/linux/mm_types.h | 9 +++ > mm/ksm.c | 32 +++++++++- > 5 files changed, 167 insertions(+), 3 deletions(-) > > diff --git a/Documentation/admin-guide/mm/ksm.rst b/Documentation/admin-guide/mm/ksm.rst > index b244f0202a03..8cabc2504005 100644 > --- a/Documentation/admin-guide/mm/ksm.rst > +++ b/Documentation/admin-guide/mm/ksm.rst > @@ -32,7 +32,7 @@ are swapped back in: ksmd must rediscover their identity and merge again). > Controlling KSM with madvise > ============================ > > -KSM only operates on those areas of address space which an application > +KSM can operates on those areas of address space which an application > has advised to be likely candidates for merging, by using the madvise(2) > system call:: > > @@ -70,6 +70,23 @@ Applications should be considerate in their use of MADV_MERGEABLE, > restricting its use to areas likely to benefit. KSM's scans may use a lot > of processing power: some installations will disable KSM for that reason. > > +Controlling KSM with procfs > +=========================== > + > +KSM can also operate on anonymous areas of address space of those processes's > +knob ``/proc/<pid>/ksm_force`` is on, even if app codes doesn't call madvise() > +explicitly to advise specific areas as MADV_MERGEABLE. > + > +You can set ksm_force to 1 to force all anonymous and qualified VMAs of > +this process to be involved in KSM scanning. > + e.g. ``echo 1 > /proc/<pid>/ksm_force`` > + > +You can also set ksm_force to 0 to cancel that force feature of this process > +and unmerge those merged pages which belongs to those VMAs not marked as > +MADV_MERGEABLE of this process. But that still leave those pages belonging to > +VMAs marked as MADV_MERGEABLE merged (fallback to the default state). > + e.g. ``echo 0 > /proc/<pid>/ksm_force`` > + > .. _ksm_sysfs: > > KSM daemon sysfs interface > diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst > index 061744c436d9..8890b8b457a4 100644 > --- a/Documentation/filesystems/proc.rst > +++ b/Documentation/filesystems/proc.rst > @@ -47,6 +47,7 @@ fixes/update part 1.1 Stefani Seibold <stefani@xxxxxxxxxxx> June 9 2009 > 3.10 /proc/<pid>/timerslack_ns - Task timerslack value > 3.11 /proc/<pid>/patch_state - Livepatch patch operation state > 3.12 /proc/<pid>/arch_status - Task architecture specific information > + 3.13 /proc/<pid>/ksm_force - Setting of mandatory involvement in KSM > > 4 Configuring procfs > 4.1 Mount options > @@ -2176,6 +2177,22 @@ AVX512_elapsed_ms > the task is unlikely an AVX512 user, but depends on the workload and the > scheduling scenario, it also could be a false negative mentioned above. > > +3.13 /proc/<pid>/ksm_force - Setting of mandatory involvement in KSM > +----------------------------------------------------------------------- > +When CONFIG_KSM is enabled, this file can be used to specify if this > +process's anonymous memory can be involved in KSM scanning without app codes > +explicitly calling madvise to mark memory address as MADV_MERGEABLE. > + > +If writing 1 to this file, the kernel will force all anonymous and qualified > +memory to be involved in KSM scanning without explicitly calling madvise to > +mark memory address as MADV_MERGEABLE. But that is effective only when the > +klob of '/sys/kernel/mm/ksm/run' is set as 1. > + > +If writing 0 to this file, the mandatory KSM feature of this process's will > +be cancelled and unmerge those merged pages which belongs to those areas not > +marked as MADV_MERGEABLE of this process, but leave those pages belonging to > +areas marked as MADV_MERGEABLE merged (fallback to the default state). > + > Chapter 4: Configuring procfs > ============================= > > diff --git a/fs/proc/base.c b/fs/proc/base.c > index 8dfa36a99c74..d60f7342f79e 100644 > --- a/fs/proc/base.c > +++ b/fs/proc/base.c > @@ -96,6 +96,7 @@ > #include <linux/time_namespace.h> > #include <linux/resctrl.h> > #include <linux/cn_proc.h> > +#include <linux/ksm.h> > #include <trace/events/oom.h> > #include "internal.h" > #include "fd.h" > @@ -3168,6 +3169,96 @@ static int proc_pid_ksm_merging_pages(struct seq_file *m, struct pid_namespace * > > return 0; > } > + > +static ssize_t ksm_force_read(struct file *file, char __user *buf, size_t count, > + loff_t *ppos) > +{ > + struct task_struct *task; > + struct mm_struct *mm; > + char buffer[PROC_NUMBUF]; > + ssize_t len; > + int ret; > + > + task = get_proc_task(file_inode(file)); > + if (!task) > + return -ESRCH; > + > + mm = get_task_mm(task); > + ret = 0; > + if (mm) { > + len = snprintf(buffer, sizeof(buffer), "%d\n", mm->ksm_force); > + ret = simple_read_from_buffer(buf, count, ppos, buffer, len); > + mmput(mm); > + } > + > + return ret; > +} > + > +static ssize_t ksm_force_write(struct file *file, const char __user *buf, > + size_t count, loff_t *ppos) > +{ > + struct task_struct *task; > + struct mm_struct *mm; > + char buffer[PROC_NUMBUF]; > + int force; > + int err = 0; > + > + memset(buffer, 0, sizeof(buffer)); > + if (count > sizeof(buffer) - 1) > + count = sizeof(buffer) - 1; > + if (copy_from_user(buffer, buf, count)) > + return -EFAULT; > + > + err = kstrtoint(strstrip(buffer), 0, &force); > + if (err) > + return err; > + > + if (force != 0 && force != 1) > + return -EINVAL; > + > + task = get_proc_task(file_inode(file)); > + if (!task) > + return -ESRCH; > + > + mm = get_task_mm(task); > + if (!mm) > + goto out_put_task; > + > + if (mm->ksm_force != force) { > + if (mmap_write_lock_killable(mm)) { > + err = -EINTR; > + goto out_mmput; > + } > + > + if (force == 0) > + mm->ksm_force = force; > + else { > + /* > + * Force anonymous pages of this mm to be involved in KSM merging > + * without explicitly calling madvise. > + */ > + if (!test_bit(MMF_VM_MERGEABLE, &mm->flags)) > + err = __ksm_enter(mm); > + if (!err) > + mm->ksm_force = force; > + } > + > + mmap_write_unlock(mm); > + } > + > +out_mmput: > + mmput(mm); > +out_put_task: > + put_task_struct(task); > + > + return err < 0 ? err : count; > +} > + > +static const struct file_operations proc_pid_ksm_force_operations = { > + .read = ksm_force_read, > + .write = ksm_force_write, > + .llseek = generic_file_llseek, > +}; > #endif /* CONFIG_KSM */ > > #ifdef CONFIG_STACKLEAK_METRICS > @@ -3303,6 +3394,7 @@ static const struct pid_entry tgid_base_stuff[] = { > #endif > #ifdef CONFIG_KSM > ONE("ksm_merging_pages", S_IRUSR, proc_pid_ksm_merging_pages), > + REG("ksm_force", S_IRUSR|S_IWUSR, proc_pid_ksm_force_operations), > #endif > }; > > @@ -3639,6 +3731,7 @@ static const struct pid_entry tid_base_stuff[] = { > #endif > #ifdef CONFIG_KSM > ONE("ksm_merging_pages", S_IRUSR, proc_pid_ksm_merging_pages), > + REG("ksm_force", S_IRUSR|S_IWUSR, proc_pid_ksm_force_operations), > #endif > }; > > diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h > index b34ff2cdbc4f..1b1592c2f5cf 100644 > --- a/include/linux/mm_types.h > +++ b/include/linux/mm_types.h > @@ -661,6 +661,15 @@ struct mm_struct { > * merging. > */ > unsigned long ksm_merging_pages; > + /* > + * If true, force anonymous pages of this mm to be involved in KSM > + * merging without explicitly calling madvise. It is effctive only > + * when the klob of '/sys/kernel/mm/ksm/run' is set as 1. If false, > + * cancel the feature of ksm_force of this process and unmerge > + * those merged pages which is not madvised as MERGEABLE of this > + * process, but leave MERGEABLE areas merged. > + */ > + bool ksm_force; > #endif > } __randomize_layout; > > diff --git a/mm/ksm.c b/mm/ksm.c > index 38360285497a..c9f672dcc72e 100644 > --- a/mm/ksm.c > +++ b/mm/ksm.c > @@ -334,6 +334,34 @@ static void __init ksm_slab_free(void) > mm_slot_cache = NULL; > } > > +/* Check if vma is qualified for ksmd scanning */ > +static bool ksm_vma_check(struct vm_area_struct *vma) > +{ > + unsigned long vm_flags = vma->vm_flags; > + > + if (!(vma->vm_flags & VM_MERGEABLE) && !(vma->vm_mm->ksm_force)) > + return false; > + > + if (vm_flags & (VM_SHARED | VM_MAYSHARE | > + VM_PFNMAP | VM_IO | VM_DONTEXPAND | > + VM_HUGETLB | VM_MIXEDMAP)) > + return false; /* just ignore this vma*/ > + > + if (vma_is_dax(vma)) > + return false; > + > +#ifdef VM_SAO > + if (vm_flags & VM_SAO) > + return false; > +#endif > +#ifdef VM_SPARC_ADI > + if (vm_flags & VM_SPARC_ADI) > + return false; > +#endif > + > + return true; > +} > + > static __always_inline bool is_stable_node_chain(struct stable_node *chain) > { > return chain->rmap_hlist_len == STABLE_NODE_CHAIN; > @@ -523,7 +551,7 @@ static struct vm_area_struct *find_mergeable_vma(struct mm_struct *mm, > if (ksm_test_exit(mm)) > return NULL; > vma = vma_lookup(mm, addr); > - if (!vma || !(vma->vm_flags & VM_MERGEABLE) || !vma->anon_vma) > + if (!vma || !ksm_vma_check(vma) || !vma->anon_vma) > return NULL; > return vma; > } > @@ -2297,7 +2325,7 @@ static struct rmap_item *scan_get_next_rmap_item(struct page **page) > vma = find_vma(mm, ksm_scan.address); > > for (; vma; vma = vma->vm_next) { > - if (!(vma->vm_flags & VM_MERGEABLE)) > + if (!ksm_vma_check(vma)) > continue; > if (ksm_scan.address < vma->vm_start) > ksm_scan.address = vma->vm_start; > -- Oleksandr Natalenko (post-factum)