Re: [RFC PATCH v2 00/19] mm: process/cgroup ksm support

Stefan Roesch <shr@xxxxxxxxxxxx> · Tue, 21 Feb 2023 09:59:59 -0800

Johannes Weiner <hannes@xxxxxxxxxxx> writes:

> Hi Stefan,
>
> On Fri, Feb 10, 2023 at 01:50:04PM -0800, Stefan Roesch wrote:
>> So far KSM can only be enabled by calling madvise for memory regions. What is
>> required to enable KSM for more workloads is to enable / disable it at the
>> process / cgroup level.
>>
>> Use case:
>> The madvise call is not available in the programming language. An example for
>> this are programs with forked workloads using a garbage collected language without
>> pointers. In such a language madvise cannot be made available.
>>
>> In addition the addresses of objects get moved around as they are garbage
>> collected. KSM sharing needs to be enabled "from the outside" for these type of
>> workloads.
>
> It would be good to expand on the argument that Rik made about the
> interpreter being used for things were there are no merging
> opportunities, and the KSM scanning overhead isn't amortized.
>
> There is a fundamental mismatch in scopes. madvise() is a
> workload-local decision, whereas sizable sharing opportunities may or
> may not exist across multiple workloads. Only a higher-level entity
> like a job scheduler can know for certain whether it's running one or
> more instances of a job. That job scheduler in turn doesn't have the
> necessary knowledge of the workload's internals to make targeted and
> well-timed advise calls with, say, process_madvise().
>
> This also applies to the security concerns brought up in previous
> threads. An individual workload doesn't know what else is running on
> the machine, so it needs to be highly conservative about what it can
> give up for system-wide merging. However, if the system is dedicated
> to running multiple jobs within the same security domain, it's the job
> scheduler that knows that sharing isn't a problem, and even desirable.
>
> So I think this series makes sense, but it would be good to expand a
> bit on the reasoning and address the security aspect in the cover/doc.
>

These are good points Johannes, I'll elaborate on them with the next
version of the patch.

>> Stefan Roesch (19):
>>   mm: add new flag to enable ksm per process
>>   mm: add flag to __ksm_enter
>>   mm: add flag to __ksm_exit call
>>   mm: invoke madvise for all vmas in scan_get_next_rmap_item
>>   mm: support disabling of ksm for a process
>>   mm: add new prctl option to get and set ksm for a process
>
> The implementation looks sound to me as well.
>
> I think it would be a bit easier to review if you folded these ^^^
> patches, the tools patch below, and the prctl selftests, all into one
> single commit. It's one logical change. This way the new flags and
> helper functions can be reviewed against the new users and callsites
> without having to jump back and forth between emails.
>

I'll fold them in the next version.

>>   mm: split off pages_volatile function
>>   mm: expose general_profit metric
>>   docs: document general_profit sysfs knob
>>   mm: calculate ksm process profit metric
>>   mm: add ksm_merge_type() function
>>   mm: expose ksm process profit metric in ksm_stat
>>   mm: expose ksm merge type in ksm_stat
>>   docs: document new procfs ksm knobs
>
> Same with the new knobs/stats and their documentation.
>

I'll fold them in the next version.

> Logical splitting is easier to follow than geographical splitting.
>
> Thanks!