On 04.07.22 10:13, CGEL wrote: > On Fri, Jul 01, 2022 at 02:09:24PM +0200, David Hildenbrand wrote: >> On 01.07.22 14:02, Michal Hocko wrote: >>> On Fri 01-07-22 12:50:59, David Hildenbrand wrote: >>>> On 01.07.22 12:32, David Hildenbrand wrote: >>>>> On 01.07.22 11:11, Michal Hocko wrote: >>>>>> [Cc Jann] >>>>>> >>>>>> On Fri 01-07-22 08:43:23, cgel.zte@xxxxxxxxx wrote: >>>>>>> From: xu xin <xu.xin16@xxxxxxxxxx> >>>>>>> >>>>>>> The benefits of doing this are obvious because using madvise in user code >>>>>>> is the only current way to enable KSM, which is inconvenient for those >>>>>>> compiled app without marking MERGEABLE wanting to enable KSM. >>>>>> >>>>>> I would rephrase: >>>>>> " >>>>>> KSM functionality is currently available only to processes which are >>>>>> using MADV_MERGEABLE directly. This is limiting because there are >>>>>> usecases which will benefit from enabling KSM on a remote process. One >>>>>> example would be an application which cannot be modified (e.g. because >>>>>> it is only distributed as a binary). MORE EXAMPLES WOULD BE REALLY >>>>>> BENEFICIAL. >>>>>> " >>>>>> >>>>>>> Since we already have the syscall of process_madvise(), then reusing the >>>>>>> interface to allow external KSM hints is more acceptable [1]. >>>>>>> >>>>>>> Although this patch was released by Oleksandr Natalenko, but it was >>>>>>> unfortunately terminated without any conclusions because there was debate >>>>>>> on whether it should use signal_pending() to check the target task besides >>>>>>> the task of current() when calling unmerge_ksm_pages of other task [2]. >>>>>> >>>>>> I am not sure this is particularly interesting. I do not remember >>>>>> details of that discussion but checking signal_pending on a different >>>>>> task is rarely the right thing to do. In this case the check is meant to >>>>>> allow bailing out from the operation so that the caller could be >>>>>> terminated for example. >>>>>> >>>>>>> I think it's unneeded to check the target task. For example, when we set >>>>>>> the klob /sys/kernel/mm/ksm/run from 1 to 2, >>>>>>> unmerge_and_remove_all_rmap_items() doesn't use signal_pending() to check >>>>>>> all other target tasks either. >>>>>>> >>>>>>> I hope this patch can get attention again. >>>>>> >>>>>> One thing that the changelog is missing and it is quite important IMHO >>>>>> is the permission model. As we have discussed in previous incarnations >>>>>> of the remote KSM functionality that KSM has some security implications. >>>>>> It would be really great to refer to that in the changelog for the >>>>>> future reference (http://lkml.kernel.org/r/CAG48ez0riS60zcA9CC9rUDV=kLS0326Rr23OKv1_RHaTkOOj7A@xxxxxxxxxxxxxx) >>>>>> >>>>>> So this implementation requires PTRACE_MODE_READ_FSCREDS and >>>>>> CAP_SYS_NICE so the remote process would need to be allowed to >>>>>> introspect the address space. This is the same constrain applied to the >>>>>> remote momory reclaim. Is this sufficient? >>>>>> >>>>>> I would say yes because to some degree KSM mergning can have very >>>>>> similar effect to memory reclaim from the side channel POV. But it >>>>>> should be really documented in the changelog so that it is clear that >>>>>> this has been a deliberate decision and thought through. >>>>>> >>>>>> Other than that this looks like the most reasonable approach to me. >>>>>> >>>>>>> [1] https://lore.kernel.org/lkml/YoOrdh85+AqJH8w1@xxxxxxxxxxxxxx/ >>>>>>> [2] https://lore.kernel.org/lkml/2a66abd8-4103-f11b-06d1-07762667eee6@xxxxxxx/ >>>>>>> >>>>> >>>>> I have various concerns, but the biggest concern is that this modifies >>>>> VMA flags and can possibly break applications. >>>>> >>>>> process_madvise must not modify remote process state. >>>>> >>>>> That's why we only allow a very limited selection that are merely hints. >>>>> >>>>> So nack from my side. >>>>> >>>> >>>> [I'm quit ebusy, but I think some more explanation might be of value] >>>> >>>> One COW example where I think force-enabling KSM for processes is >>>> *currently* not a good idea (besides the side channel discussions, which >>>> is also why Windows stopped to enable KSM system wide a while ago): >>>> >>>> App: >>>> >>>> a) memset(page, 0); >>>> b) trigger R/O long-term pin on page (e.g., vfio) >>>> >>>> If between a) and b) KSM replaces the page by the shared zeropage you'll >>>> get an unreliable pin because we don't break yet COW when taking a R/O >>>> pin on the shared zeropage. And in the traditional sense, the app did >>>> everything right to guarantee that the pin will stay reliable. >>> >>> Isn't this a bug in the existing implementation of the CoW? >> >> One the one hand yes (pinning the shared zeropage is questionable), on >> the other hand no (user space did modify that memory ahead of time and >> filled it with something reasonable, that's how why always worked >> correctly in the absence of KSM). >> > > Thanks for your information. > > So does it needs to be fixed? and if yes, are you planning to fix it. Very high on my todo list. So yes, I think it really needs fixing, especially with KSM in mind. -- Thanks, David / dhildenb