Mon, 13 May 2019 at 14:33, Oleksandr Natalenko <oleksandr@xxxxxxxxxx>:
>
> Hi.
>
> On Mon, May 13, 2019 at 01:38:43PM +0300, Kirill Tkhai wrote:
> > On 10.05.2019 10:21, Oleksandr Natalenko wrote:
> > > By default, KSM works only on memory that is marked by madvise(). And the
> > > only way to get around that is to either:
> > >
> > >   * use LD_PRELOAD; or
> > >   * patch the kernel with something like UKSM or PKSM.
> > >
> > > Instead, let's implement a so-called "always" mode, which allows marking
> > > VMAs as mergeable on do_anonymous_page() call automatically.
> > >
> > > The submission introduces a new sysctl knob as well as kernel cmdline option
> > > to control which mode to use. The default mode is to maintain old
> > > (madvise-based) behaviour.
> > >
> > > Due to security concerns, this submission also introduces VM_UNMERGEABLE
> > > vmaflag for apps to explicitly opt out of automerging. Because of adding
> > > a new vmaflag, the whole work is available for 64-bit architectures only.
> > >
> > > This patchset is based on earlier Timofey's submission [1], but it doesn't
> > > use dedicated kthread to walk through the list of tasks/VMAs.
> > >
> > > For my laptop it saves up to 300 MiB of RAM for usual workflow (browser,
> > > terminal, player, chats etc). Timofey's submission also mentions
> > > containerised workload that benefits from automerging too.
> >
> > This all approach looks complicated for me, and I'm not sure the shown profit
> > for desktop is big enough to introduce contradictory vma flags, boot option
> > and advance page fault handler. Also, 32/64bit defines do not look good for
> > me. I had tried something like this on my laptop some time ago, and
> > the result was bad even in absolute (not in memory percentage) meaning.
> > Isn't LD_PRELOAD trick enough to desktop? Your workload is same all the time,
> > so you may statically insert correct preload to /etc/profile and replace
> > your mmap forever.
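
For reference, the madvise()-based opt-in that both the cover letter and the LD_PRELOAD trick rely on can be sketched as below. This is only an illustration in Python (3.8+, Linux), not code from the patchset: the helper name mark_mergeable is made up, and the call can be rejected on kernels built without CONFIG_KSM, which the sketch reports as False.

```python
import mmap


def mark_mergeable(size: int) -> bool:
    """Map private anonymous memory and ask KSM to consider it for merging.

    Returns True if madvise(MADV_MERGEABLE) was accepted, False otherwise.
    Hypothetical helper, for illustration only.
    """
    # MADV_MERGEABLE is only exposed on Linux builds of Python 3.8+.
    advice = getattr(mmap, "MADV_MERGEABLE", None)
    if advice is None:
        return False

    # KSM only merges private anonymous pages, so request exactly that
    # instead of Python's default MAP_SHARED mapping.
    region = mmap.mmap(-1, size, flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
    try:
        region.madvise(advice)
        return True
    except OSError:
        # For example EINVAL when the kernel lacks CONFIG_KSM.
        return False
    finally:
        region.close()
```

An LD_PRELOAD shim does essentially the same thing from C by wrapping mmap() and issuing the madvise() call on each new anonymous mapping.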
> >
> > Speaking about containers, something like this may have a sense, I think.
> > The probability of that several containers have the same pages are higher,
> > than that desktop applications have the same pages; also LD_PRELOAD for
> > containers is not applicable.
>
> Yes, I get your point. But the intention is to avoid another hacky trick
> (LD_PRELOAD), thus *something* should *preferably* be done on the
> kernel level instead.
>
> > But 1) this could be made for trusted containers only (are there similar
> > issues with KSM like with hardware side-channel attacks?!);
>
> Regarding side-channel attacks, yes, I think so. Were those openssl guys
> who complained about it?..
>
> > 2) the most
> > shared data for containers in my experience is file cache, which is not
> > supported by KSM.
> >
> > There are good results by the link [1], but it's difficult to analyze
> > them without knowledge about what happens inside them there.
> >
> > Some of tests have "VM" prefix. What the reason the hypervisor don't mark
> > their VMAs as mergeable? Can't this be fixed in hypervisor? What is the
> > generic reason that VMAs are not marked in all the tests?
>
> Timofey, could you please address this?

That's just a description of the machine, only there to show the difference
in deduplication for an application in a small VM versus a real big server,
i.e. KSM is enabled inside the VM for containers, not for the hypervisor.

> Also, just for the sake of another piece of stats here:
>
> $ echo "$(cat /sys/kernel/mm/ksm/pages_sharing) * 4 / 1024" | bc
> 526

IIRC, to calculate the saving you must use (pages_shared - pages_sharing).

> > In case of there is a fundamental problem of calling madvise, can't we
> > just implement an easier workaround like a new write-only file:
> >
> > #echo $task > /sys/kernel/mm/ksm/force_madvise
> >
> > which will mark all anon VMAs as mergeable for a passed task's mm?
> >
> > A small userspace daemon may write mergeable tasks there from time to time.
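
The "small userspace daemon" idea quoted above could look roughly like the following Python sketch. Note the assumptions: /sys/kernel/mm/ksm/force_madvise is only the interface proposed in this thread and is not an existing kernel file, and the helper names and the selection of tasks by command name are invented for illustration.

```python
import os
from typing import Iterable, List

# Hypothetical knob proposed in this thread; not an existing kernel interface.
FORCE_MADVISE = "/sys/kernel/mm/ksm/force_madvise"


def pids_by_name(names: Iterable[str], proc_root: str = "/proc") -> List[int]:
    """Return PIDs whose /proc/<pid>/comm matches one of the given names."""
    wanted = set(names)
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "comm")) as f:
                if f.read().strip() in wanted:
                    found.append(int(entry))
        except OSError:
            # The task exited between listdir() and open(); skip it.
            continue
    return sorted(found)


def force_merge(pids: Iterable[int], knob: str = FORCE_MADVISE) -> int:
    """Write each PID to the knob, one write per task; return the success count."""
    written = 0
    for pid in pids:
        try:
            with open(knob, "w") as f:
                f.write(f"{pid}\n")
            written += 1
        except OSError:
            # Task gone, or the knob is missing or rejected the write.
            continue
    return written
```

A daemon would call something like force_merge(pids_by_name(["firefox"])) on a timer. Which tasks to select, and how often, is a policy choice the proposal leaves open.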
> >
> > Then we won't need to introduce additional vm flags and to change
> > anon pagefault handler, and the changes will be small and only
> > related to mm/ksm.c, and good enough for both 32 and 64 bit machines.
>
> Yup, looks appealing. Two concerns, though:
>
> 1) we are falling back to scanning through the list of tasks (I guess
> this is what we wanted to avoid, although this time it happens in the
> userspace);
>
> 2) what kinds of opt-out we should maintain? Like, what if force_madvise
> is called, but the task doesn't want some VMAs to be merged? This will
> require new flag anyway, it seems. And should there be another
> write-only file to unmerge everything forcibly for specific task?
>
> Thanks.
>
> P.S. Cc'ing Pavel properly this time.
>
> --
> Best regards,
>   Oleksandr Natalenko (post-factum)
>   Senior Software Maintenance Engineer

--
Have a nice day,
Timofey.