On Thu 12-03-20 13:16:02, Minchan Kim wrote:
> On Thu, Mar 12, 2020 at 09:22:48AM +0100, Michal Hocko wrote:
[...]
> > From eca97990372679c097a88164ff4b3d7879b0e127 Mon Sep 17 00:00:00 2001
> > From: Michal Hocko <mhocko@xxxxxxxx>
> > Date: Thu, 12 Mar 2020 09:04:35 +0100
> > Subject: [PATCH] mm: do not allow MADV_PAGEOUT for CoW pages
> >
> > Jann has brought up a very interesting point [1]. While shared pages are
> > excluded from MADV_PAGEOUT normally, CoW pages can be easily reclaimed
> > that way. This can lead to all sorts of hard to debug problems. E.g.
> > performance problems outlined by Daniel [2]. There are runtime
> > environments where there is substantial memory shared among security
> > domains via CoW memory and an easy way to reclaim that memory, which
> > MADV_{COLD,PAGEOUT} offers, can lead to either performance degradation
> > for the parent process which might be more privileged or even open
> > side channel attacks. The feasibility of the latter is not really clear
>
> I am not sure it's a good idea to mention performance stuff because
> it's rather arguable. You and Johannes already pointed it out when I submitted
> an early draft which had shared page filtering out logic due to performance
> reasons. You guys suggested that shared pages have a higher chance to be touched
> so that if they are really hot pages, they would stay in memory. I agree.

Yes, the hot memory is likely to be referenced but the point was an
unexpected latency because of the major fault. I have to say that I have
underestimated the issue because I was not aware of the runtimes mentioned
in the referenced links. Essentially a lot of CoW memory shared over
security domains.

> I think the only reason at this moment is just vulnerability.
>
> > to me TBH but there is no real reason for exposure at this stage. It
> > seems there is no real use case to depend on reclaiming CoW memory via
> > madvise at this stage so it is much easier to simply disallow it and
> > this is what this patch does. Put simply, MADV_{PAGEOUT,COLD} can
> > operate only on the exclusively owned memory which is a straightforward
> > semantic.
> >
> > [1] http://lkml.kernel.org/r/CAG48ez0G3JkMq61gUmyQAaCq=_TwHbi1XKzWRooxZkv08PQKuw@xxxxxxxxxxxxxx
> > [2] http://lkml.kernel.org/r/CAKOZueua_v8jHCpmEtTB6f3i9e2YnmX4mqdYVWhV4E=Z-n+zRQ@xxxxxxxxxxxxxx
> >
> > Signed-off-by: Michal Hocko <mhocko@xxxxxxxx>
> > ---
> >  mm/madvise.c | 12 +++++++++---
> >  1 file changed, 9 insertions(+), 3 deletions(-)
> >
> > diff --git a/mm/madvise.c b/mm/madvise.c
> > index 43b47d3fae02..4bb30ed6c8d2 100644
> > --- a/mm/madvise.c
> > +++ b/mm/madvise.c
> > @@ -335,12 +335,14 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
> >  		}
> >
> >  		page = pmd_page(orig_pmd);
> > +
> > +		/* Do not interfere with other mappings of this page */
>
>
> How about this?
> /*
>  * paging out only single mapped private pages for anonymous mapping,
>  * otherwise, it opens a side channel.
>  */

I am not sure this is much more helpful without a larger context. I
would stick with the wording unless you insist.

> Otherwise, looks good to me.

Thanks for the review.
-- 
Michal Hocko
SUSE Labs