On Mon, Sep 08, 2014 at 01:25:36AM -0700, Hugh Dickins wrote: > Well, yes, but... how do we know when there is no more work to do? Yeah, I figured that out _after_ I send that email.. > Thomas has given reason why KSM might simply fail to do its job if we > rely on the deferrable timer. So I've tried another approach, patch > below; but I do not expect you to jump for joy at the sight of it! Indeed :/ > I've tried to minimize the offensive KSM hook in context_switch(). > Why place it there, rather than do something near profile_tick() or > account_process_tick()? Because KSM is aware of mms not tasks, and > context_switch() should have the next mm cachelines hot (if not, a > slight regrouping in mm_struct should do it); whereas I can find > no reference whatever to mm_struct in kernel/time, so hooking to > KSM from there would drag in another few cachelines every tick. > > (Another approach would be to set up KSM hint faulting, along the > lines of NUMA hint faulting. Not a path I'm keen to go down.) > > I'm not thrilled with this patch, I think it's somewhat defective > in several ways. But maybe in practice it will prove good enough, > and if so then I'd rather not waste effort on complicating it. > > My own testing is not realistic, nor representative of real KSM users; > and I have no idea what values of pages_to_scan and sleep_millisecs > people really use (and those may make quite a difference to how > well it works). > > Chintan, even if the scheduler guys turn out to hate it, please would > you give the patch below a try, to see how well it works in your > environment, whether it seems to go better or worse than your own patch. > > If it works well enough for you, maybe we can come up with ideas to > make it more palatable. I do think your issue is an important one > to fix, one way or another. > > Thanks, > Hugh > > [PATCH] ksm: avoid periodic wakeup while mergeable mms are quiet > > Description yet to be written! > > Reported-by: Chintan Pandya <cpandya@xxxxxxxxxxxxxx> > Not-Signed-off-by: Hugh Dickins <hughd@xxxxxxxxxx> > --- > > include/linux/ksm.h | 14 +++++++++++ > include/linux/sched.h | 1 > kernel/sched/core.c | 9 ++++++- > mm/ksm.c | 50 ++++++++++++++++++++++++++++------------ > 4 files changed, 58 insertions(+), 16 deletions(-) > > --- 3.17-rc4/include/linux/ksm.h 2014-03-30 20:40:15.000000000 -0700 > +++ linux/include/linux/ksm.h 2014-09-07 11:54:41.528003316 -0700 > @@ -87,6 +96,11 @@ static inline void ksm_exit(struct mm_st > { > } > > +static inline wait_queue_head_t *ksm_switch(struct mm_struct *mm) s/ksm_switch/__&/ > +{ > + return NULL; > +} > + > static inline int PageKsm(struct page *page) > { > return 0; > --- 3.17-rc4/kernel/sched/core.c 2014-08-16 16:00:54.062189063 -0700 > +++ linux/kernel/sched/core.c 2014-09-07 11:54:41.528003316 -0700 > @@ -2304,6 +2305,7 @@ context_switch(struct rq *rq, struct tas > struct task_struct *next) > { > struct mm_struct *mm, *oldmm; > + wait_queue_head_t *wake_ksm = NULL; > > prepare_task_switch(rq, prev, next); > > @@ -2320,8 +2322,10 @@ context_switch(struct rq *rq, struct tas > next->active_mm = oldmm; > atomic_inc(&oldmm->mm_count); > enter_lazy_tlb(oldmm, next); > - } else > + } else { > switch_mm(oldmm, mm, next); > + wake_ksm = ksm_switch(mm); Is this the right mm? We've just switched the stack, so we're looking at next->mm when we switched away from current. That might not exist anymore. > + } > > if (!prev->mm) { > prev->active_mm = NULL; > @@ -2348,6 +2352,9 @@ context_switch(struct rq *rq, struct tas > * frame will be invalid. > */ > finish_task_switch(this_rq(), prev); > + > + if (wake_ksm) > + wake_up_interruptible(wake_ksm); > } Quite horrible for sure. I really hate seeing KSM cruft all the way down here. Can't we create a new (timer) infrastructure that does the right thing? Surely this isn't the only such case. I know both RCU and some NOHZ_FULL muck already track when the system is completely idle. This is yet another case of that.
Attachment:
pgpu9TpHqFJSL.pgp
Description: PGP signature