Re: [PATCH v4 2/2] ksm: provide support to use deferrable timers for scanner thread

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Sep 08, 2014 at 01:25:36AM -0700, Hugh Dickins wrote:
> Well, yes, but... how do we know when there is no more work to do?

Yeah, I figured that out _after_ I send that email..

> Thomas has given reason why KSM might simply fail to do its job if we
> rely on the deferrable timer.  So I've tried another approach, patch
> below; but I do not expect you to jump for joy at the sight of it!

Indeed :/

> I've tried to minimize the offensive KSM hook in context_switch().
> Why place it there, rather than do something near profile_tick() or
> account_process_tick()?  Because KSM is aware of mms not tasks, and
> context_switch() should have the next mm cachelines hot (if not, a
> slight regrouping in mm_struct should do it); whereas I can find
> no reference whatever to mm_struct in kernel/time, so hooking to
> KSM from there would drag in another few cachelines every tick.
> 
> (Another approach would be to set up KSM hint faulting, along the
> lines of NUMA hint faulting.  Not a path I'm keen to go down.)
> 
> I'm not thrilled with this patch, I think it's somewhat defective
> in several ways.  But maybe in practice it will prove good enough,
> and if so then I'd rather not waste effort on complicating it.
> 
> My own testing is not realistic, nor representative of real KSM users;
> and I have no idea what values of pages_to_scan and sleep_millisecs
> people really use (and those may make quite a difference to how
> well it works).
> 
> Chintan, even if the scheduler guys turn out to hate it, please would
> you give the patch below a try, to see how well it works in your
> environment, whether it seems to go better or worse than your own patch.
> 
> If it works well enough for you, maybe we can come up with ideas to
> make it more palatable.  I do think your issue is an important one
> to fix, one way or another.
> 
> Thanks,
> Hugh
> 
> [PATCH] ksm: avoid periodic wakeup while mergeable mms are quiet
> 
> Description yet to be written!
> 
> Reported-by: Chintan Pandya <cpandya@xxxxxxxxxxxxxx>
> Not-Signed-off-by: Hugh Dickins <hughd@xxxxxxxxxx>
> ---
> 
>  include/linux/ksm.h   |   14 +++++++++++
>  include/linux/sched.h |    1 
>  kernel/sched/core.c   |    9 ++++++-
>  mm/ksm.c              |   50 ++++++++++++++++++++++++++++------------
>  4 files changed, 58 insertions(+), 16 deletions(-)
> 
> --- 3.17-rc4/include/linux/ksm.h	2014-03-30 20:40:15.000000000 -0700
> +++ linux/include/linux/ksm.h	2014-09-07 11:54:41.528003316 -0700

> @@ -87,6 +96,11 @@ static inline void ksm_exit(struct mm_st
>  {
>  }
>  
> +static inline wait_queue_head_t *ksm_switch(struct mm_struct *mm)

s/ksm_switch/__&/

> +{
> +	return NULL;
> +}
> +
>  static inline int PageKsm(struct page *page)
>  {
>  	return 0;

> --- 3.17-rc4/kernel/sched/core.c	2014-08-16 16:00:54.062189063 -0700
> +++ linux/kernel/sched/core.c	2014-09-07 11:54:41.528003316 -0700

> @@ -2304,6 +2305,7 @@ context_switch(struct rq *rq, struct tas
>  	       struct task_struct *next)
>  {
>  	struct mm_struct *mm, *oldmm;
> +	wait_queue_head_t *wake_ksm = NULL;
>  
>  	prepare_task_switch(rq, prev, next);
>  
> @@ -2320,8 +2322,10 @@ context_switch(struct rq *rq, struct tas
>  		next->active_mm = oldmm;
>  		atomic_inc(&oldmm->mm_count);
>  		enter_lazy_tlb(oldmm, next);
> -	} else
> +	} else {
>  		switch_mm(oldmm, mm, next);
> +		wake_ksm = ksm_switch(mm);

Is this the right mm? We've just switched the stack, so we're looking at
next->mm when we switched away from current. That might not exist
anymore.

> +	}
>  
>  	if (!prev->mm) {
>  		prev->active_mm = NULL;
> @@ -2348,6 +2352,9 @@ context_switch(struct rq *rq, struct tas
>  	 * frame will be invalid.
>  	 */
>  	finish_task_switch(this_rq(), prev);
> +
> +	if (wake_ksm)
> +		wake_up_interruptible(wake_ksm);
>  }

Quite horrible for sure. I really hate seeing KSM cruft all the way down
here. Can't we create a new (timer) infrastructure that does the right
thing? Surely this isn't the only such case.

I know both RCU and some NOHZ_FULL muck already track when the system is
completely idle. This is yet another case of that.

Attachment: pgpHxwMDyzyxC.pgp
Description: PGP signature


[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux]     [Linux OMAP]     [Linux MIPS]     [ECOS]     [Asterisk Internet PBX]     [Linux API]