* Paul Mackerras <paulus@xxxxxxxxx> wrote:

> Ingo Molnar writes:
> 
> > * tip-bot for Paul Mackerras <paulus@xxxxxxxxx> wrote:
> > 
> > > @@ -885,6 +934,16 @@ void perf_counter_task_sched_out(struct task_struct *task, int cpu)
> > > 
> > >  	regs = task_pt_regs(task);
> > >  	perf_swcounter_event(PERF_COUNT_CONTEXT_SWITCHES, 1, 1, regs, 0);
> > > +
> > > +	next_ctx = next->perf_counter_ctxp;
> > > +	if (next_ctx && context_equiv(ctx, next_ctx)) {
> > > +		task->perf_counter_ctxp = next_ctx;
> > > +		next->perf_counter_ctxp = ctx;
> > > +		ctx->task = next;
> > > +		next_ctx->task = task;
> > > +		return;
> > > +	}
> > 
> > there's one complication that this trick is causing - the migration 
> > counter relies on ctx->task to get per-task migration stats:
> > 
> >   static inline u64 get_cpu_migrations(struct perf_counter *counter)
> >   {
> >   	struct task_struct *curr = counter->ctx->task;
> > 
> >   	if (curr)
> >   		return curr->se.nr_migrations;
> >   	return cpu_nr_migrations(smp_processor_id());
> >   }
> > 
> > as ctx->task is now jumping (while we keep the context), the 
> > migration stats are out of whack.
> 
> How did you notice this?  The overall sum over all children should 
> still be correct, though some individual children's counters could go 
> negative, so the result of a read on the counter when some children 
> have exited and others haven't could look a bit strange.  Reading the 
> counter after all children have exited should be fine, though.

I've noticed a few weirdnesses, then added a debug check and noticed 
the negative delta values.

> One of the effects of optimizing the context switch is that in 
> general, reading the value of an inheritable counter when some 
> children have exited but some are still running might produce results 
> that include some of the activity of the still-running children and 
> might not include all of the activity of the children that have 
> exited.
> If that's a concern then we need to implement the "sync child 
> counters" ioctl that has been suggested.
> 
> As for the migration counter, it is the only software counter that is 
> still using the "old" approach, i.e. it doesn't generate interrupts 
> and it uses the counter->prev_state field (which I hope to eliminate 
> one day).  It's also the only software counter which counts events 
> that happen while the task is not scheduled in.  The cleanest thing 
> would be to rewrite the migration counter code to have a callin from 
> the scheduler when migrations happen.

I'll check with the debug check removed again. If the end result is OK 
then I don't think we need to worry much about this, at this stage.

	Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-tip-commits" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html