* Paul Mackerras <paulus@xxxxxxxxx> wrote:

> Ingo Molnar writes:
> 
> > * tip-bot for Paul Mackerras <paulus@xxxxxxxxx> wrote:
> > 
> > > @@ -885,6 +934,16 @@ void perf_counter_task_sched_out(struct task_struct *task, int cpu)
> > > 
> > >  	regs = task_pt_regs(task);
> > >  	perf_swcounter_event(PERF_COUNT_CONTEXT_SWITCHES, 1, 1, regs, 0);
> > > +
> > > +	next_ctx = next->perf_counter_ctxp;
> > > +	if (next_ctx && context_equiv(ctx, next_ctx)) {
> > > +		task->perf_counter_ctxp = next_ctx;
> > > +		next->perf_counter_ctxp = ctx;
> > > +		ctx->task = next;
> > > +		next_ctx->task = task;
> > > +		return;
> > > +	}
> > 
> > there's one complication that this trick is causing - the migration 
> > counter relies on ctx->task to get per-task migration stats:
> > 
> >   static inline u64 get_cpu_migrations(struct perf_counter *counter)
> >   {
> >   	struct task_struct *curr = counter->ctx->task;
> > 
> >   	if (curr)
> >   		return curr->se.nr_migrations;
> >   	return cpu_nr_migrations(smp_processor_id());
> >   }
> > 
> > as ctx->task is now jumping (while we keep the context), the 
> > migration stats are out of whack.
> 
> How did you notice this?  The overall sum over all children should 
> still be correct, though some individual children's counters could go 
> negative, so the result of a read on the counter when some children 
> have exited and others haven't could look a bit strange.  Reading the 
> counter after all children have exited should be fine, though.

I've noticed a few weirdnesses, then added a debug check and noticed 
the negative delta values.

> One of the effects of optimizing the context switch is that in 
> general, reading the value of an inheritable counter when some 
> children have exited but some are still running might produce results 
> that include some of the activity of the still-running children and 
> might not include all of the activity of the children that have 
> exited.
> If that's a concern then we need to implement the "sync child 
> counters" ioctl that has been suggested.
> 
> As for the migration counter, it is the only software counter that is 
> still using the "old" approach, i.e. it doesn't generate interrupts 
> and it uses the counter->prev_state field (which I hope to eliminate 
> one day).  It's also the only software counter which counts events 
> that happen while the task is not scheduled in.  The cleanest thing 
> would be to rewrite the migration counter code to have a callin from 
> the scheduler when migrations happen.

I'll check with the debug check removed again. If the end result is OK 
then I don't think we need to worry much about this, at this stage.

	Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-tip-commits" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html