On Thu, Jun 2, 2011 at 11:48 PM, Peter Zijlstra <a.p.zijlstra@xxxxxxxxx> wrote: > On Thu, 2011-06-02 at 22:23 +0800, Yong Zhang wrote: >> On Thu, Jun 02, 2011 at 03:04:26PM +0200, Peter Zijlstra wrote: >> > On Thu, 2011-06-02 at 15:52 +0800, Yong Zhang wrote: >> > > In sched_clock_local(), clock is calculated around ->tick_gtod even if >> > > that ->tick_gtod is stale for long time because we stays in idle state. >> > > You know ->tick_gtod is only updated in sched_clock_tick(); >> > >> > (well, no, there's idle callbacks as you said below) >> > >> > > IOW, when a cpu goes out of idle, sched_clock_tick() is called from >> > > tick_nohz_stop_idle() which is later than interrupt. >> > >> > Gah, that would be awefull and mean wakeups from interrupts were already >> > borken. /me goes look at code. >> > >> > irq_enter() -> tick_check_idle() -> tick_check_nohz() -> >> > tick_nohz_stop_idle() -> sched_clock_idle_wakeup_event() >> > >> > should update the thing before we run any isrs, right? >> >> Hmmm, you are right. >> >> But smp_reschedule_interrupt() doesn't call irq_enter()/irq_exit(), >> is that correct? > > Crap.. you're right. > And I bet other archs don't do that either. Most of them ;) I only notice sparc32 do that. Maybe there have more, but I didn't check it very carefully. > With > NO_HZ you really need irq_enter() for pretty much all interrupts so I > was assuming the resched IPI had it, but its been special and never > really needed it. If it would wake an idle cpu the idle loop exit would > deal with it, if it interrupted userspace the thing was running and > NO_HZ wasn't relevant. > > Damn. > > And yes, the only reason I didn't see this on my dev box was because we > do indeed set that sched_clock_stable thing on wsm. And I never noticed > on my desktop because firefox/X/etc. consuming heaps of CPU isn't weird > at all. > > Adding it to all resched int handlers is of course a possibility but > would slow down the thing, although with the new code, most users are > now indeed wakeups (excepting weird and wonderful users like KVM). > > We could of course add it in sched.c since the logic recurses just > fine.. its not pretty though.. :/ Yeah, IMHO it's suitable here and my test looks good. Reviewed-and-Tested-by: Yong Zhang <yong.zhang0@xxxxxxxxx> BTW, sched_ipi() and sched_ttwu_pending() could share a piece of code now. And we place irq_enter()/irq_exit() in sched_ipi() because it's the only function we could call, thus account_system_vtime() could get the almost exact time value. IOW we should pay some attention on the future change of smp_reschedule_interrupt(). Thanks, Yong > > Thoughts? > > --- > Âkernel/sched.c | Â 18 +++++++++++++++++- > Â1 files changed, 17 insertions(+), 1 deletions(-) > > diff --git a/kernel/sched.c b/kernel/sched.c > index 2fe98ed..365ed6b 100644 > --- a/kernel/sched.c > +++ b/kernel/sched.c > @@ -2554,7 +2554,23 @@ static void sched_ttwu_pending(void) > > Âvoid scheduler_ipi(void) > Â{ > - Â Â Â sched_ttwu_pending(); > + Â Â Â struct rq *rq = this_rq(); > + Â Â Â struct task_struct *list = xchg(&rq->wake_list, NULL); > + > + Â Â Â if (!list) > + Â Â Â Â Â Â Â return; > + > + Â Â Â irq_enter(); > + Â Â Â raw_spin_lock(&rq->lock); > + > + Â Â Â while (list) { > + Â Â Â Â Â Â Â struct task_struct *p = list; > + Â Â Â Â Â Â Â list = list->wake_entry; > + Â Â Â Â Â Â Â ttwu_do_activate(rq, p, 0); > + Â Â Â } > + > + Â Â Â raw_spin_unlock(&rq->lock); > + Â Â Â irq_exit(); > Â} > > Âstatic void ttwu_queue_remote(struct task_struct *p, int cpu) > > > -- Only stand for myself -- To unsubscribe from this list: send the line "unsubscribe linux-tip-commits" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html