* Saravana Kannan (skannan@xxxxxxxxxxxxxx) wrote:
[...]
> Seems a bit more complicated than what I had in mind. This is touching
> the scheduler, and I think we can get away without having to. Also,
> there is no simple implementation for the "slowpath" that can
> guarantee the delay without starting over the loop and hoping not to
> get interrupted, or just giving up and doing a massively inaccurate
> delay (like msleep, etc.).

Not necessarily. Another way to do it: we could keep the udelay loop
counter in the task struct. When ondemand changes frequency, and upon
migration, this counter would be adapted to the current cpu frequency.

> I was thinking of something along the lines of this:
>
> udelay()
> {
>         if (!is_atomic())

See hardirq.h:

/*
 * Are we running in atomic context? WARNING: this macro cannot
 * always detect atomic context; in particular, it cannot know about
 * held spinlocks in non-preemptible kernels. Thus it should not be
 * used in the general case to determine whether sleeping is possible.
 * Do not use in_atomic() in driver code.
 */
#define in_atomic() ((preempt_count() & ~PREEMPT_ACTIVE) != PREEMPT_INATOMIC_BASE)

Sorry, your scheme is broken on !PREEMPT kernels.

>                 down_read(&freq_sem);
>         /* else
>          *      do nothing since cpufreq can't interrupt you.
>          */

This comment seems broken: in_atomic() can return true merely because
preemption is disabled, which still lets cpufreq frequency changes come
in.

>
>         call usual code since cpufreq is not going to preempt you.
>
>         if (!is_atomic())
>                 up_read(&freq_sem);
> }
>
> __cpufreq_driver_target(...)
> {
>         down_write(&freq_sem);
>         cpufreq_driver->target(...);
>         up_write(&freq_sem);
> }
>
> In the implementation of the cpufreq driver, they just need to make
> sure they always increase the LPJ _before_ increasing the freq and
> decrease the LPJ _after_ decreasing the freq. This is to make sure
> that when an interrupt handler preempts the cpufreq driver code (since
> atomic contexts aren't looking at the r/w semaphore) the LPJ value
> will be good enough to satisfy the _at least_ guarantee of udelay().
>
> For the CPU switching issue, I think the solution I proposed is quite
> simple and should work.

You mean this?

>>>> udelay(us)
>>>> {
>>>>         set cpu affinity to current CPU;
>>>>         do the usual udelay code;
>>>>         restore cpu affinity status;
>>>> }

Things like lock scalability and performance degradation come to mind.
We can expect some drivers to make very heavy use of udelay(). This
should not bring a 4096-core box to its knees. sched_setaffinity() is
very far from being lightweight: it locks cpu hotplug (a global mutex
protecting a refcount), allocates memory, manipulates cpumasks, etc.

> Does my better-explained solution look palatable?

Nope, not on a multiprocessor system.

Thanks,

Mathieu

--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com
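
For concreteness, a rough sketch of the per-task loop counter idea
above. The function name and calling convention are invented for
illustration; this is not actual kernel code, only the scaling rule
the proposal relies on:

/*
 * Sketch only: each task would carry its own loops-per-jiffy value,
 * rescaled to the current CPU's frequency from the cpufreq transition
 * notifier (CPUFREQ_PRECHANGE/CPUFREQ_POSTCHANGE) and from the task
 * migration path.  Rounding up preserves udelay()'s "at least this
 * long" guarantee.  Overflow handling is omitted for brevity.
 */
static inline unsigned long rescale_task_lpj(unsigned long lpj,
                                             unsigned long old_khz,
                                             unsigned long new_khz)
{
        return DIV_ROUND_UP(lpj * new_khz, old_khz);
}

A real patch would presumably reuse cpufreq_scale() rather than
open-code the division.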
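
To spell out why the is_atomic() check above cannot work on !PREEMPT
kernels, consider this caller (the lock is invented for illustration):

static DEFINE_SPINLOCK(my_lock);        /* invented, for illustration */

static void broken_on_no_preempt(void)
{
        spin_lock(&my_lock);
        /*
         * With CONFIG_PREEMPT unset, spin_lock() did not raise
         * preempt_count(), so in_atomic() returns false here, and the
         * proposed udelay() would call down_read(&freq_sem), i.e.
         * possibly sleep while holding a spinlock.
         */
        udelay(10);
        spin_unlock(&my_lock);
}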
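
The LPJ ordering rule quoted above can also be made concrete. A minimal
sketch, with invented helper names and with relation handling,
notifiers, etc. omitted, of a driver ->target() keeping the invariant
that the published LPJ never corresponds to a frequency lower than the
real one:

/*
 * Sketch only: my_write_freq_register() and my_set_lpj_for() are
 * invented.  Invariant: at every instant the published LPJ matches a
 * frequency at least as high as the real one, so a concurrent
 * udelay() in an interrupt handler can only spin too long, never too
 * short.
 */
static int my_cpufreq_target(struct cpufreq_policy *policy,
                             unsigned int new_khz, unsigned int relation)
{
        unsigned int old_khz = policy->cur;

        if (new_khz > old_khz) {
                my_set_lpj_for(new_khz);         /* grow LPJ first... */
                my_write_freq_register(new_khz); /* ...then speed up */
        } else {
                my_write_freq_register(new_khz); /* slow down first... */
                my_set_lpj_for(new_khz);         /* ...then shrink LPJ */
        }
        return 0;
}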