On Tue, 2008-08-26 at 13:35 -0400, Gregory Haskins wrote:
> double_lock_balance() currently favors logically lower cpus since they
> often do not have to release their own lock to acquire a second lock.
> The result is that logically higher cpus can get starved when there is
> a lot of pressure on the RQs.  This can result in higher latencies on
> higher cpu-ids.
>
> This patch makes the algorithm more fair by forcing all paths to have
> to release both locks before acquiring them again.  Since callsites to
> double_lock_balance already consider it a potential preemption/reschedule
> point, they have the proper logic to recheck for atomicity violations.
>
> Signed-off-by: Gregory Haskins <ghaskins@xxxxxxxxxx>
> ---
>
>  kernel/sched.c |   52 +++++++++++++++++++++++++++++++++++++++++++++-------
>  1 files changed, 45 insertions(+), 7 deletions(-)
>
> diff --git a/kernel/sched.c b/kernel/sched.c
> index df6b447..850b454 100644
> --- a/kernel/sched.c
> +++ b/kernel/sched.c
> @@ -2782,21 +2782,43 @@ static void double_rq_unlock(struct rq *rq1, struct rq *rq2)
>  	__release(rq2->lock);
>  }
>
> +#ifdef CONFIG_PREEMPT
> +
>  /*
> - * double_lock_balance - lock the busiest runqueue, this_rq is locked already.
> + * fair double_lock_balance: Safely acquires both rq->locks in a fair
> + * way at the expense of forcing extra atomic operations in all
> + * invocations.  This assures that the double_lock is acquired using the
> + * same underlying policy as the spinlock_t on this architecture, which
> + * reduces latency compared to the unfair variant below.  However, it
> + * also adds more overhead and therefore may reduce throughput.
>   */
> -static int double_lock_balance(struct rq *this_rq, struct rq *busiest)
> +static inline int _double_lock_balance(struct rq *this_rq, struct rq *busiest)
> +	__releases(this_rq->lock)
> +	__acquires(busiest->lock)
> +	__acquires(this_rq->lock)
> +{
> +	spin_unlock(&this_rq->lock);
> +	double_rq_lock(this_rq, busiest);
> +
> +	return 1;
> +}

Right - so to belabour Nick's point:

  if (!spin_trylock(&busiest->lock)) {
	  spin_unlock(&this_rq->lock);
	  double_rq_lock(this_rq, busiest);
  }

might unfairly treat someone who is waiting on this_rq, if I understand
it right?

I suppose one could then write it like:

  if (spin_is_contended(&this_rq->lock) || !spin_trylock(&busiest->lock)) {
	  spin_unlock(&this_rq->lock);
	  double_rq_lock(this_rq, busiest);
  }

But I'm not sure that's worth the effort at that point...

Anyway - I think all this is utterly defeated on CONFIG_PREEMPT by the
spin-with-IRQs-enabled logic in kernel/spinlock.c.

Making this an -rt only patch...
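
For context, the "unfair variant" that the quoted comment contrasts against is
the trylock-based double_lock_balance() that the patch keeps for the
!CONFIG_PREEMPT case. A rough sketch of that path (reconstructed for
illustration, not quoted verbatim from the patch; see kernel/sched.c of that
era for the authoritative version) shows where the bias toward lower cpus
comes from: the trylock fast path plus the rq-address ordering on the slow
path let the side whose rq sits lower in memory keep this_rq->lock, while the
other side must drop and retake both locks:

  static inline int _double_lock_balance(struct rq *this_rq, struct rq *busiest)
  	__releases(this_rq->lock)
  	__acquires(busiest->lock)
  	__acquires(this_rq->lock)
  {
  	int ret = 0;

  	/* Fast path: grab busiest->lock without ever dropping our own. */
  	if (unlikely(!spin_trylock(&busiest->lock))) {
  		if (busiest < this_rq) {
  			/* Take the locks in address order to avoid ABBA deadlock. */
  			spin_unlock(&this_rq->lock);
  			spin_lock(&busiest->lock);
  			spin_lock(&this_rq->lock);
  			ret = 1;	/* this_rq->lock was dropped and retaken */
  		} else
  			spin_lock(&busiest->lock);
  	}
  	return ret;
  }

The lower-addressed (logically lower) cpu almost never releases this_rq->lock
here, which is exactly the starvation the patch description is about.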