Nick Piggin wrote:
> On Wednesday 27 August 2008 21:41, Gregory Haskins wrote:
>> Nick Piggin wrote:
>>> On Tuesday 26 August 2008 22:23, Gregory Haskins wrote:
>>>> Nick Piggin wrote:
>>>>> On Tuesday 26 August 2008 06:15, Gregory Haskins wrote:
>>>>>> double_lock_balance() currently favors logically lower cpus since
>>>>>> they often do not have to release their own lock to acquire a
>>>>>> second lock.  The result is that logically higher cpus can get
>>>>>> starved when there is a lot of pressure on the RQs.  This can
>>>>>> result in higher latencies on higher cpu-ids.
>>>>>>
>>>>>> This patch makes the algorithm more fair by forcing all paths to
>>>>>> have to release both locks before acquiring them again.  Since
>>>>>> callsites to double_lock_balance already consider it a potential
>>>>>> preemption/reschedule point, they have the proper logic to recheck
>>>>>> for atomicity violations.
>>>>>>
>>>>>> Signed-off-by: Gregory Haskins <ghaskins@xxxxxxxxxx>
>>>>>> ---
>>>>>>
>>>>>>  kernel/sched.c |   17 +++++------------
>>>>>>  1 files changed, 5 insertions(+), 12 deletions(-)
>>>>>>
>>>>>> diff --git a/kernel/sched.c b/kernel/sched.c
>>>>>> index 6e0bde6..b7326cd 100644
>>>>>> --- a/kernel/sched.c
>>>>>> +++ b/kernel/sched.c
>>>>>> @@ -2790,23 +2790,16 @@ static int double_lock_balance(struct rq *this_rq, struct rq *busiest)
>>>>>>  	__acquires(busiest->lock)
>>>>>>  	__acquires(this_rq->lock)
>>>>>>  {
>>>>>> -	int ret = 0;
>>>>>> -
>>>>>>  	if (unlikely(!irqs_disabled())) {
>>>>>>  		/* printk() doesn't work good under rq->lock */
>>>>>>  		spin_unlock(&this_rq->lock);
>>>>>>  		BUG_ON(1);
>>>>>>  	}
>>>>>> -	if (unlikely(!spin_trylock(&busiest->lock))) {
>>>>>> -		if (busiest < this_rq) {
>>>>>> -			spin_unlock(&this_rq->lock);
>>>>>> -			spin_lock(&busiest->lock);
>>>>>> -			spin_lock_nested(&this_rq->lock, SINGLE_DEPTH_NESTING);
>>>>>> -			ret = 1;
>>>>>> -		} else
>>>>>> -			spin_lock_nested(&busiest->lock, SINGLE_DEPTH_NESTING);
>>>>>> -	}
>>>>>> -	return ret;
>>>>>> +
>>>>>> +	spin_unlock(&this_rq->lock);
>>>>>> +	double_rq_lock(this_rq, busiest);
>>>>>>
>>>>> Rather than adding the extra atomic operation, can't you just put this
>>>>> into the unlikely spin_trylock failure path rather than the unfair
>>>>> logic there?
>>>>>
>>>> The trick is that we *must* first release this_rq before proceeding or
>>>> the new proposal doesn't work as intended.  This patch effectively
>>>> breaks up the this_rq->lock critical section evenly across all CPUs as
>>>> if it hit the case common for higher cpus.
>>>>
>>> I don't exactly see why my proposal would introduce any more latency,
>>> because we only trylock while holding the existing lock -- this will
>>> only ever add a small ~constant time to the critical section, regardless
>>> of whether it is a high or low CPU runqueue.
>>>
>> It's because we are trying to create a break in the critical section of
>> this_rq->lock, not improve the acquisition of busiest->lock.  So whether
>> you do spin_lock or spin_trylock on busiest does not matter.  Busiest
>> will not be contended in the case that I am concerned with.  If you use
>> my example below: rq[N] will not be contended because cpuN is blocked on
>> rq[0] after already having released rq[N].  So it's the contention
>> against this_rq that is the problem.
>>
>> Or am I missing your point completely?
>>
>
> Well my point is just that after my change, there may be some windows
> of "unfair" behaviour where one CPU gets to go ahead early, but AFAIKS
> now there is no systemic bias against higher numbered CPUs (except the
> small issue of taking lower numbered locks first, which is also present
> in any solution).
>
> As such, I would actually like to see my proposal implemented in the
> !lowlatency case as well... unless my reasoning has a hole in it?
>
> But if you are _also_ wanting to eliminate the long lock hold time and
> strictly improve fairness for lowlatency kernels, then I agree that your
> patch goes much further than mine, so I don't object to putting that
> under CONFIG_PREEMPT.
>

Ok, I understand what you are saying now, and that makes sense.

-Greg
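
[Editor's note: for readers following the thread, below is a rough sketch of
the alternative Nick describes -- keep the spin_trylock() fast path, but on
failure always drop this_rq->lock and reacquire both runqueue locks in
address order.  This is illustrative only, not the patch under discussion
and not actual kernel code; the signature and lock annotations merely mirror
the kernel/sched.c function quoted above.]

	/*
	 * Sketch of the trylock-failure-path variant (illustrative only):
	 * the uncontended fast path is unchanged, but when the trylock
	 * fails we always release this_rq->lock and take both locks in
	 * address order, so no runqueue is systematically favored.
	 */
	static int double_lock_balance(struct rq *this_rq, struct rq *busiest)
		__releases(this_rq->lock)
		__acquires(busiest->lock)
		__acquires(this_rq->lock)
	{
		int ret = 0;

		if (unlikely(!spin_trylock(&busiest->lock))) {
			/* Both locks are dropped here, so callers must recheck state. */
			spin_unlock(&this_rq->lock);
			if (busiest < this_rq) {
				spin_lock(&busiest->lock);
				spin_lock_nested(&this_rq->lock, SINGLE_DEPTH_NESTING);
			} else {
				spin_lock(&this_rq->lock);
				spin_lock_nested(&busiest->lock, SINGLE_DEPTH_NESTING);
			}
			ret = 1;
		}
		return ret;
	}

Compared with the patch quoted above, which unconditionally drops
this_rq->lock and calls double_rq_lock(), this keeps the common uncontended
case cheap while removing the bias in the contended case; the quoted patch
additionally breaks up long this_rq->lock hold times, which is why the
thread discusses keeping that behaviour for lowlatency (CONFIG_PREEMPT)
kernels.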