On 7 Oct 2022 20:04:51 -0500 Youssef Esmat <youssefesmat@xxxxxxxxxxxx>
> Hi Vincent,
>
> On Sun, Sep 25, 2022 at 9:39 AM Vincent Guittot <vincent.guittot@xxxxxxxxxx> wrote:
> >
> > Add a rb tree for latency sensitive entities so we can schedule the
> > most sensitive one first, even when it failed to preempt current at
> > wakeup or when it got quickly preempted by another entity of higher
> > priority.
> >
> > In order to keep fairness, the latency is used once at wakeup to get
> > a minimum slice and not during the following scheduling slice, to
> > prevent a long running entity from getting more running time than
> > allocated to its nice priority.
> >
> > The rb tree enables covering the last corner case, where a latency
> > sensitive entity can't get scheduled quickly after the wakeup.
> >
> >   hackbench -l 10000 -g $group &
> >   cyclictest --policy other -D 5 -q -n
> >
> >              latency 0            latency -20
> >   group   min   avg    max     min   avg   max
> >   0       17    19     29      17    18    30
> >   1       65    306    7149    64    83    208
> >   4       50    395    15731   56    80    271
> >   8       56    781    41548   54    80    301
> >   16      60    1392   87237   59    86    490
> >
> > group = 0 means that hackbench is not running.
> >
> > Both avg and max are significantly improved with nice latency -20. If
> > we add the histogram parameters to get details of latency, we have:
> >
> >   hackbench -l 10000 -g 16 &
> >   cyclictest --policy other -D 5 -q -n -H 20000 --histfile data.txt
> >
> >                     latency 0   latency -20
> >   Min Latencies:    60          61
> >   Avg Latencies:    1077        86
> >   Max Latencies:    87311       444
> >   50% latencies:    92          85
> >   75% latencies:    554         90
> >   85% latencies:    1019        93
> >   90% latencies:    1346        96
> >   95% latencies:    5400        100
> >   99% latencies:    19044       110
> >
> > Signed-off-by: Vincent Guittot <vincent.guittot@xxxxxxxxxx>
> > ---
>
> The ability to boost the latency sensitivity of a task seems very
> interesting. I have been playing around with these changes and have
> some observations.
>
> I tried 2 bursty tasks affinitized to the same CPU. The tasks sleep
> for 1 ms and run for 10 ms in a loop.
> I first tried it without adjusting the latency_nice value and took
> perf sched traces:
>
>   latency_test:7040 | 2447.137 ms | 8   | avg: 6.546 ms | max: 10.674 ms | max start: 353.809487 s | max end: 353.820161 s
>   latency_test:7028 | 2454.777 ms | 7   | avg: 4.494 ms | max: 10.609 ms | max start: 354.804386 s | max end: 354.814995 s
>
> Everything looked as expected: for a 5 s run they had similar runtime
> and latency.
>
> I then adjusted one task to have a latency_nice of -20 (pid 8614
> below) and took another set of traces:
>
>   latency_test:8618 | 1845.534 ms | 131 | avg: 9.764 ms | max: 10.686 ms | max start: 1405.737905 s | max end: 1405.748592 s
>   latency_test:8614 | 3033.635 ms | 16  | avg: 3.559 ms | max: 10.467 ms | max start: 1407.594751 s | max end: 1407.605218 s
>
> The task with -20 latency_nice had significantly more runtime. The
> average latency was improved but the max roughly stayed the same. As
> expected the one with latency_nice value of 0 experienced more
> switches, but so did the one with latency_nice of -20.

Hey Youssef

See if the revert below works again in this case in terms of fixing the
regression.

Hillf

+++ b/kernel/sched/fair.c
@@ -4571,7 +4571,7 @@ check_preempt_tick(struct cfs_rq *cfs_rq
 {
 	unsigned long ideal_runtime, delta_exec;
 	struct sched_entity *se;
-	s64 delta;
+	s64 delta, d2;
 
 	ideal_runtime = sched_slice(cfs_rq, curr);
 	delta_exec = curr->sum_exec_runtime - curr->prev_sum_exec_runtime;
@@ -4595,12 +4595,12 @@ check_preempt_tick(struct cfs_rq *cfs_rq
 
 	se = __pick_first_entity(cfs_rq);
 	delta = curr->vruntime - se->vruntime;
-	delta -= wakeup_latency_gran(curr, se);
+	d2 = delta - wakeup_latency_gran(curr, se);
 
 	if (delta < 0)
 		return;
 
-	if (delta > ideal_runtime)
+	if (delta > ideal_runtime || d2 > ideal_runtime)
 		resched_curr(rq_of(cfs_rq));
 }