Hi Morten and Vincent,

> On Apr 22, 2019, at 6:22 PM, Song Liu <songliubraving@xxxxxx> wrote:
>
> Hi Vincent,
>
>> On Apr 17, 2019, at 5:56 AM, Vincent Guittot <vincent.guittot@xxxxxxxxxx> wrote:
>>
>> On Wed, 10 Apr 2019 at 21:43, Song Liu <songliubraving@xxxxxx> wrote:
>>>
>>> Hi Morten,
>>>
>>>> On Apr 10, 2019, at 4:59 AM, Morten Rasmussen <morten.rasmussen@xxxxxxx> wrote:
>>>>
>>
>>>>
>>>> The bit that isn't clear to me, is _why_ adding idle cycles helps your
>>>> workload. I'm not convinced that adding headroom gives any latency
>>>> improvements beyond watering down the impact of your side jobs. AFAIK,
>>>
>>> We think the latency improvements actually come from watering down the
>>> impact of side jobs. It is not just statistically improving average
>>> latency numbers, but also reduces resource contention caused by the side
>>> workload. I don't know whether it is from reducing contention of ALUs,
>>> memory bandwidth, CPU caches, or something else, but we saw reduced
>>> latencies when headroom is used.
>>>
>>>> the throttling mechanism effectively removes the throttled tasks from
>>>> the schedule according to a specific duty cycle. When the side job is
>>>> not throttled the main workload is experiencing the same latency issues
>>>> as before, but by dynamically tuning the side job throttling you can
>>>> achieve a better average latency. Am I missing something?
>>>>
>>>> Have you looked at your distribution of main job latency and tried to
>>>> compare with when throttling is active/not active?
>>>
>>> cfs_bandwidth adjusts allowed runtime for each task_group each period
>>> (configurable, 100ms by default). cpu.headroom logic applies gentle
>>> throttling, so that the side workload gets some runtime in every period.
>>> Therefore, if we look at time window equal to or bigger than 100ms, we
>>> don't really see "throttling active time" vs. "throttling inactive time".
>>>
>>>>
>>>> I'm wondering if the headroom solution is really the right solution for
>>>> your use-case or if what you are really after is something which is
>>>> lower priority than just setting the weight to 1. Something that
>>>
>>> The experiments show that, cpu.weight does proper work for priority: the
>>> main workload gets priority to use the CPU; while the side workload only
>>> fills the idle CPU. However, this is not sufficient, as the side workload
>>> creates big enough contention to impact the main workload.
>>>
>>>> (nearly) always gets pre-empted by your main job (SCHED_BATCH and
>>>> SCHED_IDLE might not be enough). If your main job consists
>>>> of lots of relatively short wake-ups things like the min_granularity
>>>> could have significant latency impact.
>>>
>>> cpu.headroom gives benefits in addition to optimizations in pre-empt
>>> side. By maintaining some idle time, fewer pre-empt actions are
>>> necessary, thus the main workload will get better latency.
>>
>> I agree with Morten's proposal, SCHED_IDLE should help your latency
>> problem because side job will be directly preempted unlike normal cfs
>> task even lowest priority.
>> In addition to min_granularity, sched_period also has an impact on the
>> time that a task has to wait before preempting the running task. Also,
>> some sched_feature like GENTLE_FAIR_SLEEPERS can also impact the
>> latency of a task.
>>
>> It would be nice to know if the latency problem comes from contention
>> on cache resources or if it's mainly because your main load waits
>> before running on a CPU
>>
>> Regards,
>> Vincent
>
> Thanks for these suggestions. Here are some more tests to show the impact
> of scheduler knobs and cpu.headroom.
>
> side-load | cpu.headroom | side/cpu.weight | min_gran | cpu-idle | main/latency
> -------------------------------------------------------------------------------
> none      | 0            | n/a             | 1 ms     | 45.20%   | 1.00
> ffmpeg    | 0            | 1               | 10 ms    | 3.38%    | 1.46
> ffmpeg    | 0            | SCHED_IDLE      | 1 ms     | 5.69%    | 1.42
> ffmpeg    | 20%          | SCHED_IDLE      | 1 ms     | 19.00%   | 1.13
> ffmpeg    | 30%          | SCHED_IDLE      | 1 ms     | 27.60%   | 1.08
>
> In all these cases, the main workload is loaded with the same level of
> traffic (requests per second). Main workload latency numbers are normalized
> based on the baseline (first row).
>
> For the baseline, the main workload runs without any side workload, and the
> system has about 45.20% idle CPU.
>
> The next two rows compare the impact of scheduling knobs cpu.weight and
> sched_min_granularity. With cpu.weight of 1 and min_granularity of 10ms,
> we see a latency of 1.46; with SCHED_IDLE and min_granularity of 1ms, we
> see a latency of 1.42. So SCHED_IDLE and min_granularity help protect
> the main workload. However, it is not sufficient, as the latency overhead
> is high (>40%).
>
> The last two rows show the benefit of cpu.headroom. With 20% headroom,
> the latency is 1.13; while with 30% headroom, the latency is 1.08.
>
> We can also see a clear correlation between latency and global idle CPU:
> more idle CPU yields lower latency.
>
> Overall, these results show that cpu.headroom provides an effective
> mechanism to control the latency impact of side workloads. Other knobs
> could also help the latency, but they are not as effective and flexible
> as cpu.headroom.
>
> Does this analysis address your concern?
>
> Thanks,
> Song
>

Could you please share your comments and suggestions on this work? Did the
results address your questions/concerns?

Thanks again,
Song

>>
>>>
>>> Thanks,
>>> Song
>>>
>>>>
>>>> Morten
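
For anyone who wants to cross-check the numbers quoted above, the latency
overhead and the idle-CPU/latency relationship can be recomputed from the
results table. This is only a minimal sketch: the values are copied from the
table, while the names (rows, latency_overhead_pct, pearson) are made up for
this illustration and are not part of the cpu.headroom patches.

```python
import math

# Rows copied from the results table:
# (side-load, cpu.headroom, side setting, min_gran, cpu-idle %, normalized latency)
rows = [
    ("none",   "0",   "n/a",        "1 ms",  45.20, 1.00),
    ("ffmpeg", "0",   "1",          "10 ms",  3.38, 1.46),
    ("ffmpeg", "0",   "SCHED_IDLE", "1 ms",   5.69, 1.42),
    ("ffmpeg", "20%", "SCHED_IDLE", "1 ms",  19.00, 1.13),
    ("ffmpeg", "30%", "SCHED_IDLE", "1 ms",  27.60, 1.08),
]


def latency_overhead_pct(normalized_latency):
    """Overhead relative to the baseline row, whose latency is 1.00."""
    return (normalized_latency - 1.0) * 100.0


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


idle = [row[4] for row in rows]
latency = [row[5] for row in rows]

for side, headroom, setting, gran, idle_pct, lat in rows:
    print(f"{side:6} headroom={headroom:3} {setting:10} "
          f"idle={idle_pct:5.2f}% overhead={latency_overhead_pct(lat):5.1f}%")

# A negative coefficient means more idle CPU goes with lower latency.
print(f"idle vs. latency correlation: {pearson(idle, latency):.2f}")
```

With only five data points the coefficient is indicative at best; it just
restates the trend visible in the table.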