Re: [PATCH 0/7] introduce cpu.headroom knob to cpu controller

Song Liu <songliubraving@xxxxxx> · Mon, 15 Apr 2019 16:48:49 +0000

Hi Peter,

> On Apr 8, 2019, at 2:45 PM, Song Liu <songliubraving@xxxxxx> wrote:
> 
> Servers running latency-sensitive workloads usually aren't fully loaded, for
> various reasons including disaster readiness. The machines running our
> interactive workloads (referred to as the main workload) have a lot of spare
> CPU cycles that we would like to use for optimistic side jobs such as video
> encoding. However, our experiments show that the side workload has a strong
> impact on the latency of the main workload:
> 
>  side-job   main-load-level   main-avg-latency
>     none          1.0              1.00
>     none          1.1              1.10
>     none          1.2              1.10 
>     none          1.3              1.10
>     none          1.4              1.15
>     none          1.5              1.24
>     none          1.6              1.74
> 
>     ffmpeg        1.0              1.82
>     ffmpeg        1.1              2.74
> 
> Note: both the main-load-level and the main-avg-latency numbers are
> _normalized_.
> 
> In these experiments, ffmpeg is put in a cgroup with cpu.weight of 1
> (lowest priority). However, it consumes all idle CPU cycles in the
> system and causes high latency for the main workload. Further experiments
> and analysis (more details below) show that, for the main workload to meet
> its latency targets, it is necessary to limit the CPU usage of the side
> workload so that some CPU time stays _idle_. There are several reasons
> behind this need for idle CPU time. First, saturation of shared CPU
> resources starts well before time-measured utilization reaches 100%.
> Second, scheduling latency starts to impact the main workload as the CPU
> approaches full utilization.
> 
> Currently, the cpu controller provides two mechanisms to protect the main
> workload: cpu.weight and cpu.max. However, neither of them is sufficient
> for these use cases. As shown in the experiments above, a side workload with
> cpu.weight of 1 (lowest priority) still consumes all idle CPU and adds
> unacceptable latency to the main workload. cpu.max can throttle the CPU
> usage of the side workload and preserve some idle CPU. However, cpu.max
> cannot react to changes in load level. For example, when the main
> workload uses 40% of CPU, a cpu.max of 30% for the side workload yields
> good latencies for the main workload. However, when the main workload
> experiences higher load and uses more CPU, the same setting (cpu.max
> of 30%) causes the interactive workload to miss its latency target.
> 
> These experiments demonstrate the need for a mechanism that effectively
> throttles the CPU usage of the side workload and preserves idle CPU cycles.
> The mechanism should adjust the level of throttling based on the load
> level of the main workload.
> 
> This patchset introduces a new knob for the cpu controller: cpu.headroom.
> The cgroup of the main workload uses cpu.headroom to limit the CPU cycles
> available to the side workload. For example, if the main workload has a
> cpu.headroom of 30%, the side workload is throttled so that 30% of overall
> CPU stays idle. If the main workload uses more than 70% of CPU, the side
> workload only runs with a configurable minimal amount of cycles. This
> configurable minimum is referred to as the "tolerance" of the main workload.
> 
> The following is a detailed example:
> 
> main/cpu.headroom    main-cpu-load    low-pri-cpu-cycle   idle-cpu
>      30%                 30%                40%              30%
>      30%                 40%                30%              30%
>      30%                 50%                20%              30%
>      30%                 60%                10%              30%
>      30%                 70%                minimal          ~30%
>      30%                 80%                minimal          ~20%
> 
> In this example, we use a constant cpu.headroom setting of 30%. As the main
> job experiences different levels of load, the cpu controller adjusts the CPU
> cycles used by the low-pri jobs.
> 
> We experimented with a web server as the main workload and ffmpeg as the
> side workload. The following table compares the latency impact on the main
> workload under different cpu.headroom settings and load levels. In all
> tests, the side workload cgroup is configured with cpu.weight of 1. When
> throttled, the side workload can only run 1ms per 100ms period.
> 
>                   average-latency
> main-load-level   w/o-side   w/-side-      w/-side-       w/-side-
>                              no-headroom   30%-headroom   20%-headroom
>     1.0           1.00       1.82          1.26           1.14
>     1.1           1.10       2.74          1.26           1.32
>     1.2           1.10       -             1.29           1.38
>     1.3           1.10       -             1.32           1.49
>     1.4           1.15       -             1.29           1.85
>     1.5           1.24       -             1.32           -
>     1.6           1.74       -             1.50           -
> 
> Each row of the table shows a normalized load level and the average
> latencies for 4 scenarios: without side workload; with side workload but no
> headroom; with side workload and 30% headroom; with side workload and 20%
> headroom.
>
> When there is no side workload, the average latency of the main job falls in
> the 0.7x range, except in the very high load scenarios. When there is a side
> workload but no headroom, the latency of the main job becomes very high at
> moderate load levels. With 30% headroom, the average latency falls in the
> 0.8x range. With 20% headroom, the average latency falls in the 0.9x to
> 1.x range. We did not finish the tests in some high-load cases because
> the latency was too high.
> 
> This experiment demonstrates that cpu.headroom is an effective and efficient
> knob for controlling the latency of the main job.
> 
> Thanks!

Could you please kindly share your feedback and comments on this work?

Thanks and Regards,
Song

> Song Liu (7):
>  sched: refactor tg_set_cfs_bandwidth()
>  cgroup: introduce hook css_has_tasks_changed
>  cgroup: introduce cgroup_parse_percentage
>  sched, cgroup: add entry cpu.headroom
>  sched/fair: global idleness counter for cpu.headroom
>  sched/fair: throttle task runtime based on cpu.headroom
>  Documentation: cgroup-v2: add information for cpu.headroom
> 
> Documentation/admin-guide/cgroup-v2.rst |  18 +
> fs/proc/stat.c                          |   4 +-
> include/linux/cgroup-defs.h             |   2 +
> include/linux/cgroup.h                  |   1 +
> include/linux/kernel_stat.h             |   2 +
> kernel/cgroup/cgroup.c                  |  51 +++
> kernel/sched/core.c                     | 425 ++++++++++++++++++++++--
> kernel/sched/fair.c                     | 143 +++++++-
> kernel/sched/sched.h                    |  30 ++
> 9 files changed, 634 insertions(+), 42 deletions(-)
> 
> -- 
> 2.17.1
>