Why do we need schedstats?
==========================

schedstats is a useful feature for thread-level latency analysis. Our
use case is as follows:

  Userspace Code Scope         Profiler
  {
      user_func_abc();   <---- uprobe_scope_begin() get start schedstats
      ...
      user_func_xyz();   <---- uprobe_scope_end() get end schedstats
  }

Then, from the result of (end - begin), we can get the latency details
below for a specific user scope:

  scope_latency = Wait + Sleep + Blocked [1] + Run (stime + utime)

Without schedstats we would have to trace the heavy sched:sched_switch
tracepoint and do a lot more work.

[1]. With patch #4, and with sum_block_runtime not included in
     sum_sleep_runtime.

Support schedstats for RT sched class
=====================================

If we want to use the schedstats facility to trace other sched classes,
we should make it independent of the fair sched class. struct
sched_statistics holds the scheduler statistics of a task_struct or a
task_group, so we can move it into struct task_struct and struct
task_group to achieve that goal.

After the patch, schedstats are organized as follows:

    struct task_struct {
        ...
        struct sched_entity se;
        struct sched_rt_entity rt;
        struct sched_dl_entity dl;
        ...
        struct sched_statistics stats;
        ...
    };

Regarding the task group, schedstats is only supported for fair group
sched, and a new struct sched_entity_stats is introduced, as suggested
by Peter:

    struct sched_entity_stats {
        struct sched_entity     se;
        struct sched_statistics stats;
    } __no_randomize_layout;

Then with the se in a task_group, we can easily get the stats.

The sched_statistics members may be modified frequently when schedstats
is enabled. To avoid disturbing unrelated data that may share a
cacheline with them, struct sched_statistics is defined as cacheline
aligned.

As this patch changes core scheduler structs, I verified its performance
impact on the scheduler with 'perf bench sched pipe', as suggested by
Mel. Below is the result; all values are in usecs/op.
                                  Before      After
      kernel.sched_schedstats=0   5.2~5.4     5.2~5.4
      kernel.sched_schedstats=1   5.3~5.5     5.3~5.5

[This data differs slightly from the earlier version because my old
 test machine was destroyed, so I had to use a different test machine.]

Almost no impact on the sched performance.

Users can get the schedstats information for RT tasks in the same way
as for the fair sched class. For example,

           fair                     RT
           /proc/[pid]/sched        /proc/[pid]/sched

schedstats is not supported for RT group.

The sched:sched_stat_{wait, sleep, iowait, blocked} tracepoints can be
used to trace RT tasks as well.

Support schedstats for any other sched classes
==============================================

After this patchset, it is very easy to extend schedstats to any other
sched class. The deadline sched class is also supported in this
patchset.

Changes Since v3:
Various code improvements per Peter,
- don't support schedstats for rt group
- introduce struct sched_entity_stats for fair group
- change the position of 'struct sched_statistics stats'
- fix indent issue
- change the output format in /proc/[pid]/sched
- add the use case of schedstats
- support schedstats for deadline task
- and other suggestions

Changes Since v2:
- Fix the output format in /proc/[pid]/sched
- Rebase it on the latest code
- Redo the performance test

Changes since v1:
- Fix the build failure reported by kernel test robot.
- Add the performance data with 'perf bench sched pipe', suggested by
  Mel.
- Make the struct sched_statistics cacheline aligned.
- Introduce task block time in schedstats

Changes since RFC:
- improvement of schedstats helpers, per Mel.
- make struct schedstats independent of fair sched class

Yafang Shao (8):
  sched, fair: use __schedstat_set() in set_next_entity()
  sched: make struct sched_statistics independent of fair sched class
  sched: make schedstats helpers independent of fair sched class
  sched: introduce task block time in schedstats
  sched, rt: support sched_stat_runtime tracepoint for RT sched class
  sched, rt: support schedstats for RT sched class
  sched, dl: support sched_stat_runtime tracepoint for deadline sched
    class
  sched, dl: support schedstats for deadline sched class

 include/linux/sched.h    |   8 +-
 kernel/sched/core.c      |  25 +++---
 kernel/sched/deadline.c  |  99 +++++++++++++++++++++-
 kernel/sched/debug.c     |  97 +++++++++++----------
 kernel/sched/fair.c      | 177 +++++++++++----------------------------
 kernel/sched/rt.c        | 130 +++++++++++++++++++++++++++-
 kernel/sched/stats.c     | 104 +++++++++++++++++++++++
 kernel/sched/stats.h     |  49 +++++++++++
 kernel/sched/stop_task.c |   4 +-
 9 files changed, 500 insertions(+), 193 deletions(-)

-- 
2.18.2