Hi all,

We have been playing around with SCHED_DEADLINE and found a discrepancy in how nr_involuntary_switches and nr_voluntary_switches are calculated in /proc/${PID}/sched. Whenever the task finishes its work early and calls sched_yield() to voluntarily give up the CPU, nr_involuntary_switches is incremented. It should have incremented nr_voluntary_switches instead.

This can easily be demonstrated by running the cyclicdeadline test, which is part of rt-tests (https://git.kernel.org/pub/scm/utils/rt-tests/rt-tests.git/), and checking the value of nr_voluntary_switches.

Please note that the issue seems to be with sched_yield() and not with SCHED_DEADLINE, because we have seen similar behavior when we tried switching to other policies. We are using SCHED_DEADLINE because it is one of the (very) few scenarios where sched_yield() can be used correctly.

Some analysis:
--------------
I enabled the sched/sched_switch event (filtered on cyclicdeadline) and the syscalls/sys_enter_sched_yield event to check whether the sched_yield() call was resulting in a new task running. I got the following results:

  cyclicdeadline-3290 [003] ....... 3111.132786: tracing_mark_write: start at 3111125101 off=3 (period=3111125098 next=3111126098)
  cyclicdeadline-3290 [003] ....1.. 3111.132789: sys_sched_yield()
  cyclicdeadline-3290 [003] d...2.. 3111.132797: sched_switch: prev_comm=cyclicdeadline prev_pid=3290 prev_prio=-1 prev_state=R ==> next_comm=swapper/3 next_pid=0 next_prio=120
  cyclicdeadline-3290 [003] ....... 3111.133786: tracing_mark_write: start at 3111126101 off=3 (period=3111126098 next=3111127098)
  cyclicdeadline-3290 [003] ....1.. 3111.133789: sys_sched_yield()
  cyclicdeadline-3290 [003] d...2.. 3111.133797: sched_switch: prev_comm=cyclicdeadline prev_pid=3290 prev_prio=-1 prev_state=R ==> next_comm=swapper/3 next_pid=0 next_prio=120
  cyclicdeadline-3290 [003] ....... 3111.134786: tracing_mark_write: start at 3111127101 off=3 (period=3111127098 next=3111128098)
  cyclicdeadline-3290 [003] ....1.. 3111.134789: sys_sched_yield()
  cyclicdeadline-3290 [003] d...2.. 3111.134797: sched_switch: prev_comm=cyclicdeadline prev_pid=3290 prev_prio=-1 prev_state=R ==> next_comm=swapper/3 next_pid=0 next_prio=120
  ....

As seen above, every sched_yield() call is followed by a sched_switch, so we believe that sched_yield() is actually resulting in a switch.

The values of the switch counters in this scenario:

  nr_switches             : 138753
  nr_voluntary_switches   : 1
  nr_involuntary_switches : 138752

Looking at __schedule() in kernel/sched/core.c, the switch is counted as part of nr_involuntary_switches if the task has not been preempted and the task is in the TASK_RUNNING state. This does not seem to happen when sched_yield() is called.

Is there something we are missing here? Or is this a known issue that is planned to be fixed later?

Thanks,
Vedang Patel