On Wed, Apr 01, 2020 at 01:00:28PM +0300, John Mathew wrote:

I despise RST, it's an unreadable mess, but I did skim the document and
felt I should comment on this:

> +* _cond_resched() : It gives the scheduler a chance to run a
> +  higher-priority process.
> +
> +* __cond_resched_lock() : if a reschedule is pending, drop the given
> +  lock, call schedule, and on return reacquire the lock.

Those are not functions anybody should be using; the normal entry points
are: cond_resched() and cond_resched_lock().

> +Scheduler State Transition
> +==========================
> +
> +A very high level scheduler state transition flow with a few states can be
> +depicted as follows.
> +
> +.. kernel-render:: DOT
> +   :alt: DOT digraph of Scheduler state transition
> +   :caption: Scheduler state transition
> +
> +   digraph sched_transition {
> +      node [shape = point, label="exisiting task\n calls fork()"]; fork
> +      node [shape = box, label="TASK_NEW\n(Ready to run)"] tsk_new;
> +      node [shape = box, label="TASK_RUNNING\n(Ready to run)"] tsk_ready_run;
> +      node [shape = box, label="TASK_RUNNING\n(Running)"] tsk_running;
> +      node [shape = box, label="TASK_DEAD\nEXIT_ZOMBIE"] exit_zombie;
> +      node [shape = box, label="TASK_INTERRUPTIBLE\nTASK_UNINTERRUPTIBLE\nTASK_WAKEKILL"] tsk_int;
> +      fork -> tsk_new [ label = "task\nforks" ];
> +      tsk_new -> tsk_ready_run;
> +      tsk_ready_run -> tsk_running [ label = "schedule() calls context_switch()" ];
> +      tsk_running -> tsk_ready_run [ label = "task is pre-empted" ];
> +      subgraph int {
> +         tsk_running -> tsk_int [ label = "task needs to wait for event" ];
> +         tsk_int -> tsk_ready_run [ label = "event occurred" ];
> +      }
> +      tsk_int -> exit_zombie [ label = "task exits via do_exit()" ];
> +   }

And that is a prime example of why I hates RST, it pretty much mandates
you view this with something other than a text editor.

Also, Daniel, you modeled all this, is the above anywhere close?

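For anyone who would rather read the quoted graph as plain text, here is a
rough userspace sketch of the same edges as a C table. It only mirrors the
DOT as posted and is not a claim that the diagram is right. The sketch_*
and ST_* names are made up for illustration and are not kernel symbols; the
fork() entry point is left out because it is not a task state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Nodes of the quoted DOT graph; illustrative names, not kernel symbols. */
enum sketch_state {
	ST_NEW,		/* TASK_NEW */
	ST_READY,	/* TASK_RUNNING (ready to run) */
	ST_RUNNING,	/* TASK_RUNNING (running) */
	ST_WAITING,	/* TASK_INTERRUPTIBLE, TASK_UNINTERRUPTIBLE, TASK_WAKEKILL */
	ST_ZOMBIE,	/* TASK_DEAD, EXIT_ZOMBIE */
	ST_COUNT
};

struct sketch_edge {
	enum sketch_state from;
	enum sketch_state to;
};

/* One entry per edge between task states in the quoted diagram. */
static const struct sketch_edge sketch_edges[] = {
	{ ST_NEW,     ST_READY   },
	{ ST_READY,   ST_RUNNING },	/* schedule() calls context_switch() */
	{ ST_RUNNING, ST_READY   },	/* task is pre-empted */
	{ ST_RUNNING, ST_WAITING },	/* task needs to wait for event */
	{ ST_WAITING, ST_READY   },	/* event occurred */
	{ ST_WAITING, ST_ZOMBIE  },	/* task exits via do_exit() */
};

/* Returns true if the quoted diagram draws an edge from 'from' to 'to'. */
static bool sketch_transition_allowed(enum sketch_state from,
				      enum sketch_state to)
{
	size_t i;

	assert(from < ST_COUNT && to < ST_COUNT);

	for (i = 0; i < sizeof(sketch_edges) / sizeof(sketch_edges[0]); i++) {
		if (sketch_edges[i].from == from && sketch_edges[i].to == to)
			return true;
	}
	return false;
}
```

With this table, sketch_transition_allowed(ST_READY, ST_RUNNING) is true and
sketch_transition_allowed(ST_NEW, ST_RUNNING) is false, which is all the
diagram itself says.
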
> +Scheduler provides trace points tracing all major events of the scheduler.
> +The tracepoints are defined in ::
> +
> +  include/trace/events/sched.h
> +
> +Using these treacepoints it is possible to model the scheduler state transition
> +in an automata model. The following conference paper discusses such modeling.
> +
> +https://www.researchgate.net/publication/332440267_Modeling_the_Behavior_of_Threads_in_the_PREEMPT_RT_Linux_Kernel_Using_Automata

Ah, you've found Daniel ;-)

> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 1a9983da4408..ccefc820557f 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -3578,8 +3578,12 @@ unsigned long long task_sched_runtime(struct task_struct *p)
>  	return ns;
>  }
>
> -/*
> - * This function gets called by the timer code, with HZ frequency.
> +/**
> + * scheduler_tick -
> + *
> + * This function is called on every timer interrupt with HZ frequency and
> + * calls scheduler on any task that has used up its quantum of CPU time.
> + *
>   * We call it with interrupts disabled.
>   */
>  void scheduler_tick(void)
> @@ -3958,8 +3962,8 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
>  	BUG();
>  }
>
> -/*
> - * __schedule() is the main scheduler function.
> +/**
> + * __schedule() - The main scheduler function.
>   *
>   * The main means of driving the scheduler and thus entering this function are:
>   *
> @@ -4086,6 +4090,12 @@ static void __sched notrace __schedule(bool preempt)
>  	balance_callback(rq);
>  }
>
> +/**
> + * do_task_dead - Final step of task exit
> + *
> + * Changes the the task state to TASK_DEAD and calls schedule to pick next
> + * task to run.
> + */

That has whitespace damage.

>  void __noreturn do_task_dead(void)
>  {
>  	/* Causes final put_task_struct in finish_task_switch(): */
> @@ -4244,7 +4254,9 @@ static void __sched notrace preempt_schedule_common(void)
>  }
>
>  #ifdef CONFIG_PREEMPTION
> -/*
> +/**
> + * preempt_schedule -
> + *
>   * This is the entry point to schedule() from in-kernel preemption
>   * off of preempt_enable.
>   */
> @@ -4316,7 +4328,9 @@ EXPORT_SYMBOL_GPL(preempt_schedule_notrace);
>
>  #endif /* CONFIG_PREEMPTION */
>
> -/*
> +/**
> + * preempt_schedule_irq -
> + *
>   * This is the entry point to schedule() from kernel preemption
>   * off of irq context.
>   * Note, that this is called and return with irqs disabled. This will
> @@ -5614,6 +5628,11 @@ SYSCALL_DEFINE0(sched_yield)
>  }
>
>  #ifndef CONFIG_PREEMPTION
> +/**
> + * _cond_resched -
> + *
> + * gives the scheduler a chance to run a higher-priority process
> + */
>  int __sched _cond_resched(void)
>  {
>  	if (should_resched(0)) {
> @@ -5626,9 +5645,10 @@ int __sched _cond_resched(void)
>  EXPORT_SYMBOL(_cond_resched);
>  #endif
>
> -/*
> - * __cond_resched_lock() - if a reschedule is pending, drop the given lock,
> +/**
> + * __cond_resched_lock - if a reschedule is pending, drop the given lock,
>   * call schedule, and on return reacquire the lock.
> + * @lock: target lock
>   *
>   * This works OK both with and without CONFIG_PREEMPTION. We do strange low-level
>   * operations here to prevent schedule() from being called twice (once via

Just know that the first time someone comes and whines about how a
scheduler comment doesn't build or generates bad output, I remove the
/** kerneldoc thing.

Also, like I said above, _cond_resched() and __cond_resched_lock()
really should not be exposed like this, they're not API.
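
To make the "normal entry point" remark concrete, a minimal sketch of the
call pattern: a long loop in process context that calls cond_resched() on
each iteration. So that it compiles outside the kernel, cond_resched() is
replaced here by a userspace stand-in that only counts calls; that stub,
sketch_resched_points and sketch_sum() are all hypothetical and exist only
for this illustration. In the kernel the real cond_resched() comes from
<linux/sched.h>.

```c
#include <stddef.h>

/*
 * Userspace stand-in (assumption for this sketch only): the real
 * cond_resched() lives in the kernel and may call into the scheduler.
 * This stub just records that a reschedule point was reached.
 */
static unsigned int sketch_resched_points;

static int cond_resched(void)
{
	sketch_resched_points++;
	return 0;
}

/* Hypothetical long-running loop that offers a reschedule point per item. */
static unsigned long sketch_sum(const unsigned long *v, size_t n)
{
	unsigned long sum = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		sum += v[i];
		/* Callers use cond_resched(), never _cond_resched() directly. */
		cond_resched();
	}
	return sum;
}
```

The point is only which name the caller spells: the underscore variants stay
an implementation detail behind cond_resched() and cond_resched_lock().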