Re: Re: Re: Re: Understanding non-preemptive kernel

Mulyadi Santosa <mulyadi.santosa@xxxxxxxxx> · Sat, 28 Jan 2006 14:32:58 +0700

Hi Rajaram...

>   1. In Robert Love's book, he describes timeslice as "how long a
> task can run untill it is preempted". I also see his subsequent
> description like " a process does not have to use all its timeslice
> at once. For example, a process with a 100 millisecond timeslice does
> not have to run for 100 milliseconds in one go....it can run on five
> different reschedules for 20 milliseconds each." The second
> description seems to contradict with the first one. If timeslice is
> the time the process can run until it is preempted, then how can the
> timeslice be divided in to five reschedules  ?  How can it be
> rescheduled without preemption ? Does it mean the process can
> voluntarily give up the CPU within the timeslice and then come back
> later ?

Yeap, you got the point. You already tested it, remember? manually calls 
schedule() or yield() does the voluntary re-scheduling...thus  
voluntarily gives the other task an opportunity to run.

Maybe i can do better rewording "a process can choose, whether to 
consume its time slice completely or not. If it chooses to consume it 
all in one run, the scheduler will force the process to be replaced 
with another ready-to-run process. If the process yields its time slot 
while there is time slice left, the scheduler will pick another 
runnable process to run. The process might come back and consumes the 
rest of its time slice"

>   2. Also in most of the places, he says that the time slice is
> dependent on the priority and that both priority and timeslice are
> dynamically calculated by the scheduler. But suddenly, he also says
> that timeslice is calculated based on static priority :) What does
> this mean ? Is timeslice not dependent on dynamic priority ? No
> relation between time slice and the dynamic recalculation of
> priorities of the processes ? Because static priority will be just
> the default 0 or whatever set initially. So if timeslice is just
> dependent on static priority, then what is the use of changing
> priorities dynamically ?

Hmm, ok, hang me :) hehehehe....

OK, here is the situation. EVERY TASK (yes, that means thread too) is 
assigned a static priority initially. This value is also known as 
"nice" priority. usually it is zero or whatever you set with "nice" 
command from shell or nice()/setpriority() C function. This value is 
called "static priority" because once you set it, it will never change 
until you modify it. Static prio has range between -20 (highest) and 19 
(lowest). For simplicity, let's asume it is always zero.

>From this static priority, a value is derived/calculated. It is called 
dynamic priority. Why dynamic priority is needed? because the nature of 
a process is also dynamic. Sometimes it can sleep a lot, sometimes it 
can hog CPU all the time. If single process is all the kernel scheduler 
will handle and only this process exists on the system, maybe we don't 
need a dynamic priority at all (since it will always be selected by 
scheduler, right?), but the reality is not.

If two or more processes compete for CPU and their nice priority (zero 
value, as we assumed before) are same, you can imagine what happens. 
Let's assume the scheduler pick the process on the same priority queue 
using FIFO style and process A plus B are queued. A queued first. 
Whatever A does, B will likely being starved. This is true especially 
if A hogs CPU all the time and B sleeps a lot.

So, dynamic priority calculation plays a role here to punish or award a 
process according on what it has done. If it hogs CPU, it receives 
punishment. If it sleeps a lot, it received bonus. "Punishment" is a 
positive value added to the nice priority, while "award" is negative 
value added to the nice priority. So, if A hogs CPU a lot, it might get 
+5, thus its dynamic priority is 0+5=5 plus 20, so it's 25. What is 
"20" in this case? Read that nice value -20 .. 19 is mapped to 0..40 
range, so zero(0) is mapped as 20, one(1) as 21 and so on. And why the 
hell punishment is positive? am I drunk? No. remember, scheduler favour 
smaller dynamic prio value, so decreasing dynamic prio means make the 
process more "favourable". The inverse logic also applies.

Now, another math comes to the arena. Since A hogs CPU a lot, it will be 
awarded punishment, thus stay at bigger dynamic prio than B. Let's 
assume A has 25, B is 15. Now think...assume neither of the process 
does voluntary re-scheduling. If dynamic prio means nothing for time 
slice calculation, they might still get 50 ms as time slice. So, 
eventually, not much advantage we'll see from this bonus/punishment 
system except one will run earlier than other. Then how to make it more 
fair? Time slice is calculated as a function of dynamic priority. 

The formula is simplified in my explanation here (read the kernel 
source, especially kernel/sched.c). The smaller the dynamic priority 
is, the longer time slice it receives. So, A might get 30 ms, while B 
gets 100 ms. So, whatever A does, after 30 ms, it will be preempted by 
the scheduler and B has a chance to run. If B sleeps, A has another 
chance to run. Why award B longer timeslice if it just sleeps all the 
time? Simple, to keep it stay at higher priority. remember again that 
dynamic priority is recalculated when the time slice is up, so B has 
advantage to run first than A. This is done by Linux to improve 
interactivity while still maintaining overall perfomance. It's not 
perfect, but it works ;)

There are many schedulers for Linux kernel. The one we get in mainline 
is O(1) scheduler written by Ingo Molnar. Unofficial ones are Staircase 
scheduler by Con Kolivas, genetic algo based scheduler by Jake 
Moilanen, Peter William's, Nick Piggin's and so on. And yes, shout at 
them if you need better scheduler ;) Possibly you will see Mulyadi's 
chaos scheduler, but I am afraid this one will ruin your PC :)) hehehe

Hope my long explanation really clears the issue for you. So it's 
static--> converted into dynamic prio --> calculated as the basis of 
time slice.

Anybody, feel free to jump in

regards

Mulyadi

--
Kernelnewbies: Help each other learn about the Linux kernel.
Archive:       http://mail.nl.linux.org/kernelnewbies/
FAQ:           http://kernelnewbies.org/faq/