Re: kernel preemption does not work

Patrick McManus <mcmanus@xxxxxxxxxxxx> · Sat, 12 Apr 2008 11:58:28 -0400

Hi,

On Sat, 2008-04-12 at 23:10 +0800, Peter Teoh wrote:
> On Sat, Apr 12, 2008 at 9:53 PM, Patrick McManus <mcmanus@xxxxxxxxxxxx> wrote:
> >  no. without CONFIG_PREEMPT kernel code is not pre-emptable. Userspace
> >  code is different - that is always pre-emptable and the HZ options you
> >  are looking at help determine the quanta (in either case I believe,
> >  assuming config_preempt is enabled)
> >
> 
> Possibly contentious, but I would like to get a better picture from
> this discussion.
> 

I hope not contentious, this is kernelnewbies after all!

> Like you, I had always assumed that without CONFIG_PREEMPT
> kernel code is NOT preemptible. But then why are there so many
> preempt functions everywhere, e.g. preempt_disable() and preempt_enable(),
> which are executed whether you compile with CONFIG_PREEMPT or not?
> 

Take preempt_disable() as an example: look in include/linux/preempt.h for
its two definitions. When CONFIG_PREEMPT is defined it compiles into
inc_preempt_count(); when CONFIG_PREEMPT is not defined it compiles into an
empty "do { } while (0)" (i.e. no asm code is generated, so it doesn't
really exist).

So it really is conditional on the config option. The same goes for
preempt_enable(), preempt_enable_no_resched(), and
preempt_check_resched().
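A minimal userspace sketch of those two definitions, assuming a plain global preempt_count in place of the kernel's per-task counter. The helper count_inside_critical_section() and the config_preempt constant are hypothetical, added only so the effect can be observed; compile with cc -DCONFIG_PREEMPT for the preemptible variant and without it for the empty one.

```c
#include <assert.h>

/* Hypothetical stand-in for the per-task preempt counter; in the
 * kernel it lives in the task's thread_info, not in a global. */
static int preempt_count;

#ifdef CONFIG_PREEMPT
static const int config_preempt = 1;
/* Preemptible kernel: disabling preemption bumps the counter
 * (models inc_preempt_count()). */
#define preempt_disable()           do { preempt_count++; } while (0)
#define preempt_enable_no_resched() \
	do { assert(preempt_count > 0); preempt_count--; } while (0)
#else
static const int config_preempt = 0;
/* Non-preemptible kernel: empty statements, no code is generated. */
#define preempt_disable()           do { } while (0)
#define preempt_enable_no_resched() do { } while (0)
#endif

/* Hypothetical helper: returns the counter value seen inside a
 * preempt_disable()/preempt_enable_no_resched() section. */
static int count_inside_critical_section(void)
{
	int seen;

	preempt_disable();
	seen = preempt_count;
	preempt_enable_no_resched();
	return seen;
}
```

Built with -DCONFIG_PREEMPT, count_inside_critical_section() returns 1; built without it, it returns 0, because preempt_disable() expanded to nothing.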

> Then, looking at almost all the architecture-independent files plus those
> of x86, I came to the conclusion that CONFIG_PREEMPT is there to instrument
> the code so as to allow it to call schedule(), e.g.:

I'm with you. Scheduling some other code is what makes pre-emption
happen ;).. 
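One place that instrumentation shows up on a CONFIG_PREEMPT kernel is the enable side: when the count drops back to zero, a pending reschedule request is honoured. Below is a simplified userspace model, assuming a need_resched flag in place of TIF_NEED_RESCHED and a hypothetical fake_schedule() that only counts calls; the real preempt_schedule() also refuses to run when interrupts are disabled, which is left out here.

```c
#include <assert.h>
#include <stdbool.h>

static int preempt_count;
static bool need_resched;   /* stands in for TIF_NEED_RESCHED */
static int schedule_calls;  /* how often the scheduler would have run */

/* Hypothetical scheduler stub: another task "runs", request served. */
static void fake_schedule(void)
{
	schedule_calls++;
	need_resched = false;
}

/* Loosely modelled on preempt_schedule(): only switch tasks when no
 * enclosing section still has preemption disabled. */
static void preempt_schedule(void)
{
	if (preempt_count == 0)
		fake_schedule();
}

static void preempt_check_resched(void)
{
	if (need_resched)
		preempt_schedule();
}

static void preempt_disable(void)
{
	preempt_count++;
}

static void preempt_enable(void)
{
	assert(preempt_count > 0);
	preempt_count--;
	preempt_check_resched();   /* this call is the preemption point */
}
```

Nested sections only reschedule at the outermost preempt_enable(), which is why preemption is tracked with a counter rather than a flag.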

> 
> So the conclusion is that CONFIG_PREEMPT is to enable the scheduler to
> re-examine the task queue more often, making the kernel more
> interactive, but at the cost of a higher volume of processing.
> 

I would say that CONFIG_PREEMPT allows the scheduler to interrupt normal
kernel code (unless preemption is explicitly disabled, e.g. by
preempt_disable() or while a spinlock is held) and run some other task
(which might be userspace code or other kernel code) before returning to
the point where it left off.

I agree that it helps interactivity because more code can now be
preempted (i.e. most kernel code in addition to userland code), and that
it adds some overhead. That's why it is generally recommended for desktops
but not for servers.

> As for CONFIG_NO_HZ, after reading kernel/softirq.c:

I'm confused about the relevance of NO_HZ... I didn't realize that's
what you meant in your last mail.

NO_HZ is the "tickless kernel" stuff. As you say, it's not directly tied
to kernel pre-emption in any way. I've never used it, but as I
understand it the tickless kernel substitutes an endless series of "next
wakeup" timers, based on the current needs of the system, for the timer
interrupt that would otherwise fire HZ times a second.
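To put a number on the difference for an idle CPU, here is a toy model that counts timer interrupts over an idle period. The helper names and the scenario (HZ=1000, two seconds idle, one timer due at 1.5 s) are hypothetical, and the model ignores details such as the longest interval the clock hardware can be programmed for.

```c
#include <assert.h>

/* Periodic tick: one interrupt every 1/HZ seconds, needed or not.
 * idle_ms is the length of the idle period in milliseconds. */
static long periodic_wakeups(long hz, long idle_ms)
{
	return hz * idle_ms / 1000;
}

/* Tickless idle: the timer is programmed only for the next pending
 * event, so each real event inside the idle period costs one wakeup.
 * events_ms holds expiry times relative to the start of the period. */
static long tickless_wakeups(const long *events_ms, int n, long idle_ms)
{
	long wakeups = 0;

	for (int i = 0; i < n; i++)
		if (events_ms[i] < idle_ms)
			wakeups++;
	return wakeups;
}
```

In that scenario periodic_wakeups(1000, 2000) gives 2000 interrupts, while the tickless count for the single pending timer is 1.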

Ironically, I think you still need HZ defined when running NO_HZ; that's
how the Kconfig bits in the kernel look to me. The HZ value would be used
to figure out the "next wakeup" time when the scheduler is juggling
multiple tasks, so that a pre-emption (either kernel or user) can happen,
but when the scheduler has nothing to run it does not need to arm a
periodic timer at all. As I understand it, that is the real value of the
tickless feature: preventing unnecessary wakeups and thus saving power
when everything is idle.
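The point that HZ still matters under NO_HZ can be sketched as a next-wakeup decision. This is a hypothetical model, not the kernel's tick code: next_wakeup_ns(), NO_EVENT and the nr_running parameter are names made up for illustration, and it assumes (as described above) that a busy CPU keeps a tick every 1/HZ seconds while an idle one sleeps until its next pending timer, or indefinitely if there is none.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL
#define NO_EVENT     UINT64_MAX   /* hypothetical: "don't arm the timer" */

/* Pick when the clock-event device should next fire.
 *   now        - current time in ns
 *   hz         - configured tick rate (must be nonzero)
 *   nr_running - runnable tasks on this CPU
 *   next_timer - absolute expiry (ns) of the earliest pending timer,
 *                or NO_EVENT if there is none                        */
static uint64_t next_wakeup_ns(uint64_t now, unsigned hz,
                               unsigned nr_running, uint64_t next_timer)
{
	if (nr_running > 0) {
		/* Busy CPU: keep ticking every 1/HZ so the scheduler can
		 * preempt whoever is running, unless a timer is due sooner. */
		uint64_t tick = now + NSEC_PER_SEC / hz;
		return tick < next_timer ? tick : next_timer;
	}
	/* Idle: sleep until the next real timer, if any. */
	return next_timer;
}
```

Here HZ only determines the tick period while something is runnable; once the CPU goes idle the value no longer affects when it wakes up.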

Does that make sense?
