Hi,

On Sat, 2008-04-12 at 23:10 +0800, Peter Teoh wrote:
> On Sat, Apr 12, 2008 at 9:53 PM, Patrick McManus <mcmanus@xxxxxxxxxxxx> wrote:
> > no. without CONFIG_PREEMPT kernel code is not pre-emptable. Userspace
> > code is different - that is always pre-emptable and the HZ options you
> > are looking at help determine the quanta (in either case I believe,
> > assuming config_preempt is enabled)
> >
>
> Possibly contentious, but I would like to get a better picture from
> this discussion.
>

I hope not contentious, this is kernelnewbies after all!

> Like you all along I had this assumption that without CONFIG_PREEMPT
> kernel code is NOT preemptible. But then why everywhere there are so
> many preempt() function, eg preempt_disable() and preempt_enable()
> which executed whether u compile with CONFIG_PREEMPT or not.
>

Take preempt_disable() as an example - look in include/linux/preempt.h
for its two definitions. When CONFIG_PREEMPT is defined it compiles into
inc_preempt_count(); when CONFIG_PREEMPT is not defined it compiles into
an empty "do { } while (0)" (i.e. no asm code is generated, so it doesn't
really exist). So it really is conditional on the config option. Same
thing for preempt_enable(), preempt_enable_no_resched(), and
preempt_check_resched().

> Then looking at almost all the architecture independent files + those
> of x86, I came to the conclusion that CONFIG_PREEMPT is to instrument
> the codes so as to allow it to call schedule(), eg:

I'm with you. Scheduling some other code is what makes pre-emption
happen ;)

>
> So the conclusion is that CONFIG_PREEMPT is to enable the scheduler to
> reexamine the tasks queue more often, making the kernel more
> interactive, but at the cost of higher volume of processing.
>

I would say that CONFIG_PREEMPT allows the scheduler to interrupt normal
kernel code (unless pre-emption is explicitly disabled) and run some
other task (which might be userspace code or might be other kernel code)
before returning to the point where it left off.
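To make the preempt_disable() point concrete, here is a rough userspace
sketch of the pattern. It is not the kernel's code: demo_preempt_count and
demo_nesting_depth() are names made up for illustration, the real macros
also add a barrier(), and the real preempt_enable() additionally checks
whether a reschedule is pending.

```c
#include <assert.h>

/* Hypothetical stand-in for the kernel's per-task preempt count. */
static int demo_preempt_count;

#ifdef CONFIG_PREEMPT
/* With the option set, each call really adjusts a counter. */
#define preempt_disable() do { demo_preempt_count++; } while (0)
#define preempt_enable()  do { demo_preempt_count--; } while (0)
#else
/* Without it, the calls expand to empty statements and emit no code. */
#define preempt_disable() do { } while (0)
#define preempt_enable()  do { } while (0)
#endif

/* Returns the counter value observed inside two nested disable/enable pairs. */
int demo_nesting_depth(void)
{
	int depth;

	preempt_disable();
	preempt_disable();
	depth = demo_preempt_count;
	preempt_enable();
	preempt_enable();

	/* Balanced pairs must leave the counter where it started. */
	assert(demo_preempt_count == 0);
	return depth;
}
```

Built as-is, the calls vanish and demo_nesting_depth() reports 0; built
with -DCONFIG_PREEMPT it reports 2. That is why the calls can sit all over
the tree without costing anything when the option is off.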
I agree that it helps interactivity because more code can now be
preempted (i.e. most kernel code in addition to userland code), and that
it adds some overhead. That's why it is generally recommended for
desktops but not for servers.

> As for CONFIG_NO_HZ, after reading kernel/softirq.c:

I'm confused about the relevance of NO_HZ... I didn't realize that's
what you meant with the last mail.

NO_HZ is the "tickless kernel" stuff. As you say, it's not directly tied
to kernel pre-emption in any way. I've never used it, but as I
understand it the tickless kernel substitutes a series of "next wakeup"
timers based on the current needs of the system instead of waking up HZ
times a second.

Ironically, I think you still need HZ defined when running NO_HZ. It
looks that way from the Kconfig bits in the kernel to me. The HZ value
would be used to figure out the "next wakeup" time when the scheduler is
juggling multiple things so that a pre-emption (either kernel or user)
can happen, but when the scheduler has nothing to run it does not need
to arm a timer at all. As I understand it, that is the real value of the
tickless feature: preventing unnecessary wakeups and thus saving power
when everything is idle.

Does that make sense?