Joel Fernandes <joel@xxxxxxxxxxxxxxxxx> writes: > On 3/10/2024 11:56 PM, Paul E. McKenney wrote: >> On Sun, Mar 10, 2024 at 08:48:28PM -0400, Joel Fernandes wrote: >>> On 3/10/2024 2:56 PM, Paul E. McKenney wrote: >>>> On Sun, Mar 10, 2024 at 06:03:30AM -0400, Joel Fernandes wrote: >>>>> Hello Ankur and Paul, >>>>> >>>>> On Mon, Feb 12, 2024 at 09:55:39PM -0800, Ankur Arora wrote: >>>>>> With PREEMPT_RCU=n, cond_resched() provides urgently needed quiescent >>>>>> states for read-side critical sections via rcu_all_qs(). >>>>>> One reason why this was necessary: lacking preempt-count, the tick >>>>>> handler has no way of knowing whether it is executing in a read-side >>>>>> critical section or not. >>>>>> >>>>>> With PREEMPT_AUTO=y, there can be configurations with (PREEMPT_COUNT=y, >>>>>> PREEMPT_RCU=n). This means that cond_resched() is a stub which does >>>>>> not provide for quiescent states via rcu_all_qs(). >>>>>> >>>>>> So, use the availability of preempt_count() to report quiescent states >>>>>> in rcu_flavor_sched_clock_irq(). >>>>>> >>>>>> Suggested-by: Paul E. McKenney <paulmck@xxxxxxxxxx> >>>>>> Signed-off-by: Ankur Arora <ankur.a.arora@xxxxxxxxxx> >>>>>> --- >>>>>> kernel/rcu/tree_plugin.h | 11 +++++++---- >>>>>> 1 file changed, 7 insertions(+), 4 deletions(-) >>>>>> >>>>>> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h >>>>>> index 26c79246873a..9b72e9d2b6fe 100644 >>>>>> --- a/kernel/rcu/tree_plugin.h >>>>>> +++ b/kernel/rcu/tree_plugin.h >>>>>> @@ -963,13 +963,16 @@ static void rcu_preempt_check_blocked_tasks(struct rcu_node *rnp) >>>>>> */ >>>>>> static void rcu_flavor_sched_clock_irq(int user) >>>>>> { >>>>>> - if (user || rcu_is_cpu_rrupt_from_idle()) { >>>>>> + if (user || rcu_is_cpu_rrupt_from_idle() || >>>>>> + (IS_ENABLED(CONFIG_PREEMPT_COUNT) && >>>>>> + !(preempt_count() & (PREEMPT_MASK | SOFTIRQ_MASK)))) { >>>>> >>>>> I was wondering if it makes sense to even support !PREEMPT_RCU under >>>>> CONFIG_PREEMPT_AUTO. >>>>> >>>>> AFAIU, this CONFIG_PREEMPT_AUTO series preempts the kernel on >>>>> the next tick boundary in the worst case, with all preempt modes including >>>>> the preempt=none mode. >>>>> >>>>> Considering this, does it makes sense for RCU to be non-preemptible in >>>>> CONFIG_PREEMPT_AUTO=y? Because if that were the case, and a read-side critical >>>>> section extended beyond the tick, then it prevents the PREEMPT_AUTO preemption >>>>> from happening, because rcu_read_lock() would preempt_disable(). >>>> >>>> Yes, it does make sense for RCU to be non-preemptible in kernels >>>> built with CONFIG_PREEMPT_AUTO=y and either CONFIG_PREEMPT_NONE=y or >>>> CONFIG_PREEMPT_VOLUNTARY=y. >>>> As noted in earlier discussions, there are >>> >>> Sorry if I missed a discussion, appreciate a link. >> >> It is part of the discussion of the first version of this patch series, >> if I recall correctly. >> >>>> systems that are adequately but not abundantly endowed with memory. >>>> Such systems need non-preemptible RCU to avoid preempted-reader OOMs. >>> >>> Then why don't such systems have a problem with CONFIG_PREEMPT_DYNAMIC=y and >>> preempt=none mode? CONFIG_PREEMPT_DYNAMIC forces CONFIG_PREEMPT_RCU=y. There's >>> no way to set CONFIG_PREEMPT_RCU=n with CONFIG_PREEMPT_DYNAMIC=y and >>> preempt=none boot parameter. IMHO, if this feature is inconsistent with >>> CONFIG_PREEMPT_DYNAMIC, that makes it super confusing. In fact, I feel >>> CONFIG_PREEMPT_AUTO should instead just be another "preempt=auto" boot parameter >>> mode added to CONFIG_PREEMPT_DYNAMIC feature, otherwise the proliferation of >>> CONFIG_PREEMPT config options is getting a bit insane. And likely going to be >>> burden to the users configuring the PREEMPT Kconfig option IMHO. >> >> Because such systems are built with CONFIG_PREEMPT_DYNAMIC=n. >> >> You could argue that we should just build with CONFIG_PREEMPT_AUTO=n, >> but the long-term goal of eliminating cond_resched() will make that >> ineffective. > > I see what you mean. We/I could also highlight some of the differences in RCU > between DYNAMIC vs AUTO. > >> >>>> Note well that non-preemptible RCU explicitly disables preemption across >>>> all RCU readers. >>> >>> Yes, I mentioned this 'disabling preemption' aspect in my last email. My point >>> being, unlike CONFIG_PREEMPT_NONE, CONFIG_PREEMPT_AUTO allows for kernel >>> preemption in preempt=none. So the "Don't preempt the kernel" behavior has >>> changed. That is, preempt=none under CONFIG_PREEMPT_AUTO is different from >>> CONFIG_PREEMPT_NONE=y already. Here we *are* preempting. And RCU is getting on >>> the way. It is like saying, you want an option for CONFIG_PREEMPT_RCU to be set >>> to =n for CONFIG_PREEMPT=y kernels, sighting users who want a fully-preemptible >>> kernel but are worried about reader preemptions. >> >> Such users can simply avoid building with either CONFIG_PREEMPT_NONE=y >> or with CONFIG_PREEMPT_VOLUNTARY=y. They might also experiment with >> CONFIG_RCU_BOOST=y, and also with short timeouts until boosting. >> If that doesn't do what they need, we talk with them and either help >> them configure their kernels, make RCU do what they need, or help work >> out some other way for them to get their jobs done. > > Makes sense. > >>> That aside, as such, I do agree your point of view, that preemptible readers >>> presents a problem to folks using preempt=none in this series and we could >>> decide to keep CONFIG_PREEMPT_RCU optional for whoever wants it that way. I was >>> just saying that I want CONFIG_PREEMPT_AUTO's preempt=none mode to be somewhat >>> consistent with CONFIG_PREEMPT_DYNAMIC's preempt=none. Because I'm pretty sure a >>> week from now, no one will likely be able to tell the difference ;-). So IMHO >>> either CONFIG_PREEMPT_DYNAMIC should be changed to make CONFIG_PREEMPT_RCU >>> optional, or this series should be altered to force CONFIG_PREEMPT_RCU=y. >>> >>> Let me know if I missed something. >> >> Why not key off of the value of CONFIG_PREEMPT_DYNAMIC? That way, >> if both CONFIG_PREEMPT_AUTO=y and CONFIG_PREEMPT_DYNAMIC=y, RCU is >> always preemptible. Then CONFIG_PREEMPT_DYNAMIC=y enables boot-time >> (and maybe even run-time) switching between preemption flavors, while >> CONFIG_PREEMPT_AUTO instead enables unconditional preemption of any >> region of code that has not explicitly disabled preemption (or irq or >> bh or whatever). Currently CONFIG_PREEMPT_DYNAMIC does a few things: 1. dynamic selection of preemption model 2. dynamically toggling explicit preemption points 3. PREEMPT_RCU=y (though maybe this should be fixed to also also allow PREEMPT_RCU=n) Of these 3, PREEMPT_AUTO only really needs (1). Maybe combining gives us the option of switching between the old and the new models: preempt=none | voluntary | full | auto-none | auto-voluntary Where the last two provide the new auto semantics. But, the mixture seems too rich. This just complicates all the CONFIG_PREEMPT_* configurations more than they were before when the end goal is to actually reduce and simplify the number of options. > That could be done. But currently, these patches disable DYNAMIC if AUTO is > enabled in the config. I think the reason is the 2 features are incompatible. > i.e. DYNAMIC wants to override the preemption mode at boot time, where as AUTO > wants the scheduler to have a say in it using the need-resched LAZY bit. Yeah exactly. That's why I originally made PREEMPT_AUTO and PREEMPT_DYNAMIC exclusive of each other. Thanks -- ankur