* Paul E. McKenney <paulmck@xxxxxxxxxx> [230906 14:03]: > On Wed, Sep 06, 2023 at 01:29:54PM -0400, Liam R. Howlett wrote: > > * Paul E. McKenney <paulmck@xxxxxxxxxx> [230906 13:24]: > > > On Wed, Sep 06, 2023 at 11:23:25AM -0400, Liam R. Howlett wrote: > > > > (Adding Paul & Shanker to Cc list.. please see below for why) > > > > > > > > Apologies on the late response, I was away and have been struggling to > > > > get a working PPC32 test environment. > > > > > > > > * Geert Uytterhoeven <geert@xxxxxxxxxxxxxx> [230829 12:42]: > > > > > Hi Liam, > > > > > > > > > > On Fri, 18 Aug 2023, Liam R. Howlett wrote: > > > > > > The current implementation of append may cause duplicate data and/or > > > > > > incorrect ranges to be returned to a reader during an update. Although > > > > > > this has not been reported or seen, disable the append write operation > > > > > > while the tree is in rcu mode out of an abundance of caution. > > > > > > > > ... > > > > > > ... > > > > > RCU-related configs: > > > > > > > > > > $ grep RCU .config > > > > > # RCU Subsystem > > > > > CONFIG_TINY_RCU=y > > > > > # CONFIG_RCU_EXPERT is not set > > > > > CONFIG_TINY_SRCU=y > > > > > # end of RCU Subsystem > > > > > # RCU Debugging > > > > > # CONFIG_RCU_SCALE_TEST is not set > > > > > # CONFIG_RCU_TORTURE_TEST is not set > > > > > # CONFIG_RCU_REF_SCALE_TEST is not set > > > > > # CONFIG_RCU_TRACE is not set > > > > > # CONFIG_RCU_EQS_DEBUG is not set > > > > > # end of RCU Debugging > > > > > > > > I used the configuration from debian 8 and ran 'make oldconfig' to build > > > > my kernel. I have attached the configuration. ... > > > > It appears to be something to do with struct maple_tree sparse_irqs. If > > > > you drop the rcu flag from that maple tree, then my configuration boots > > > > without the warning. > > > > > > > > I *think* this is because we will reuse a lot more nodes. And I *think* > > > > the rcu flag is not needed, since there is a comment about reading the > > > > tree being protected by the mutex sparse_irq_lock within the > > > > kernel/irq/irqdesc.c file. Shanker, can you comment on that? > > > > > > > > I wonder if there is a limit to the number of RCU free events before > > > > something is triggered to flush them out which could trigger IRQ > > > > enabling? Paul, could this be the case? > > > > > > Are you asking if call_rcu() will re-enable interrupts in the following > > > use case? > > > > > > local_irq_disable(); > > > call_rcu(&p->rh, my_cb_func); > > > local_irq_enable(); I am not. ... > > > > > > Or am I missing your point? > > > > This is very early in the boot sequence when interrupts have not been > > enabled. What we are seeing is a WARN_ON() that is triggered by > > interrupts being enabled before they should be enabled. > > > > I was wondering if, for example, I called call_rcu() a lot *before* > > interrupts were enabled, that something could trigger that would either > > enable interrupts or indicate the task needs rescheduling? > > You aren't doing call_rcu() enough to hit OOM, are you? The actual RCU > callback invocations won't happen until some time after the scheduler > starts up. I am not, it's just a detection of IRQs being enabled early. > > > Specifically the rescheduling part is suspect. I tracked down the call > > to a mutex_lock() which calls cond_resched(), so could rcu be > > 'encouraging' the rcu window by a reschedule request? > > During boot before interrupts are enabled, RCU has not yet spawned any of > its kthreads. Therefore, all of its attempts to do wakeups would notice > a NULL task_struct pointer and refrain from actually doing the wakeup. > If it did do the wakeup, you would see a NULL-pointer exception. See > for example, invoke_rcu_core_kthread(), though that won't happen unless > you booted with rcutree.use_softirq=0. > > Besides, since when did doing a wakeup enable interrupts? That would > make it hard to do wakeups from hardware interrupt handlers, not? Taking the mutex lock in kernel/irq/manage.c __setup_irq() is calling a cond_resched(). >From what Michael said [1] in this thread, since something has already set TIF_NEED_RESCHED, it will eventually enable interrupts on us. I've traced this to running call_rcu() in kernel/rcu/tiny.c and is_idle_task(current) is true, which means rcu runs: /* force scheduling for rcu_qs() */ resched_cpu(0); the task is set idle in sched_init() -> init_idle() and never changed, afaict. Removing the RCU option from the maple tree in kernel/irq/irqdesc.c fixes the issue by avoiding the maple tree running call_rcu(). I am not sure on the locking of the tree so I feel this change may cause other issues...also it's before lockdep_init(), so any issue I introduce may not be detected. When CONFIG_DEBUG_ATOMIC_SLEEP is configured, it seems that rcu does the same thing, but the IRQs are not enabled on return. So, resched_cpu(0) is called, but the IRQs warning of enabled isn't triggered. I failed to find a reason why. I am not entirely sure what makes ppc32 different than other platforms in that the initial task is configured to an idle task and the first call to call_rcu (tiny!) would cause the observed behaviour. Non-tiny rcu calls (as I am sure you know, but others may not) kernel/rcu/tree.c which in turn calls __call_rcu_common(). That function is far more complex than the tiny version. Maybe it's part of why we see different behaviour based on platforms? I don't see an idle check in that version of call_rcu(). Or maybe PPC32 has something set incorrectly to cause this failure in early boot and I've just found something that needs to be set differently? > > But why not put some WARN_ON_ONCE(!irqs_disabled()) calls in the areas > of greatest suspicion, starting from the stack trace generated by that > mutex_lock()? A stray interrupt-enable could be pretty much anywhere. > > But where are those call_rcu() invocations? Before rcu_init()? During init_IRQ(), which is after rcu_init() but before rcu_init_nohz(), srcu_init(), and softirq_init() in init/main.c start_kernel(). > Presumably before init is spawned and the early_init() calls. > > And what is the RCU-related Kconfig and boot-parameter setup? The .config was attached to the email I sent, and it matches what was quoted above in the "RCU-related configs" section. [1] https://lore.kernel.org/linux-mm/87v8cv22jh.fsf@mail.lhotse/