* Geert Uytterhoeven <geert@xxxxxxxxxxxxxx> [230913 09:15]: > Hi Paul, > > On Tue, Sep 12, 2023 at 12:00 PM Paul E. McKenney <paulmck@xxxxxxxxxx> wrote: > > On Tue, Sep 12, 2023 at 10:34:44AM +0200, Geert Uytterhoeven wrote: > > > On Tue, Sep 12, 2023 at 10:30 AM Paul E. McKenney <paulmck@xxxxxxxxxx> wrote: > > > > On Tue, Sep 12, 2023 at 10:23:37AM +0200, Geert Uytterhoeven wrote: > > > > > On Tue, Sep 12, 2023 at 10:14 AM Paul E. McKenney <paulmck@xxxxxxxxxx> wrote: > > > > > > On Mon, Sep 11, 2023 at 07:54:52PM -0400, Liam R. Howlett wrote: > > > > > > > * Paul E. McKenney <paulmck@xxxxxxxxxx> [230906 14:03]: > > > > > > > > On Wed, Sep 06, 2023 at 01:29:54PM -0400, Liam R. Howlett wrote: > > > > > > > > > * Paul E. McKenney <paulmck@xxxxxxxxxx> [230906 13:24]: > > > > > > > > > > On Wed, Sep 06, 2023 at 11:23:25AM -0400, Liam R. Howlett wrote: > > > > > > > > > > > (Adding Paul & Shanker to Cc list.. please see below for why) > > > > > > > > > > > > > > > > > > > > > > Apologies on the late response, I was away and have been struggling to > > > > > > > > > > > get a working PPC32 test environment. > > > > > > > > > > > > > > > > > > > > > > * Geert Uytterhoeven <geert@xxxxxxxxxxxxxx> [230829 12:42]: > > > > > > > > > > > > Hi Liam, > > > > > > > > > > > > > > > > > > > > > > > > On Fri, 18 Aug 2023, Liam R. Howlett wrote: > > > > > > > > > > > > > The current implementation of append may cause duplicate data and/or > > > > > > > > > > > > > incorrect ranges to be returned to a reader during an update. Although > > > > > > > > > > > > > this has not been reported or seen, disable the append write operation > > > > > > > > > > > > > while the tree is in rcu mode out of an abundance of caution. > > > > > > > > > > > > > > > > > > > > > > ... > > > > > > > > > > > > > > > > > > > > > > > > > > > ... > > > > > > > > > > > > > > > > > > > RCU-related configs: > > > > > > > > > > > > > > > > > > > > > > > > $ grep RCU .config > > > > > > > > > > > > # RCU Subsystem > > > > > > > > > > > > CONFIG_TINY_RCU=y > > > > > > > > > > > > I must have been asleep last time I looked at this. I was looking at > > > > > > Tree RCU. Please accept my apologies for my lapse. :-/ > > > > > > > > > > > > However, Tiny RCU's call_rcu() also avoids enabling IRQs, so I would > > > > > > have said the same thing, albeit after looking at a lot less RCU code. > > > > > > > > > > > > TL;DR: > > > > > > > > > > > > 1. Try making the __setup_irq() function's call to mutex_lock() > > > > > > instead be as follows: > > > > > > > > > > > > if (!mutex_trylock(&desc->request_mutex)) > > > > > > mutex_lock(&desc->request_mutex); > > > > > > > > > > > > This might fail if __setup_irq() has other dependencies on a > > > > > > fully operational scheduler. > > > > > > > > > > > > 2. Move that ppc32 call to __setup_irq() much later, most definitely > > > > > > after interrupts have been enabled and the scheduler is fully > > > > > > operational. Invoking mutex_lock() before that time is not a > > > > > > good idea. ;-) > > > > > > > > > > There is no call to __setup_irq() from arch/powerpc/? > > > > > > > > Glad it is not just me, given that I didn't see a direct call, either. So > > > > later in this email, I asked Liam to put a WARN_ON_ONCE(irqs_disabled()) > > > > just before that mutex_lock() in __setup_irq(). > > > > > > > > Either way, invoking mutex_lock() early in boot before interrupts have > > > > been enabled is a bad idea. ;-) > > > > > > I'll add that WARN_ON_ONCE() too, and will report back later today... > > > > Thank you, looking forward to hearing the outcome! > > On a typical arm32 system with global or architectured timer, I get: > > WARNING: CPU: 0 PID: 0 at kernel/irq/manage.c:1587 __setup_irq+0xcc/0x6d0 > CPU: 0 PID: 0 Comm: swapper/0 Not tainted > 6.6.0-rc1-shmobile-02354-g24e058b77f5a #1655 > Hardware name: Generic R-Car Gen2 (Flattened Device Tree) > unwind_backtrace from show_stack+0x10/0x14 > show_stack from dump_stack_lvl+0x40/0x4c > dump_stack_lvl from __warn+0x78/0x10c > __warn from warn_slowpath_fmt+0x90/0x11c > warn_slowpath_fmt from __setup_irq+0xcc/0x6d0 > __setup_irq from __request_percpu_irq+0xb8/0xd0 > __request_percpu_irq from set_smp_ipi_range+0x88/0xdc > set_smp_ipi_range from gic_of_init+0x1a4/0x4d8 > gic_of_init from of_irq_init+0x1f0/0x320 > of_irq_init from init_IRQ+0x74/0x104 > init_IRQ from start_kernel+0x360/0x5d0 > start_kernel from 0x0 > > Likewise on arm64 with architectured timer. > But on these systems I do not see the issue I reported. > > On RISC-V: > > ------------[ cut here ]------------ > WARNING: CPU: 0 PID: 0 at kernel/irq/manage.c:1587 __setup_irq+0x4e6/0x5ee > Modules linked in: > CPU: 0 PID: 0 Comm: swapper Not tainted 6.6.0-rc1-02355-g63165363c6a3 #63 > Hardware name: Renesas SMARC EVK based on r9a07g043f01 (DT) > epc : __setup_irq+0x4e6/0x5ee > ra : __setup_irq+0x38/0x5ee > epc : ffffffff80049472 ra : ffffffff80048fc4 sp : ffffffff81203cd0 > gp : ffffffff812ee760 tp : ffffffff8120d5c0 t0 : ffffffd801854300 > t1 : 0000000000000000 t2 : ffffffff81000ad8 s0 : ffffffff81203d20 > s1 : ffffffd801855000 a0 : 0000000000000001 a1 : ffffffd801855000 > a2 : ffffffd801854280 a3 : 0000000000000001 a4 : 0000000000000000 > a5 : 0000000000000000 a6 : ffffffd801852108 a7 : ffffffd801852100 > s2 : ffffffd801854280 s3 : 0000000000000005 s4 : ffffffff812c54c0 > s5 : 0000000000000005 s6 : ffffffff80dd83a0 s7 : ffffffff805c0cc0 > s8 : ffffffd801855018 s9 : 0000000000000000 s10: 0000000000000000 > s11: 000000007bf638a0 t3 : 0000000000000000 t4 : 0000000000000002 > t5 : ffffffff812882a0 t6 : 0000000000000001 > status: 0000000200000100 badaddr: 0000000000000000 cause: 0000000000000003 > [<ffffffff80049472>] __setup_irq+0x4e6/0x5ee > [<ffffffff800497a8>] __request_percpu_irq+0x98/0xcc > [<ffffffff8082501e>] riscv_timer_init_dt+0x186/0x22e > [<ffffffff80824b62>] timer_probe+0x62/0xd2 > [<ffffffff80803c36>] time_init+0x86/0xa6 > [<ffffffff80800ae2>] start_kernel+0x436/0x618 > ---[ end trace 0000000000000000 ]--- > > Also, no issue here. > > On the affected systems (RZ/A1 and RZ/A2), the WARN_ON_ONCE() did > not trigger, until I applied Liam's patch ("init/main: Clear boot task > idle flag"), which got rid of the "Interrupts were enabled early" warning, > and now tells me, as expected: > > ------------[ cut here ]------------ > WARNING: CPU: 0 PID: 0 at kernel/irq/manage.c:1587 __setup_irq+0xc8/0x654 > CPU: 0 PID: 0 Comm: swapper Not tainted > 6.6.0-rc1-rskrza1-02357-g237e09fd64b8-dirty #829 > Hardware name: Generic R7S72100 (Flattened Device Tree) > unwind_backtrace from show_stack+0x10/0x14 > show_stack from dump_stack_lvl+0x24/0x3c > dump_stack_lvl from __warn+0x74/0xb8 > __warn from warn_slowpath_fmt+0x78/0xb0 > warn_slowpath_fmt from __setup_irq+0xc8/0x654 > __setup_irq from request_threaded_irq+0xac/0x13c > request_threaded_irq from timer_of_init+0x238/0x2c8 > timer_of_init from ostm_init+0x98/0x208 > ostm_init from timer_probe+0x90/0xe4 > timer_probe from start_kernel+0x2c0/0x488 > start_kernel from 0x0 > ---[ end trace 0000000000000000 ]--- > > However, Liam's patch causes lots of warnings on the other systems... > What patch? The "init/main" patch? What systems and what are they? Thanks, Liam