Le 12/09/2023 à 17:08, Paul E. McKenney a écrit : > On Tue, Sep 12, 2023 at 10:29:30AM -0400, Liam R. Howlett wrote: >> * Liam R. Howlett <Liam.Howlett@xxxxxxxxxx> [230912 09:56]: >>> * Paul E. McKenney <paulmck@xxxxxxxxxx> [230912 06:00]: >>>> On Tue, Sep 12, 2023 at 10:34:44AM +0200, Geert Uytterhoeven wrote: >>>>> Hi Paul, >>>>> >>>>> On Tue, Sep 12, 2023 at 10:30 AM Paul E. McKenney <paulmck@xxxxxxxxxx> wrote: >>>>>> On Tue, Sep 12, 2023 at 10:23:37AM +0200, Geert Uytterhoeven wrote: >>>>>>> On Tue, Sep 12, 2023 at 10:14 AM Paul E. McKenney <paulmck@xxxxxxxxxx> wrote: >>>>>>>> On Mon, Sep 11, 2023 at 07:54:52PM -0400, Liam R. Howlett wrote: >>>>>>>>> * Paul E. McKenney <paulmck@xxxxxxxxxx> [230906 14:03]: >>>>>>>>>> On Wed, Sep 06, 2023 at 01:29:54PM -0400, Liam R. Howlett wrote: >>>>>>>>>>> * Paul E. McKenney <paulmck@xxxxxxxxxx> [230906 13:24]: >>>>>>>>>>>> On Wed, Sep 06, 2023 at 11:23:25AM -0400, Liam R. Howlett wrote: >>>>>>>>>>>>> (Adding Paul & Shanker to Cc list.. please see below for why) >>>>>>>>>>>>> >>>>>>>>>>>>> Apologies on the late response, I was away and have been struggling to >>>>>>>>>>>>> get a working PPC32 test environment. >>>>>>>>>>>>> >>>>>>>>>>>>> * Geert Uytterhoeven <geert@xxxxxxxxxxxxxx> [230829 12:42]: >>>>>>>>>>>>>> Hi Liam, >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Fri, 18 Aug 2023, Liam R. Howlett wrote: >>>>>>>>>>>>>>> The current implementation of append may cause duplicate data and/or >>>>>>>>>>>>>>> incorrect ranges to be returned to a reader during an update. Although >>>>>>>>>>>>>>> this has not been reported or seen, disable the append write operation >>>>>>>>>>>>>>> while the tree is in rcu mode out of an abundance of caution. >>>>>>>>>>>>> >>>>>>>>>>>>> ... >>>>>>>>>>>>>>> >>>>>>>>> >>>>>>>>> ... >>>>>>>>> >>>>>>>>>>>>>> RCU-related configs: >>>>>>>>>>>>>> >>>>>>>>>>>>>> $ grep RCU .config >>>>>>>>>>>>>> # RCU Subsystem >>>>>>>>>>>>>> CONFIG_TINY_RCU=y >>>>>>>> >>>>>>>> I must have been asleep last time I looked at this. I was looking at >>>>>>>> Tree RCU. Please accept my apologies for my lapse. :-/ >>>>>>>> >>>>>>>> However, Tiny RCU's call_rcu() also avoids enabling IRQs, so I would >>>>>>>> have said the same thing, albeit after looking at a lot less RCU code. >>>>>>>> >>>>>>>> TL;DR: >>>>>>>> >>>>>>>> 1. Try making the __setup_irq() function's call to mutex_lock() >>>>>>>> instead be as follows: >>>>>>>> >>>>>>>> if (!mutex_trylock(&desc->request_mutex)) >>>>>>>> mutex_lock(&desc->request_mutex); >>>>>>>> >>>>>>>> This might fail if __setup_irq() has other dependencies on a >>>>>>>> fully operational scheduler. >> >> This changes where the interrupts become enabled, but doesn't stop it >> from happening. It still throws a WARN after init_IRQ(). I suspect it >> is not the way to proceed as there are probably many places that will >> need to be changed in both common and arch specific code. As soon as >> TIF_NEED_RESCHED is set, then all the checks will need to be altered. > > Thank you for trying it! > >> I think we either need to set the boot thread to !idle, avoid call_rcu() >> to set TIF_NEED_RESCHED (how was this working before? Do we need rcu >> for the IRQs?), or alter the boot order (note this is NOT arch or >> platform code here). >> >> I don't like any of these. I'd like another option, please? > > My favorite is to move the interrupt enabling later, but Michael Ellerman > would know better than would I about the feasibility of this. > I digged into it a bit more, looks like IRQs get enabled by the call to cond_resched() in the loop in vm_area_alloc_pages(), which is called from powerpc's init_IRQ() fonction when allocating IRQ stacks. And IRQ stacks definitely need to be enabled before IRQs get enabled, so there's something wrong here isn't it ? Christophe