On Wed, Sep 28, 2022 at 10:51 AM Nicholas Piggin <npiggin@xxxxxxxxx> wrote:
>
> On Wed Sep 28, 2022 at 11:48 AM AEST, Zhouyi Zhou wrote:
> > Thank Nick for reviewing my patch
> >
> > On Tue, Sep 27, 2022 at 12:25 PM Nicholas Piggin <npiggin@xxxxxxxxx> wrote:
> > >
> > > On Tue Sep 27, 2022 at 11:48 AM AEST, Zhouyi Zhou wrote:
> > > > This is second version of my fix to PPC's "WARNING: suspicious RCU usage",
> > > > I improved my fix under Paul E. McKenney's guidance:
> > > > Link: https://lore.kernel.org/lkml/20220914021528.15946-1-zhouzhouyi@xxxxxxxxx/T/
> > > >
> > > > During the cpu offlining, the sub functions of xive_teardown_cpu will
> > > > call __lock_acquire when CONFIG_LOCKDEP=y. The latter function will
> > > > travel RCU protected list, so "WARNING: suspicious RCU usage" will be
> > > > triggered.
> > > >
> > > > Avoid lockdep when we are offline.
> > >
> > > I don't see how this is safe. If RCU is no longer watching the CPU then
> > > the memory it is accessing here could be concurrently freed. I think the
> > > warning is valid.
> > Agree
> > >
> > > powerpc's problem is that cpuhp_report_idle_dead() is called before
> > > arch_cpu_idle_dead(), so it must not rely on any RCU protection there.
> > > I would say xive cleanup just needs to be done earlier. I wonder why it
> > > is not done in __cpu_disable or thereabouts, that's where the interrupt
> > > controller is supposed to be stopped.
> > Yes, I learn flowing events sequence from kgdb debugging
> > __cpu_disable -> pseries_cpu_disable -> set_cpu_online(cpu, false) =
> > leads to => do_idle: if (cpu_is_offline(cpu) -> arch_cpu_idle_dead
> > so xive cleanup should be done in pseries_cpu_disable.
>
> It's a good catch and a reasonable approach to the problem.
Thanks, Nick, for your encouragement ;-)
>
> > But as a beginner, I afraid that I am incompetent to do above
> > sophisticated work without error although I am very like to,
> > Could any expert do this for us?
>
> This will be difficult for anybody, it's tricky code. I'm not an
> expert at it.
>
> It looks like the interrupt controller disable split has been there
> since long before xive. I would try just move them together than see
> if that works.
Yes, I used "git blame" (which I learned from Paul E. McKenney ;-) ) and
saw the same. I look forward to your great work!
>
> Documentation/core-api/cpu_hotplug.rst says that __cpu_disable should
> shut down the interrupt handler. So if there is a complication it
> would probably be from powerpc-specific CPU hotplug or interrupt
> code.
Thanks, Nick, for your guidance! I studied
Documentation/core-api/cpu_hotplug.rst this morning. I also found that
x86 shuts down the interrupt handler in __cpu_disable, in line with
that document.

Many thanks,
Zhouyi
>
> Thanks,
> Nick
>
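
The ordering problem discussed in this thread can be illustrated with a small user-space model. This is only a sketch under stated assumptions: struct cpu_model and every model_* function are hypothetical names invented for illustration, not kernel APIs, and the three steps are a simplification of the sequence described above (a __cpu_disable-like step, a cpuhp_report_idle_dead()-like step after which RCU no longer watches the CPU, and an arch_cpu_idle_dead()-like step). It does not reflect the actual powerpc or xive code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model; none of these names exist in the kernel. */
struct cpu_model {
    bool online;        /* stands in for the CPU's online bit */
    bool rcu_watching;  /* whether RCU still tracks this CPU */
    bool irq_ctrl_up;   /* interrupt controller state (xive in the thread) */
    int  rcu_warnings;  /* count of "suspicious RCU usage" events */
};

/*
 * Models taking a lock with lockdep enabled: lockdep walks RCU-protected
 * lists, so doing this while RCU is not watching the CPU is flagged.
 */
void model_lock_acquire(struct cpu_model *c)
{
    if (!c->rcu_watching)
        c->rcu_warnings++;
}

/* Models interrupt controller teardown, which takes a lock internally. */
void model_irq_teardown(struct cpu_model *c)
{
    assert(c->irq_ctrl_up);  /* teardown is expected to run exactly once */
    model_lock_acquire(c);
    c->irq_ctrl_up = false;
}

/*
 * Runs the simplified offline sequence and returns the number of warnings.
 * teardown_early == true:  teardown happens in the disable step.
 * teardown_early == false: teardown happens in the idle-dead step.
 */
int model_offline(bool teardown_early)
{
    struct cpu_model c = { true, true, true, 0 };

    /* Step 1: disable step (models __cpu_disable). */
    if (teardown_early)
        model_irq_teardown(&c);
    c.online = false;

    /* Step 2: idle loop reports the CPU dead; RCU stops watching it. */
    c.rcu_watching = false;

    /* Step 3: idle-dead step (models arch_cpu_idle_dead). */
    if (!teardown_early)
        model_irq_teardown(&c);

    return c.rcu_warnings;
}
```

In this model, model_offline(false) counts one warning because the lock is taken after RCU has stopped watching, while model_offline(true) counts none. That matches the suggestion in the thread to move the interrupt controller teardown earlier rather than suppress the warning.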