On Tue, Oct 10, 2023 at 10:44:16PM -0400, Joel Fernandes wrote:
> On Sun, Oct 8, 2023 at 9:20 PM Paul E. McKenney <paulmck@xxxxxxxxxx> wrote:
> [...]
> > > > > How frequently is this function called? We could check something for
> > > > > early boot... or track down where the cpu is put online and restore idle
> > > > > before that happens?
> > > >
> > > > Once per RCU Tasks Trace grace period per reader seen to be blocking
> > > > that grace period. Its performance is an issue, but not to anywhere
> > > > near the same extent as (say) rcu_read_lock_trace().
> > > >
> > > > > > > It's also worth noting that the bug this fixes wasn't exposed until the
> > > > > > > maple tree (added in v6.1) was used for the IRQ descriptors (added in
> > > > > > > v6.5).
> > > > > >
> > > > > > Lots of latent bugs, to be sure, even with rcutorture. :-/
> > > > >
> > > > > The Right Thing is to fix the bug all the way back to the introduction,
> > > > > but what fallout makes the backport less desirable than living with the
> > > > > unexposed bug?
> > > >
> > > > You are quite right that it is possible for the risk of a backport to
> > > > exceed the risk of the original bug.
> > > >
> > > > I defer to Joel (CCed) on how best to resolve this in -stable.
> > >
> > > Maybe I am missing something but this issue should also be happening
> > > in mainline, right?
> > >
> > > Even though mainline has 897ba84dc5aa ("rcu-tasks: Handle idle tasks
> > > for recently offlined CPUs"), the warning should still be happening
> > > due to Liam's "kernel/sched: Modify initial boot task idle setup"
> > > because the warning is just rearranged a bit but essentially the same.
> > >
> > > IMHO, the right thing to do then is to drop Liam's patch from 5.15 and
> > > fix it in mainline (using the ideas described in this thread), then
> > > backport both that new fix and Liam's patch to 5.15.
> > >
> > > Or is there a reason this warning does not show up on the mainline?
> > >
> > > My impression is that dropping Liam's patch for the stable release and
> > > revisiting it later is a better approach since tiny RCU is used way
> > > less in the wild than tree/tasks RCU. Thoughts?
> >
> > I think that this one is strange enough that we need to write down the
> > situation in detail, make sure we have all the corner cases covered in
> > both mainline and -stable, and decide what to do from there.
> >
> > Yes, I know, this email thread contains much of this information, but
> > a little organizing of it would be good.
> >
> > Would you like to put that together, or should I? If me, I will get
> > a draft out by the end of this coming Tuesday, Pacific Time.
>
> I apologize, I haven't been able to do any real work as I was OOO for
> the most part due to dental issues. I am about 25% back now. I will
> review your other email writeup and thanks for putting it together!

No need to apologize! If anything, it is I who should apologize for not
digging deeply into this to begin with. As always, there were
distractions. ;-)

Thanx, Paul