On Sat, Oct 07, 2023 at 09:22:55PM -0400, Joel Fernandes wrote:
> On Fri, Oct 6, 2023 at 2:20 PM Paul E. McKenney <paulmck@xxxxxxxxxx> wrote:
> >
> > On Fri, Oct 06, 2023 at 01:57:14PM -0400, Liam R. Howlett wrote:
> > > * Paul E. McKenney <paulmck@xxxxxxxxxx> [231006 12:47]:
> > > > On Fri, Oct 06, 2023 at 12:20:38PM -0400, Liam R. Howlett wrote:
> > > > > * Naresh Kamboju <naresh.kamboju@xxxxxxxxxx> [231005 13:49]:
> > > > > > On Wed, 4 Oct 2023 at 23:33, Greg Kroah-Hartman
> > > > > > <gregkh@xxxxxxxxxxxxxxxxxxx> wrote:
> > > > > > >
> > > > > > > This is the start of the stable review cycle for the 5.15.134 release.
> > > > > > > There are 183 patches in this series, all will be posted as a response
> > > > > > > to this one. If anyone has any issues with these being applied, please
> > > > > > > let me know.
> > > > > > >
> > > > > > > Responses should be made by Fri, 06 Oct 2023 17:51:12 +0000.
> > > > > > > Anything received after that time might be too late.
> > > > > > >
> > > > > > > The whole patch series can be found in one patch at:
> > > > > > > https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.15.134-rc1.gz
> > > > > > > or in the git tree and branch at:
> > > > > > > git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.15.y
> > > > > > > and the diffstat can be found below.
> > > > > > >
> > > > > > > thanks,
> > > > > > >
> > > > > > > greg k-h
> > > > > >
> > > > > > Results from Linaro’s test farm.
> > > > > > Regressions on x86.
> > > > > >
> > > > > > Following kernel warning noticed on x86 while booting stable-rc 5.15.134-rc1
> > > > > > with selftest merge config built kernel.
> > > > > >
> > > > > > Reported-by: Linux Kernel Functional Testing <lkft@xxxxxxxxxx>
> > > > > >
> > > > > > Anyone noticed this kernel warning ?
> > > > > >
> > > > > > This is always reproducible while booting x86 with a given config.
> > > > >
> > > > > From that config:
> > > > > #
> > > > > # RCU Subsystem
> > > > > #
> > > > > CONFIG_TREE_RCU=y
> > > > > # CONFIG_RCU_EXPERT is not set
> > > > > CONFIG_SRCU=y
> > > > > CONFIG_TREE_SRCU=y
> > > > > CONFIG_TASKS_RCU_GENERIC=y
> > > > > CONFIG_TASKS_RUDE_RCU=y
> > > > > CONFIG_TASKS_TRACE_RCU=y
> > > > > CONFIG_RCU_STALL_COMMON=y
> > > > > CONFIG_RCU_NEED_SEGCBLIST=y
> > > > > # end of RCU Subsystem
> > > > >
> > > > > #
> > > > > # RCU Debugging
> > > > > #
> > > > > CONFIG_PROVE_RCU=y
> > > > > # CONFIG_RCU_SCALE_TEST is not set
> > > > > # CONFIG_RCU_TORTURE_TEST is not set
> > > > > # CONFIG_RCU_REF_SCALE_TEST is not set
> > > > > CONFIG_RCU_CPU_STALL_TIMEOUT=21
> > > > > CONFIG_RCU_TRACE=y
> > > > > # CONFIG_RCU_EQS_DEBUG is not set
> > > > > # end of RCU Debugging
> > > > >
> > > > >
> > > > > >
> > > > > > x86 boot log:
> > > > > > -----
> > > > > > [ 0.000000] Linux version 5.15.134-rc1 (tuxmake@tuxmake)
> > > > > > (x86_64-linux-gnu-gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils
> > > > > > for Debian) 2.40) #1 SMP @1696443178
> > > > > > ...
> > > > > > [ 1.480701] ------------[ cut here ]------------
> > > > > > [ 1.481296] WARNING: CPU: 0 PID: 13 at kernel/rcu/tasks.h:958
> > > > > > trc_inspect_reader+0x80/0xb0
> > > > > > [ 1.481296] Modules linked in:
> > > > > > [ 1.481296] CPU: 0 PID: 13 Comm: rcu_tasks_trace Not tainted 5.15.134-rc1 #1
> > > > > > [ 1.481296] Hardware name: Supermicro SYS-5019S-ML/X11SSH-F, BIOS
> > > > > > 2.5 11/26/2020
> > > > > > [ 1.481296] RIP: 0010:trc_inspect_reader+0x80/0xb0
> > > > >
> > > > > This function has changed a lot, including the dropping of this
> > > > > WARN_ON_ONCE(). The warning was replaced in 897ba84dc5aa ("rcu-tasks:
> > > > > Handle idle tasks for recently offlined CPUs") with something that looks
> > > > > equivalent so I'm not sure why it would not trigger in newer revisions.
> > > > >
> > > > > Obviously the behaviour I changed was the test for the task being idle.
> > > > > I am not sure how best to short-circuit that test from happening during
> > > > > boot as I am not familiar with the RCU code.
> > > >
> > > > The usual test for RCU's notion of early boot being completed is
> > > > (rcu_scheduler_active != RCU_SCHEDULER_INIT).
> > > >
> > > > Except that "ofl" should always be false that early in boot, at least
> > > > in mainline.
> > >
> > > Is this still true in the final version of the patch where we set the
> > > boot task as !idle until just before the early boot is finished? I
> > > wouldn't think of this as 'early in boot' anymore as much as the entire
> > > kernel setup. Maybe we need to shorten the time we stay in !idle mode
> > > for earlier kernels?
> >
> > In mainline, the ofl variable is defined as cpu_is_offline(cpu), and
> > during boot, the boot CPU is guaranteed to be online. (As opposed to
> > the boot CPU's idle-task state.)
> >
> > > How frequent is this function called? We could check something for
> > > early boot... or track down where the cpu is put online and restore idle
> > > before that happens?
> >
> > Once per RCU Tasks Trace grace period per reader seen to be blocking
> > that grace period. Its performance is an issue, but not to anywhere
> > near the same extent as (say) rcu_read_lock_trace().
> >
> > > > > It's also worth noting that the bug this fixes wasn't exposed until the
> > > > > maple tree (added in v6.1) was used for the IRQ descriptors (added in
> > > > > v6.5).
> > > >
> > > > Lots of latent bugs, to be sure, even with rcutorture. :-/
> > >
> > > The Right Thing is to fix the bug all the way back to the introduction,
> > > but what fallout makes the backport less desirable than living with the
> > > unexposed bug?
> >
> > You are quite right that it is possible for the risk of a backport to
> > exceed the risk of the original bug.
> >
> > I defer to Joel (CCed) on how best to resolve this in -stable.
>
> Maybe I am missing something but this issue should also be happening
> in mainline right?
>
> Even though mainline has 897ba84dc5aa ("rcu-tasks: Handle idle tasks
> for recently offlined CPUs"), the warning should still be happening
> due to Liam's "kernel/sched: Modify initial boot task idle setup"
> because the warning is just rearranged a bit but essentially the same.
>
> IMHO, the right thing to do then is to drop Liam's patch from 5.15 and
> fix it in mainline (using the ideas described in this thread), then
> backport both that new fix and Liam's patch to 5.15.
>
> Or is there a reason this warning does not show up on the mainline?
>
> My impression is that dropping Liam's patch for the stable release and
> revisiting it later is a better approach since tiny RCU is used way
> less in the wild than tree/tasks RCU. Thoughts?

I think that this one is strange enough that we need to write down the
situation in detail, make sure we have all the corner cases covered in
both mainline and -stable, and decide what to do from there.

Yes, I know, this email thread contains much of this information, but a
little organizing of it would be good. Would you like to put that
together, or should I? If me, I will get a draft out by the end of this
coming Tuesday, Pacific Time.

Thanx, Paul
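
As an aid to writing the situation down, here is a rough userspace sketch of the early-boot test mentioned earlier in the thread, (rcu_scheduler_active != RCU_SCHEDULER_INIT), applied to a check of the "offline CPU but non-idle task" kind. This is a model only, not kernel code and not a proposed patch. The struct, the helper names, and the exact shape of the warning condition are assumptions made for illustration, and the enum values are local stand-ins for the kernel's constants.

```c
#include <stdbool.h>

/* Local stand-ins for the kernel's scheduler-state constants (model only). */
enum {
	RCU_SCHEDULER_INACTIVE,
	RCU_SCHEDULER_INIT,
	RCU_SCHEDULER_RUNNING
};

/* Hypothetical snapshot of the two inputs discussed in the thread. */
struct reader_snapshot {
	bool ofl;          /* models cpu_is_offline(cpu) for the task's CPU */
	bool is_idle_task; /* models the "is this the idle task?" test */
};

/* The test quoted in the thread for early boot being completed. */
static bool early_boot_done(int rcu_scheduler_active)
{
	return rcu_scheduler_active != RCU_SCHEDULER_INIT;
}

/* Assumed shape of the check: complain about a non-idle task on an offline CPU. */
static bool warn_unguarded(const struct reader_snapshot *s)
{
	return s->ofl && !s->is_idle_task;
}

/* Same check, short-circuited until early boot has completed. */
static bool warn_guarded(const struct reader_snapshot *s,
			 int rcu_scheduler_active)
{
	return early_boot_done(rcu_scheduler_active) && warn_unguarded(s);
}
```

In this model, a snapshot with ofl true and is_idle_task false trips warn_unguarded() regardless of boot state, while warn_guarded() stays quiet as long as the state is RCU_SCHEDULER_INIT. If ofl really is always false that early in boot, as stated above for mainline, neither variant fires, so the guard only matters where ofl can be true during boot.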