On Wed, Jun 1, 2022 at 12:06 PM Ard Biesheuvel <ardb@xxxxxxxxxx> wrote:
>
> On Wed, 1 Jun 2022 at 12:04, Yegor Yefremov <yegorslists@xxxxxxxxxxxxxx> wrote:
> >
> > On Wed, Jun 1, 2022 at 11:28 AM Ard Biesheuvel <ardb@xxxxxxxxxx> wrote:
> > >
> > > On Wed, 1 Jun 2022 at 10:08, Ard Biesheuvel <ardb@xxxxxxxxxx> wrote:
> > > >
> > > > On Wed, 1 Jun 2022 at 09:59, Arnd Bergmann <arnd@xxxxxxxx> wrote:
> > > > >
> > > > > On Wed, Jun 1, 2022 at 9:36 AM Yegor Yefremov
> > > > > <yegorslists@xxxxxxxxxxxxxx> wrote:
> > > > > > On Tue, May 31, 2022 at 5:23 PM Arnd Bergmann <arnd@xxxxxxxx> wrote:
> > > > > > > I've pushed a modified branch now, with that fix on the broken commit,
> > > > > > > and another change to make CONFIG_IRQSTACKS user-selectable rather
> > > > > > > than always enabled. That should tell us if the problem is in the SMP
> > > > > > > patching or in the irqstacks.
> > > > > > >
> > > > > > > Can you test the top of this branch with CONFIG_IRQSTACKS disabled,
> > > > > > > and (if that still stalls) retest the fixed commit f0191ea5c2e5 ("[PART 1]
> > > > > > > ARM: implement THREAD_INFO_IN_TASK for uniprocessor systems")?
> > > > > >
> > > > > > 1. the top of this branch with CONFIG_IRQSTACKS disabled stalls
> > > > > > 2. f0191ea5c2e5 with the same config - not
> > > > >
> > > > > Ok, perfect, that does narrow down the problem quite a bit: The final
> > > > > patch has seven changes, all of which can be done individually because
> > > > > in each case the simplified version in f0191ea5c2e5 is meant to run
> > > > > the exact same instructions as the version after the change, when running
> > > > > on a uniprocessor machine such as your am335x.
> > > > >
> > > > > You have already shown earlier that the get_current() and
> > > > > __my_cpu_offset() functions are not to blame here, as reverting
> > > > > only those does not change the behavior.
> > > > >
> > > > > This leaves the is_smp() check in set_current(), and the
> > > > > four macros in <asm/assembler.h>. I don't see anything obviously
> > > > > wrong with any of those five, but I would bet on the macros
> > > > > here. Can you try bisecting into this commit, maybe reverting
> > > > > the changes to set_current and get_current first, and then
> > > > > narrowing it down to (hopefully) a single macro that causes the
> > > > > problem?
> > > > >
> > > >
> > > > set_current() is never called by the primary CPU, which is why the
> > > > is_smp() check was removed from there in 57a420435edcb0b94 ("ARM: drop
> > > > pointless SMP check on secondary startup path").
> > > >
> > > > So that leaves only the four macros in asm/assembler.h, but I don't
> > > > see anything obviously wrong with those either.
> > >
> > > I pushed a patch on top of Arnd's branch at the link below that gets
> > > rid of the subsections, and uses normal branches (and code patching)
> > > to switch between the thread ID register and the LDR to retrieve the
> > > CPU offset and the current pointer. I have no explanation whether or
> > > why it could make a difference, but I think it's worth a try.
> >
> > The link to your repo is missing.
> >
>
> Oops, sorry :-)
>
> https://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git/log/?h=am335x-stall-test

I have tested your branch and it stalls:

[   69.924298] rcu: INFO: rcu_sched self-detected stall on CPU
[   69.930986] rcu: 0-...!: (2600 ticks this GP) idle=6f5/1/0x40000004 softirq=2257/2257 fqs=0
[   69.940551] (t=2600 jiffies g=3413 q=11)
[   69.945187] rcu: rcu_sched kthread timer wakeup didn't happen for 2599 jiffies! g3413 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402
[   69.957111] rcu: Possible timer handling issue on cpu=0 timer-softirq=1261
[   69.964668] rcu: rcu_sched kthread starved for 2600 jiffies! g3413 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=0
[   69.975638] rcu: Unless rcu_sched kthread gets sufficient CPU time, OOM is now expected behavior.
[   69.985170] rcu: RCU grace-period kthread stack dump:
[   69.990708] task:rcu_sched state:I stack: 0 pid: 10 ppid: 2 flags:0x00000000
[   70.000250] [<c0b683b4>] (__schedule) from [<c0b68cf8>] (schedule+0x54/0xe8)
[   70.008705] [<c0b68cf8>] (schedule) from [<c0b6f4fc>] (schedule_timeout+0xa8/0x210)
[   70.017449] [<c0b6f4fc>] (schedule_timeout) from [<c01d8594>] (rcu_gp_fqs_loop+0x118/0x6b4)
[   70.026875] [<c01d8594>] (rcu_gp_fqs_loop) from [<c01dc4c4>] (rcu_gp_kthread+0x138/0x30c)
[   70.036074] [<c01dc4c4>] (rcu_gp_kthread) from [<c0164dd8>] (kthread+0x13c/0x164)
[   70.044559] [<c0164dd8>] (kthread) from [<c0100150>] (ret_from_fork+0x14/0x44)
[   70.052732] rcu: Stack dump where RCU GP kthread last ran:
[   70.058773] NMI backtrace for cpu 0
[   70.062840] CPU: 0 PID: 5 Comm: kworker/0:0 Not tainted 5.16.0-rc1 #1
[   70.070003] Hardware name: Generic AM33XX (Flattened Device Tree)
[   70.076698] Workqueue: events dbs_work_handler
[   70.082258] [<c01115f0>] (unwind_backtrace) from [<c010bfd4>] (show_stack+0x10/0x14)
[   70.091113] [<c010bfd4>] (show_stack) from [<d00299f0>] (0xd00299f0)
[   70.099045] NMI backtrace for cpu 0
[   70.103188] CPU: 0 PID: 5 Comm: kworker/0:0 Not tainted 5.16.0-rc1 #1
[   70.110357] Hardware name: Generic AM33XX (Flattened Device Tree)
[   70.117027] Workqueue: events dbs_work_handler
[   70.122491] [<c01115f0>] (unwind_backtrace) from [<c010bfd4>] (show_stack+0x10/0x14)
[   70.131254] [<c010bfd4>] (show_stack) from [<d00299f0>] (0xd00299f0)
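
As a rough illustration of the mechanism under test in this thread (switching between the thread ID register and an LDR to retrieve the CPU offset, decided once by code patching), here is a userspace analogy in C. It is only a sketch, not the kernel code: every name in it is hypothetical, a function pointer stands in for the patched branch, and it assumes the register read is the default path with the memory load selected on a uniprocessor system.

```c
#include <stdbool.h>

/* Userspace analogy only; every name below is hypothetical, not a kernel symbol. */

/* Stand-in for the value held in the thread ID register (SMP path). */
static unsigned long fake_thread_id_reg = 0x2000;

/* Stand-in for the variable the uniprocessor path reads with a plain LDR. */
static unsigned long fake_cpu_offset_var = 0x1000;

static unsigned long offset_from_register(void)
{
    return fake_thread_id_reg;
}

static unsigned long offset_from_memory(void)
{
    return fake_cpu_offset_var;
}

/* Models the patch site: assumed to default to the register path. */
static unsigned long (*read_cpu_offset)(void) = offset_from_register;

/* Models the one-time patching step: on a uniprocessor system the
 * call site is redirected to the memory-load path. */
static void patch_cpu_offset_reader(bool is_smp)
{
    read_cpu_offset = is_smp ? offset_from_register : offset_from_memory;
}
```

Calling patch_cpu_offset_reader(false) once and then read_cpu_offset() exercises the memory-load path, which is the one a uniprocessor machine such as the am335x would be expected to take.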