On Mon, Apr 24, 2023 at 02:55:11PM -0400, Joel Fernandes wrote: > This is amazing debugging Boqun, like a boss! One comment below: > > > > > Or something simple I haven't thought of? :) > > > > > > At what points can r13 change? Only when some particular functions are > > > called? > > > > > > > r13 is the local paca: > > > > register struct paca_struct *local_paca asm("r13"); > > > > , which is a pointer to percpu data. > > > > So if a task schedule from one CPU to anotehr CPU, the value gets > > changed. > > It appears the whole issue, per your analysis, is that the stack > checking code in gcc should not cache or alias r13, and must read its > most up-to-date value during stack checking, as its value may have > changed during a migration to a new CPU. > > Did I get that right? > > IMO, even without a reproducer, gcc on PPC should just not do that, > that feels terribly broken for the kernel. I wonder what clang does, > I'll go poke around with compilerexplorer after lunch. > > Adding +Peter Zijlstra as well to join the party as I have a feeling > he'll be interested. ;-) I'm a little confused; the way I understand the whole stack protector thing to work is that we push a canary on the stack at call and on return check it is still valid. Since in general tasks randomly migrate, the per-cpu validation canary should be the same on all CPUs. Additionally, the 'new' __srcu_read_{,un}lock_nmisafe() functions use raw_cpu_ptr() to get 'a' percpu sdp, preferably that of the local cpu, but no guarantees. Both cases use r13 (paca) in a racy manner, and in both cases it should be safe.