On Mon, 2019-05-13 at 10:01 -0700, Rick Edgecombe wrote:
> On Mon, 2019-05-13 at 17:01 +0300, Meelis Roos wrote:
> > I tested yesterdays 5.2 devel git and it failed to boot on my Sun Fire V445
> > (4x UltraSparc III). Init is started and it hangs there:
> >
> > [ 38.414436] Run /sbin/init as init process
> > [ 38.530711] random: fast init done
> > [ 39.580678] systemd[1]: Inserted module 'autofs4'
> > [ 39.721577] systemd[1]: systemd 241 running in system mode. (+PAM +AUDIT
> > +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS
> > +ACL +XZ +LZ4 -SECCOMP +BLKID +ELFUTILS +KMOD -IDN2 +IDN -PCRE2
> > default-hierarchy=hybrid)
> > [ 40.028068] systemd[1]: Detected architecture sparc64.
> >
> > Welcome to Debian GNU/Linux 10 (buster)!
> >
> > [ 40.168713] systemd[1]: Set hostname to <v445>.
> > [ 61.318034] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
> > [ 61.403039] rcu: 1-...!: (0 ticks this GP)
> > idle=602/1/0x4000000000000000 softirq=85/85 fqs=1
> > [ 61.526780] rcu: (detected by 3, t=5252 jiffies, g=-967, q=228)
> > [ 61.613037] CPU[ 1]: TSTATE[0000000080001602] TPC[000000000043f2b8]
> > TNPC[000000000043f2bc] TASK[systemd-fstab-g:90]
> > [ 61.766828] TPC[smp_synchronize_tick_client+0x18/0x180]
> > O7[__do_munmap+0x204/0x3e0] I7[xcall_sync_tick+0x1c/0x2c]
> > RPC[page_evictable+0x4/0x60]
> > [ 61.966807] rcu: rcu_sched kthread starved for 5250 jiffies! g-967 f0x0
> > RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=2
> > [ 62.113058] rcu: RCU grace-period kthread stack dump:
> > [ 62.185558] rcu_sched I 0 10 2 0x06000000
> > [ 62.264312] Call Trace:
> > [ 62.299316] [000000000092a1fc] schedule+0x1c/0x80
> > [ 62.368071] [000000000092d3fc] schedule_timeout+0x13c/0x280
> > [ 62.449328] [00000000004b6c64] rcu_gp_kthread+0x4c4/0xa40
> > [ 62.528077] [000000000047e95c] kthread+0xfc/0x120
> > [ 62.596833] [00000000004060a4] ret_from_fork+0x1c/0x2c
> > [ 62.671831] [0000000000000000] (null)
> >
> > 5.1.0 worked fine.
> > I bisected it to the following commit:
> >
> > d53d2f78ceadba081fc7785570798c3c8d50a718 is the first bad commit
> > commit d53d2f78ceadba081fc7785570798c3c8d50a718
> > Author: Rick Edgecombe <rick.p.edgecombe@xxxxxxxxx>
> > Date: Thu Apr 25 17:11:38 2019 -0700
> >
> > bpf: Use vmalloc special flag
> >
> > Use new flag VM_FLUSH_RESET_PERMS for handling freeing of special
> > permissioned memory in vmalloc and remove places where memory was set RW
> > before freeing which is no longer needed. Don't track if the memory is RO
> > anymore because it is now tracked in vmalloc.
> >
> > Signed-off-by: Rick Edgecombe <rick.p.edgecombe@xxxxxxxxx>
> > Signed-off-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
> > Cc: <akpm@xxxxxxxxxxxxxxxxxxxx>
> > Cc: <ard.biesheuvel@xxxxxxxxxx>
> > Cc: <deneen.t.dock@xxxxxxxxx>
> > Cc: <kernel-hardening@xxxxxxxxxxxxxxxxxx>
> > Cc: <kristen@xxxxxxxxxxxxxxx>
> > Cc: <linux_dti@xxxxxxxxxx>
> > Cc: <will.deacon@xxxxxxx>
> > Cc: Alexei Starovoitov <ast@xxxxxxxxxx>
> > Cc: Andy Lutomirski <luto@xxxxxxxxxx>
> > Cc: Borislav Petkov <bp@xxxxxxxxx>
> > Cc: Daniel Borkmann <daniel@xxxxxxxxxxxxx>
> > Cc: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>
> > Cc: H. Peter Anvin <hpa@xxxxxxxxx>
> > Cc: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
> > Cc: Nadav Amit <nadav.amit@xxxxxxxxx>
> > Cc: Rik van Riel <riel@xxxxxxxxxxx>
> > Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> > Link: https://lkml.kernel.org/r/20190426001143.4983-19-namit@xxxxxxxxxx
> > Signed-off-by: Ingo Molnar <mingo@xxxxxxxxxx>
> >
> > :040000 040000 58066de53107eab0705398b5d0c407424c138a86
> > 7a1345d43c4cacee60b9135899b775ecdb54ea7e M include
> > :040000 040000 d02692cf57a359056b34e636d0f102d37de5b264
> > 81c4c2c6408b68eb555673bd3f0bc3071db1f7ed M kernel
> >
>
> Thanks, I'll see if I can reproduce.
>
> Rick

I'm having trouble getting Debian Buster up and running on
qemu-system-sparc64 and so haven't been able to reproduce. Is this
currently working for people?

This patch involves resetting memory permissions when freeing executable
memory. It looks like sparc64 Linux doesn't have support for the
set_memory_() functions, so that part shouldn't be changing anything. The
other main change here is that a TLB flush is now always done in vfree when
the BPF JITs are freed. That flush already sometimes happens, so that
shouldn't be too different either. So I can't see anything here that seems
especially likely to cause a sparc-specific problem.

Is there any chance this is an intermittent issue?

Alternatively, we could maybe just exempt architectures with no
set_memory_() implementations from this new behavior. That would
unfortunately lose the benefits for architectures that have no
set_memory_()'s but do have executable permission bits. But then this patch
would have no effect on sparc64, which would possibly resolve it without
really debugging it.

Thanks,
Rick
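
For illustration only, the exemption idea above can be modelled in user space roughly as follows. This is a sketch under stated assumptions, not kernel code: `plan_free()`, `struct free_actions`, and both boolean parameters are hypothetical names made up for this example, and the numeric value given to `VM_FLUSH_RESET_PERMS` is a placeholder rather than a claim about the kernel headers.

```c
#include <stdbool.h>

/* Flag name taken from the commit message; the value is illustrative. */
#define VM_FLUSH_RESET_PERMS 0x100u

/* Hypothetical summary of what freeing a vmalloc area would do. */
struct free_actions {
    bool reset_perms;     /* permissions reset via the set_memory_() helpers */
    bool force_tlb_flush; /* TLB flush forced at free time */
};

/*
 * Hypothetical decision helper.
 *   arch_has_set_memory:       the architecture implements set_memory_()
 *   exempt_without_set_memory: the proposed exemption is enabled
 */
static struct free_actions plan_free(unsigned int vm_flags,
                                     bool arch_has_set_memory,
                                     bool exempt_without_set_memory)
{
    struct free_actions a = { false, false };

    /* Areas without the flag get no special handling. */
    if (!(vm_flags & VM_FLUSH_RESET_PERMS))
        return a;

    /* Proposed exemption: nothing extra without set_memory_() support. */
    if (!arch_has_set_memory && exempt_without_set_memory)
        return a;

    /* Behavior described for the patch: reset if possible, always flush. */
    a.reset_perms = arch_has_set_memory;
    a.force_tlb_flush = true;
    return a;
}
```

In this model, a sparc64-like configuration (`arch_has_set_memory == false`) still gets the forced flush unless the exemption is enabled, which is the trade-off described above: the exemption removes the patch's effect on such architectures, including any that have executable permission bits but no set_memory_() helpers.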