Hi Nick,

On Fri, Aug 09, 2019 at 02:21:21PM -0700, Nick Desaulniers wrote:
> Did anyone report any issue with last night's -next for arm64?
>
> Some kind of deadlock in online_fair_sched_group.
>
> [ 15.256790] ================================
> [ 15.257025] WARNING: inconsistent lock state
> [ 15.257243] 5.3.0-rc3-next-20190809 #1 Not tainted
> [ 15.257393] --------------------------------
> [ 15.257526] inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
> [ 15.258096] init/1 [HC0[0]:SC0[0]:HE1:SE1] takes:
> [ 15.258522] (____ptrval____) (&rq->lock){?.-.}, at:
> online_fair_sched_group+0x78/0xe4
> [ 15.259170] {IN-HARDIRQ-W} state was registered at:
> [ 15.259658]  lock_acquire+0x1dc/0x228
> [ 15.259940]  _raw_spin_lock+0x40/0x54
> [ 15.260251]  scheduler_tick+0x50/0xfc
> [ 15.260491]  update_process_times+0x80/0x98
> [ 15.260677]  tick_periodic+0xd8/0xf0
> [ 15.260910]  tick_handle_periodic+0x30/0x94
> [ 15.261126]  arch_timer_handler_virt+0x34/0x40
> [ 15.261332]  handle_percpu_devid_irq+0x1a8/0x3c4
> [ 15.261495]  __handle_domain_irq+0x7c/0xbc
> [ 15.261689]  gic_handle_irq+0x48/0xac
> [ 15.261881]  el1_irq+0xbc/0x180

Ok, so we take rq_lock() off the back of a timer interrupt in irq
context...

> [ 15.267928]  dump_backtrace+0x0/0x140
> [ 15.268159]  show_stack+0x14/0x1c
> [ 15.268341]  dump_stack+0xa8/0x104
> [ 15.268482]  mark_lock+0xda0/0xda8
> [ 15.268728]  __lock_acquire+0x300/0x858
> [ 15.268869]  lock_acquire+0x1dc/0x228
> [ 15.269057]  _raw_spin_lock+0x40/0x54

... but also with irqs enabled when handling a syscall. Boom.

> [ 15.269201]  online_fair_sched_group+0x78/0xe4
> [ 15.269392]  sched_online_group+0x88/0xac
> [ 15.269591]  sched_autogroup_create_attach+0xcc/0x12c
> [ 15.269765]  ksys_setsid+0xe8/0xec
> [ 15.269990]  __arm64_sys_setsid+0xc/0x18
> [ 15.270178]  el0_svc_common+0x9c/0x15c
> [ 15.270317]  el0_svc_handler+0x5c/0x64
> [ 15.270493]  el0_svc+0x8/0xc
>
> https://travis-ci.com/ClangBuiltLinux/continuous-integration/jobs/223856448
>
> Guessing related to
> commit 6b8fd01b21f5 ("sched/fair: Use rq_lock/unlock in
> online_fair_sched_group")

Agreed. I think that patch should be using rq_lock_{irqsave,irqrestore}().

Looking at the list archive, it seems that this was already spotted last
week:

https://lkml.kernel.org/r/dfc8f652-ca98-e30a-546f-e6a2df36e33a@xxxxxxx

The proposal there disables irqs unconditionally, though, which matches
the old behaviour (prior to 6b8fd01b21f5) but feels a bit dodgy given
that the only caller (sched_online_group()) uses the save/restore
variants.

Phil -- is there a fix queued for this somewhere?

Will
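
For illustration only: a completely untested sketch of the save/restore
variant suggested above, assuming the loop body of online_fair_sched_group()
in kernel/sched/fair.c is otherwise left unchanged. It is meant to show the
shape of the change, not to stand in for a proper patch.

--->8---

void online_fair_sched_group(struct task_group *tg)
{
	struct sched_entity *se;
	struct rq_flags rf;
	struct rq *rq;
	int i;

	for_each_possible_cpu(i) {
		rq = cpu_rq(i);
		se = tg->se[i];

		/*
		 * Untested sketch: the tick takes rq->lock from hardirq
		 * context (scheduler_tick()), so irqs must be disabled
		 * while we hold it here. Using the irqsave/irqrestore
		 * variants keeps that correct regardless of the caller's
		 * irq state, matching the style of sched_online_group().
		 */
		rq_lock_irqsave(rq, &rf);
		update_rq_clock(rq);
		attach_entity_cfs_rq(se);
		sync_throttle(tg, i);
		rq_unlock_irqrestore(rq, &rf);
	}
}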