On Fri, Jun 26, 2020 at 09:33:45AM +0200, Peter Zijlstra wrote: > On Fri, Jun 26, 2020 at 10:47:22AM +1000, Dave Chinner wrote: > > [ 1102.169209] BUG: kernel NULL pointer dereference, address: 0000000000000150 > > [ 1102.171270] #PF: supervisor read access in kernel mode > > [ 1102.172894] #PF: error_code(0x0000) - not-present page > > [ 1102.174408] PGD 0 P4D 0 > > [ 1102.175136] Oops: 0000 [#1] PREEMPT SMP > > [ 1102.176293] CPU: 2 PID: 909 Comm: kworker/2:1H Not tainted 5.8.0-rc2-dgc+ #2469 > > [ 1102.178395] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014 > > [ 1102.180762] Workqueue: xfs-log/pmem0 xlog_ioend_work > > [ 1102.182286] RIP: 0010:check_preempt_wakeup+0xc8/0x1e0 > > [ 1102.183804] Code: 39 c2 75 f2 89 d0 39 d0 7d 20 83 ea 01 4d 8b a4 24 48 01 00 00 39 d0 75 f1 eb 0f 48 8b 9b 48 01 00 00 4d 8b a4 24 48 01 00 00 <48> 8b bb 50 01 00 00 49 39 bc 24 b > > [ 1102.189125] RSP: 0018:ffffc9000071cea0 EFLAGS: 00010006 > > [ 1102.190625] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffff822305a0 > > [ 1102.192737] RDX: 0000000000000000 RSI: ffff88853337cd80 RDI: ffff88853ea2a940 > > [ 1102.194827] RBP: ffffc9000071ced8 R08: ffffffff822305a0 R09: ffff88853ec2b2d0 > > [ 1102.196886] R10: ffff88800f74b010 R11: ffff88853ec2a970 R12: 0000000000000000 > > [ 1102.199040] R13: ffff88853ea2a8c0 R14: 0000000000000001 R15: ffff88853e3b0000 > > [ 1102.200883] FS: 0000000000000000(0000) GS:ffff88853ea00000(0000) knlGS:0000000000000000 > > [ 1102.203306] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > [ 1102.205024] CR2: 0000000000000150 CR3: 00000000ae7b5004 CR4: 0000000000060ee0 > > [ 1102.207117] Call Trace: > > [ 1102.207895] <IRQ> > > [ 1102.208500] ? enqueue_task_fair+0x1d7/0x9f0 > > [ 1102.209709] check_preempt_curr+0x74/0x80 > > [ 1102.210931] ttwu_do_wakeup+0x1e/0x170 > > [ 1102.212064] ttwu_do_activate+0x5b/0x70 > > [ 1102.213225] sched_ttwu_pending+0x94/0xe0 > > [ 1102.214410] flush_smp_call_function_queue+0xf1/0x190 > > [ 1102.215885] generic_smp_call_function_single_interrupt+0x13/0x20 > > [ 1102.217790] __sysvec_call_function_single+0x2b/0xe0 > > [ 1102.219375] asm_call_on_stack+0xf/0x20 > > [ 1102.220599] </IRQ> > > [ 1102.221280] sysvec_call_function_single+0x7e/0x90 > > [ 1102.222854] asm_sysvec_call_function_single+0x12/0x20 > > https://git.kernel.org/tip/964ed98b075263faabe416eeebac99a9bef3f06c > > Should be headed to Linus soon. Testing it now. Observation from the outside: "However I'm having trouble convincing myself that's actually possible on x86_64.... " This scheduler code has fallen off a really high ledge on the memory barrier cliff, hasn't it? Having looked at this code over the past 24 hours and the recent history, I know that understanding it - let alone debugging and fixing problem in it - is way beyond my capabilities. And I say that as an experienced kernel developer with a pretty good grasp of concurrent programming and a record of implementing a fair number of non-trivial lockless algorithms over the years.... Cheers, Dave. -- Dave Chinner david@xxxxxxxxxxxxx