> From: Joel Fernandes <joel@xxxxxxxxxxxxxxxxx>
> Sent: Monday, February 27, 2023 9:15 PM
> To: Zhouyi Zhou <zhouzhouyi@xxxxxxxxx>
> Cc: Sanan Hasanov <sanan.hasanov@xxxxxxxxxxxxxxx>; paulmck@xxxxxxxxxx;
> frederic@xxxxxxxxxx; quic_neeraju@xxxxxxxxxxx; josh@xxxxxxxxxxxxxxxx;
> rostedt@xxxxxxxxxxx; mathieu.desnoyers@xxxxxxxxxxxx;
> jiangshanlai@xxxxxxxxx; rcu@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx;
> syzkaller@xxxxxxxxxxxxxxxx; contact@xxxxxxxxx
> Subject: Re: BUG: unable to handle kernel NULL pointer dereference in
> rcu_core
>
> ...
> >> BUG: kernel NULL pointer dereference, address: 0000000000000000
> >> #PF: supervisor instruction fetch in kernel mode
> >> #PF: error_code(0x0010) - not-present page PGD 53756067 P4D 53756067
> >> PUD 0
> >> Oops: 0010 [#1] PREEMPT SMP KASAN
> >> CPU: 7 PID: 0 Comm: swapper/7 Not tainted 6.2.0-next-20230221 #1
> >> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1
> >> 04/01/2014
> >> RIP: 0010:0x0
> >> Code: Unable to access opcode bytes at 0xffffffffffffffd6.
> >> RSP: 0018:ffffc900003f8e48 EFLAGS: 00010246
> >> RAX: 0000000000000000 RBX: ffff888100833900 RCX: 00000000b9582f6c
> >> RDX: 1ffff11020106853 RSI: ffffffff816b2769 RDI: ffff888043f64708
> >> RBP: 000000000000000c R08: 0000000000000000 R09: ffffffff900b895f
> >> R10: fffffbfff201712b R11: 000000000008e001 R12: dffffc0000000000
> >> R13: ffffc900003f8ec8 R14: ffff888043f64708 R15: 000000000000000b
> >> FS: 0000000000000000(0000) GS:ffff888119f80000(0000)
> >> knlGS:0000000000000000
> >> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >> CR2: ffffffffffffffd6 CR3: 0000000054e64000 CR4: 0000000000350ee0
> >> Call Trace:
> >> <IRQ>
> >> rcu_core+0x85d/0x1960
> >> __do_softirq+0x2e5/0xae2
> >> __irq_exit_rcu+0x11d/0x190
> >> irq_exit_rcu+0x9/0x20
> >> sysvec_apic_timer_interrupt+0x97/0xc0
> >> </IRQ>
> >> <TASK>
> >> asm_sysvec_apic_timer_interrupt+0x1a/0x20
> >> RIP: 0010:default_idle+0xf/0x20
> >> Code: 89 07 49 c7 c0 08 00 00 00 4d 29 c8 4c 01 c7 4c 29 c2 e9 76 ff
> >> ff ff cc cc cc cc f3 0f 1e fa eb 07 0f 00 2d e3 8a 34 00 fb f4 <fa>
> >> c3 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 f3 0f 1e fa 65
> >> RSP: 0018:ffffc9000017fe00 EFLAGS: 00000202
> >> RAX: 0000000000dfbea1 RBX: dffffc0000000000 RCX: ffffffff89b1da9c
> >> RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000000
> >> RBP: 0000000000000007 R08: 0000000000000001 R09: ffff888119fb6c23
> >> R10: ffffed10233f6d84 R11: dffffc0000000000 R12: 0000000000000003
> >> R13: ffff888100833900 R14: ffffffff8e112850 R15: 0000000000000000
> >> default_idle_call+0x67/0xa0
> >> do_idle+0x361/0x440
> >> cpu_startup_entry+0x18/0x20
> >> start_secondary+0x256/0x300
> >> secondary_startup_64_no_verify+0xce/0xdb
> >> </TASK>
> >> Modules linked in:
> >> CR2: 0000000000000000
> >> ---[ end trace 0000000000000000 ]---
> >> RIP: 0010:0x0
> >> Code: Unable to access opcode bytes at 0xffffffffffffffd6.
>
> I have seen this exact signature when the processor tries to execute a
> function that has a NULL address.
> That causes IP to go to 0 and triggers the exception.
> Sounds like something corrupted rcu_head (Just a guess).

I did a quick test that directly invokes "call_rcu(head, NULL)". The kernel
panicked with almost the same call trace as above and with the same RIP:

RIP: 0010:0x0
Code: Unable to access opcode bytes at 0xffffffffffffffd6.

Invoking "call_rcu(head, NULL + 1)" gives:

RIP: 0010:0x1
Code: Unable to access opcode bytes at 0xffffffffffffffd7.

Invoking "call_rcu(head, NULL + 2)" gives:

RIP: 0010:0x2
Code: Unable to access opcode bytes at 0xffffffffffffffd8.

The logs above suggest that your guess (a corrupted rcu_head) is reasonable. 😊

-Qiuxu
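
The pattern in the three logs above can be checked with simple arithmetic. The sketch below is a userspace model, not kernel code. It assumes that the x86 oops printer tries to read opcode bytes starting 42 bytes before the faulting RIP (the PROLOGUE_SIZE constant in arch/x86/kernel/dumpstack.c, as far as I know), and that RCU callback invocation jumps to whatever value is stored in the callback's func field. The names struct fake_rcu_head, jump_target(), opcode_window_start() and print_predicted_lines() are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Assumption: same value as PROLOGUE_SIZE in arch/x86/kernel/dumpstack.c. */
#define PROLOGUE_SIZE 42

/* Hypothetical userspace stand-in for the kernel's struct rcu_head. */
struct fake_rcu_head {
    struct fake_rcu_head *next;
    void (*func)(struct fake_rcu_head *head);
};

/* Address the CPU would fetch from if head->func were called (never called here). */
uint64_t jump_target(const struct fake_rcu_head *head)
{
    return (uint64_t)(uintptr_t)head->func;
}

/* First address of the opcode window that the oops printer tries to read. */
uint64_t opcode_window_start(uint64_t rip)
{
    return rip - PROLOGUE_SIZE; /* unsigned arithmetic wraps below zero */
}

/* Print the two log lines this model predicts for a given callback value. */
void print_predicted_lines(uint64_t func_value)
{
    struct fake_rcu_head head = {
        NULL,
        (void (*)(struct fake_rcu_head *))(uintptr_t)func_value,
    };
    uint64_t rip = jump_target(&head);

    /* The model only makes sense where pointers hold the full 64-bit value. */
    assert(rip == func_value);

    printf("RIP: 0010:0x%" PRIx64 "\n", rip);
    printf("Code: Unable to access opcode bytes at 0x%" PRIx64 ".\n",
           opcode_window_start(rip));
}
```

On a 64-bit build, calling print_predicted_lines() with 0, 1 and 2 should give the same RIP and "Code:" pairs as the three logs (0xffffffffffffffd6, 0xffffffffffffffd7, 0xffffffffffffffd8). That is consistent with the faulting address simply being the func value stored in the rcu_head.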