Re: BUG: unable to handle kernel NULL pointer dereference in rcu_core

Zhouyi Zhou <zhouzhouyi@xxxxxxxxx> · Tue, 28 Feb 2023 00:11:51 +0800

On Mon, Feb 27, 2023 at 11:49 PM Joel Fernandes <joel@xxxxxxxxxxxxxxxxx> wrote:
>
> Hey Steve,
>
> On Mon, Feb 27, 2023 at 10:33 AM Steven Rostedt <rostedt@xxxxxxxxxxx> wrote:
> >
> > On Mon, 27 Feb 2023 08:15:26 -0500
> > Joel Fernandes <joel@xxxxxxxxxxxxxxxxx> wrote:
> >
> > > >> asm_sysvec_apic_timer_interrupt+0x1a/0x20
> > > >> RIP: 0010:default_idle+0xf/0x20
> > > >> Code: 89 07 49 c7 c0 08 00 00 00 4d 29 c8 4c 01 c7 4c 29 c2 e9 76 ff ff ff cc cc cc cc f3 0f 1e fa eb 07 0f 00 2d e3 8a 34 00 fb f4 <fa> c3 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 f3 0f 1e fa 65
> > > >> RSP: 0018:ffffc9000017fe00 EFLAGS: 00000202
> > > >> RAX: 0000000000dfbea1 RBX: dffffc0000000000 RCX: ffffffff89b1da9c
> > > >> RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000000
> > > >> RBP: 0000000000000007 R08: 0000000000000001 R09: ffff888119fb6c23
> > > >> R10: ffffed10233f6d84 R11: dffffc0000000000 R12: 0000000000000003
> > > >> R13: ffff888100833900 R14: ffffffff8e112850 R15: 0000000000000000
> > > >> default_idle_call+0x67/0xa0
> > > >> do_idle+0x361/0x440
> > > >> cpu_startup_entry+0x18/0x20
> > > >> start_secondary+0x256/0x300
> > > >> secondary_startup_64_no_verify+0xce/0xdb
> > > >> </TASK>
> > > >> Modules linked in:
> > > >> CR2: 0000000000000000
> > > >> ---[ end trace 0000000000000000 ]---
> > > >> RIP: 0010:0x0
> > > >> Code: Unable to access opcode bytes at 0xffffffffffffffd6.
> > >
> > > I have seen this exact signature when the processor tries to execute a function that has a NULL address. That causes IP to go to 0 and raise the exception. Sounds like something corrupted the rcu_head (just a guess).
> >
> > [ Joel, you need to line wrap your emails ;-) ]
>
> Ok I will try. The thing is, I have not figured out yet how to
> plaintext-reply from my iPhone without having it wrap :-(
>
> > This looks like a call_rcu() was called on something that later got freed
> > or reused. That is, the bug is not with RCU but with something using RCU.
>
> Yes certainly, the rcu_head is allocated on the caller side so it
> could have been trampled while the callback was still in flight.
Thank you all for your guidance; I learned a lot during this process.
>
> > OR it could be a bug with RCU if the synchronize_rcu() ended before the
> > grace periods have finished.
Thanks again.

By the way, syzkaller on my local machine has been running for 8
hours and has reported only three bugs[1][2][3], but they don't seem to be
related to Sanan's original report.
Maybe there is some configuration mismatch between us. The test
continues, and I will report to you once I have any new discovery.

[1] http://154.220.3.120:56700/
[2] https://kernel.source.codeaurora.cn/pub/scm/linux/kernel/git/next/linux-next.git/snapshot/linux-next-next-20230221.tar.gz
[3] http://154.220.3.120/configs/linux-next-config-20230221.txt
Thanks
Zhouyi
>
> Good point.
>
> Thanks,
>
>  - Joel