Hi All,
We are observing a NULL pointer dereference in rcu_do_batch() on
5.15, although it is very hard to hit.
I wanted to check whether this has already been reported and fixed in a
more recent kernel.
<1>[16.814014] [pid: 58] Unable to handle kernel NULL pointer
dereference at virtual address 0000000000000000
<0>[16.814027] [pid: 58] PC Code: bad value
<0>[16.814034] [pid: 58] LR Code: f81e03a8 b5000068 d10083a8 f81e83a8
aa1f03f6 91127319 d10083b7 f9434b68 d503201f f9400408 910006d6 f900041f
d63f0100 (91004308) b8bfc108 374001c8 97ffff2b 9111e308 38bfc108 72001d1f
<4>[16.814359] [pid: 58] CPU: 7 PID: 58 Comm: rcuop/5 Tainted: G S
W OE 5.15.41-android13-8-25574579-abS911USQU1AVLL #1
<4>[16.814361] [pid: 58] Hardware name: XXXXX
<4>[16.814362] [pid: 58] pstate: 42400805 (nZcv daif +PAN -UAO +TCO
-DIT -SSBS BTYPE=-c)
<4>[16.814364] [pid: 58] pc : 0x0
<4>[16.814365] [pid: 58] lr : rcu_do_batch+0x328/0xcd8
The rcu_data for CPU5 contains 12 additional RCU callback heads in the
RCU_DONE_TAIL segment whose func is NULL. This does not look like
random memory corruption, since only rhp->func is NULL and the same
pattern repeats across multiple objects.
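To make the observation above concrete, here is a minimal userspace
sketch (not kernel code) of the kind of check we ran by hand on the
dump: walk a singly linked callback list, count the entries whose func
is NULL, and invoke the rest. struct cb_head, count_cb(),
count_null_funcs() and invoke_batch_checked() are hypothetical names
made up for this illustration; they only mimic the next/func layout of
struct rcu_head and are not the actual rcu_do_batch() implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the kernel's callback head (assumption:
 * only the next pointer and the function pointer matter here). */
struct cb_head {
	struct cb_head *next;
	void (*func)(struct cb_head *head);
};

/* Number of callbacks actually invoked, for demonstration only. */
static size_t cb_invocations;

/* Hypothetical example callback: just records that it ran. */
static void count_cb(struct cb_head *head)
{
	(void)head;
	cb_invocations++;
}

/* Count list entries whose func is NULL. */
static size_t count_null_funcs(const struct cb_head *list)
{
	size_t n = 0;

	for (; list; list = list->next)
		if (!list->func)
			n++;
	return n;
}

/* Invoke every callback on the list, skipping (and counting) entries
 * with a NULL func instead of branching to address 0.
 * Returns the number of callbacks invoked. */
static size_t invoke_batch_checked(struct cb_head *list, size_t *skipped)
{
	size_t invoked = 0;

	assert(skipped != NULL);
	*skipped = 0;
	while (list) {
		/* Read next first: a real callback may free its node. */
		struct cb_head *next = list->next;
		void (*f)(struct cb_head *) = list->func;

		if (!f) {
			(*skipped)++;
		} else {
			f(list);
			invoked++;
		}
		list = next;
	}
	return invoked;
}
```

With a three-entry list where the middle entry has func == NULL,
count_null_funcs() reports 1 and invoke_batch_checked() invokes 2
callbacks and skips 1. An unchecked loop would instead jump to 0x0 on
the middle entry, which matches "pc : 0x0" in the log above.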
There is one more occurrence, this time with CONFIG_CFI_CLANG enabled:
[123587.101222][ T44] Kernel panic - not syncing: CFI failure (target:
0x0)
[123587.101249][ T44] CPU: 0 PID: 44 Comm: rcuop/3 Tainted: G S
WC OE 5.15.41 #1
[123587.101263][ T44] Hardware name: XXXXX
[123587.101274][ T44] Call trace:
[123587.101283][ T44] dump_backtrace.cfi_jt+0x0/0x8
[123587.101298][ T44] show_stack+0x1c/0x2c
[123587.101311][ T44] dump_stack_lvl+0x94/0x100
[123587.101326][ T44] panic+0x17c/0x450
[123587.101338][ T44] find_check_fn+0x0/0x210
[123587.101349][ T44] rcu_do_batch+0x368/0x6f8
[123587.101362][ T44] nocb_cb_wait+0x80/0x450
[123587.101374][ T44] rcu_nocb_cb_kthread+0x54/0x90
[123587.101386][ T44] kthread+0x174/0x1d8
[123587.101398][ T44] ret_from_fork+0x10/0x20
[123587.101410][ T44] SMP: stopping secondary CPUs
[123587.101670][ C4] VendorHooks: CPU4: stopping
-Mukesh