NULL pointer issue in rcu_do_batch()

Mukesh Ojha <quic_mojha@xxxxxxxxxxx> · Thu, 22 Dec 2022 17:04:22 +0530

Hi All,

We are observing a NULL pointer dereference in rcu_do_batch() on 5.15,
although it is very hard to hit.

I wanted to check whether this has already been reported and fixed in a
more recent kernel.

<1>[16.814014] [pid:    58] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
<0>[16.814027] [pid:    58] PC Code: bad value
<0>[16.814034] [pid:    58] LR Code: f81e03a8 b5000068 d10083a8 f81e83a8 aa1f03f6 91127319 d10083b7 f9434b68 d503201f f9400408 910006d6 f900041f d63f0100 (91004308) b8bfc108 374001c8 97ffff2b 9111e308 38bfc108 72001d1f

<4>[16.814359] [pid:    58] CPU: 7 PID: 58 Comm: rcuop/5 Tainted: G S   W  OE     5.15.41-android13-8-25574579-abS911USQU1AVLL #1
<4>[16.814361] [pid:    58] Hardware name: XXXXX
<4>[16.814362] [pid:    58] pstate: 42400805 (nZcv daif +PAN -UAO +TCO -DIT -SSBS BTYPE=-c)
<4>[16.814364] [pid:    58] pc : 0x0
<4>[16.814365] [pid:    58] lr : rcu_do_batch+0x328/0xcd8
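
The pc : 0x0 / lr : rcu_do_batch+0x328 pair is what an indirect call through a NULL function pointer looks like on arm64. As far as we can tell, the LR Code bytes just before the return address decode to ldr x8, [x0, #8]; add x22, x22, #1; str xzr, [x0, #8]; blr x8, i.e. load rhp->func, bump the invoke count, clear rhp->func, then call the loaded pointer. Below is a minimal userspace sketch of that invoke step under those assumptions; struct rcu_head is a simplified stand-in for the kernel's struct, and invoke_one() is a hypothetical helper that reports a NULL ->func instead of branching to address 0.

```c
#include <stdio.h>

/* Simplified userspace stand-in for the kernel's struct rcu_head
 * (struct callback_head): a next pointer followed by the callback. */
struct rcu_head {
	struct rcu_head *next;
	void (*func)(struct rcu_head *head);
};

/* Hypothetical model of one invoke step: fetch ->func, clear it, call it.
 * In the kernel, calling f when f == NULL branches to address 0, which
 * shows up as "pc : 0x0" with lr inside the caller. Here we check first
 * and report instead. Returns 0 on success, -1 if ->func was NULL. */
static int invoke_one(struct rcu_head *rhp)
{
	void (*f)(struct rcu_head *) = rhp->func;

	rhp->func = NULL;	/* models WRITE_ONCE(rhp->func, 0) */
	if (!f) {
		fprintf(stderr, "rcu_head %p has NULL ->func\n", (void *)rhp);
		return -1;
	}
	f(rhp);
	return 0;
}
```

Because the pointer is cleared before the call, an rcu_head that is invoked once and then reached again would arrive with ->func already NULL. That is only one hypothesis consistent with the observation below, not a confirmed root cause.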

The rcu_data for CPU5 contains 12 additional RCU callback heads in the
RCU_DONE_TAIL segment whose ->func is NULL. This does not look like
random memory corruption, since only rhp->func is NULL, and consistently
so across multiple objects.
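
For reference, the per-CPU callback list is one singly linked list split into segments by an array of tail pointers; the RCU_DONE_TAIL segment runs from ->head up to the callback whose ->next field tails[RCU_DONE_TAIL] points at. A minimal userspace sketch of counting the NULL-func heads in that segment when inspecting a dump, assuming a simplified layout: struct segcblist and count_null_done() are hypothetical names, and only the segment indices mirror the kernel's rcu_segcblist.

```c
#include <stddef.h>

/* Segment indices, named after the kernel's rcu_segcblist. */
enum { RCU_DONE_TAIL, RCU_WAIT_TAIL, RCU_NEXT_READY_TAIL, RCU_NEXT_TAIL,
       RCU_CBLIST_NSEGS };

struct rcu_head {
	struct rcu_head *next;
	void (*func)(struct rcu_head *head);
};

/* Hypothetical simplified segmented list: tails[i] points at the ->next
 * field of the last callback in segment i (or at ->head if empty). */
struct segcblist {
	struct rcu_head *head;
	struct rcu_head **tails[RCU_CBLIST_NSEGS];
};

/* Count callbacks in the RCU_DONE_TAIL segment whose ->func is NULL. */
static size_t count_null_done(const struct segcblist *l)
{
	size_t n = 0;
	struct rcu_head *const *pp = &l->head;

	while (pp != l->tails[RCU_DONE_TAIL]) {
		const struct rcu_head *rhp = *pp;

		if (!rhp->func)
			n++;
		pp = &rhp->next;
	}
	return n;
}
```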

We have also seen one more occurrence, with CONFIG_CFI_CLANG enabled:

[123587.101222][   T44] Kernel panic - not syncing: CFI failure (target: 0x0)
[123587.101249][   T44] CPU: 0 PID: 44 Comm: rcuop/3 Tainted: G S WC OE     5.15.41 #1
[123587.101263][   T44] Hardware name: XXXXX
[123587.101274][   T44] Call trace:
[123587.101283][   T44]  dump_backtrace.cfi_jt+0x0/0x8
[123587.101298][   T44]  show_stack+0x1c/0x2c
[123587.101311][   T44]  dump_stack_lvl+0x94/0x100
[123587.101326][   T44]  panic+0x17c/0x450
[123587.101338][   T44]  find_check_fn+0x0/0x210
[123587.101349][   T44]  rcu_do_batch+0x368/0x6f8
[123587.101362][   T44]  nocb_cb_wait+0x80/0x450
[123587.101374][   T44]  rcu_nocb_cb_kthread+0x54/0x90
[123587.101386][   T44]  kthread+0x174/0x1d8
[123587.101398][   T44]  ret_from_fork+0x10/0x20
[123587.101410][   T44] SMP: stopping secondary CPUs
[123587.101670][    C4] VendorHooks: CPU4: stopping
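
With CONFIG_CFI_CLANG, the indirect call is checked before the branch, so the same NULL ->func is reported as a CFI failure with target 0x0 rather than as a fault at pc 0x0. The sketch below is only a conceptual userspace model of that ordering: valid_targets[] is a hypothetical allow-list standing in for Clang's per-type CFI jump tables, and checked_invoke() is not the kernel's code.

```c
#include <stdbool.h>
#include <stddef.h>

struct rcu_head {
	struct rcu_head *next;
	void (*func)(struct rcu_head *head);
};

typedef void (*rcu_callback_t)(struct rcu_head *head);

/* Two example callbacks; the table below plays the role of the set of
 * valid targets for this function type (hypothetical). */
static void free_cb(struct rcu_head *h) { (void)h; }
static void put_cb(struct rcu_head *h) { (void)h; }
static const rcu_callback_t valid_targets[] = { free_cb, put_cb };

/* Return true only if f is a known target; NULL is never in the table. */
static bool cfi_check(rcu_callback_t f)
{
	for (size_t i = 0; i < sizeof(valid_targets) / sizeof(valid_targets[0]); i++)
		if (valid_targets[i] == f)
			return true;
	return false;
}

/* Checked indirect call: refuses an invalid target instead of calling it.
 * The CFI-enabled kernel panics at this point ("CFI failure (target: 0x0)"). */
static bool checked_invoke(struct rcu_head *rhp)
{
	rcu_callback_t f = rhp->func;

	if (!cfi_check(f))
		return false;
	f(rhp);
	return true;
}
```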

-Mukesh