On Sun, May 26, 2024 at 04:06:48PM -0700, Cong Wang wrote:
> From: Cong Wang <cong.wang@xxxxxxxxxxxxx>
>
> After commit 2c9e5d4a0082 ("bpf: remove CONFIG_BPF_JIT dependency on CONFIG_MODULES of")
> CONFIG_BPF_JIT does not depend on CONFIG_MODULES any more and bpf jit
> also uses the MODULES_VADDR ~ MODULES_END memory region. But
> is_vmalloc_or_module_addr() still checks CONFIG_MODULES, which then
> returns false for a bpf jit memory region when CONFIG_MODULES is not
> defined. It leads to the following kernel BUG:
>
> [ 1.567023] ------------[ cut here ]------------
> [ 1.567883] kernel BUG at mm/vmalloc.c:745!
> [ 1.568477] Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN NOPTI
> [ 1.569367] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.9.0+ #448
> [ 1.570247] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1 04/01/2014
> [ 1.570786] RIP: 0010:vmalloc_to_page+0x48/0x1ec
> [ 1.570786] Code: 0f 00 00 e8 eb 1a 05 00 b8 37 00 00 00 48 ba fe ff ff ff ff 1f 00 00 4c 03 25 76 49 c6 02 48 c1 e0 28 48 01 e8 48 39 d0 76 02 <0f> 0b 4c 89 e7 e8 bf 1a 05 00 49 8b 04 24 48 a9 9f ff ff ff 0f 84
> [ 1.570786] RSP: 0018:ffff888007787960 EFLAGS: 00010212
> [ 1.570786] RAX: 000036ffa0000000 RBX: 0000000000000640 RCX: ffffffff8147e93c
> [ 1.570786] RDX: 00001ffffffffffe RSI: dffffc0000000000 RDI: ffffffff840e32c8
> [ 1.570786] RBP: ffffffffa0000000 R08: 0000000000000000 R09: 0000000000000000
> [ 1.570786] R10: ffff888007787a88 R11: ffffffff8475d8e7 R12: ffffffff83e80ff8
> [ 1.570786] R13: 0000000000000640 R14: 0000000000000640 R15: 0000000000000640
> [ 1.570786] FS: 0000000000000000(0000) GS:ffff88806cc00000(0000) knlGS:0000000000000000
> [ 1.570786] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 1.570786] CR2: ffff888006a01000 CR3: 0000000003e80000 CR4: 0000000000350ef0
> [ 1.570786] Call Trace:
> [ 1.570786] <TASK>
> [ 1.570786] ? __die_body+0x1b/0x58
> [ 1.570786] ? die+0x31/0x4b
> [ 1.570786] ? do_trap+0x9d/0x138
> [ 1.570786] ? vmalloc_to_page+0x48/0x1ec
> [ 1.570786] ? do_error_trap+0xcd/0x102
> [ 1.570786] ? vmalloc_to_page+0x48/0x1ec
> [ 1.570786] ? vmalloc_to_page+0x48/0x1ec
> [ 1.570786] ? handle_invalid_op+0x2f/0x38
> [ 1.570786] ? vmalloc_to_page+0x48/0x1ec
> [ 1.570786] ? exc_invalid_op+0x2b/0x41
> [ 1.570786] ? asm_exc_invalid_op+0x16/0x20
> [ 1.570786] ? vmalloc_to_page+0x26/0x1ec
> [ 1.570786] ? vmalloc_to_page+0x48/0x1ec
> [ 1.570786] __text_poke+0xb6/0x458
> [ 1.570786] ? __pfx_text_poke_memcpy+0x10/0x10
> [ 1.570786] ? __pfx___mutex_lock+0x10/0x10
> [ 1.570786] ? __pfx___text_poke+0x10/0x10
> [ 1.570786] ? __pfx_get_random_u32+0x10/0x10
> [ 1.570786] ? srso_return_thunk+0x5/0x5f
> [ 1.570786] text_poke_copy_locked+0x70/0x84
> [ 1.570786] text_poke_copy+0x32/0x4f
> [ 1.570786] bpf_arch_text_copy+0xf/0x27
> [ 1.570786] bpf_jit_binary_pack_finalize+0x26/0x5a
> [ 1.570786] bpf_int_jit_compile+0x576/0x8ad
> [ 1.570786] ? __pfx_bpf_int_jit_compile+0x10/0x10
> [ 1.570786] ? srso_return_thunk+0x5/0x5f
> [ 1.570786] ? __kmalloc_node_track_caller+0x2b5/0x2e0
> [ 1.570786] bpf_prog_select_runtime+0x7c/0x199
> [ 1.570786] bpf_prepare_filter+0x1e9/0x25b
> [ 1.570786] ? __pfx_bpf_prepare_filter+0x10/0x10
> [ 1.570786] ? srso_return_thunk+0x5/0x5f
> [ 1.570786] ? _find_next_bit+0x29/0x7e
> [ 1.570786] bpf_prog_create+0xb8/0xe0
> [ 1.570786] ptp_classifier_init+0x75/0xa1
> [ 1.570786] ? __pfx_ptp_classifier_init+0x10/0x10
> [ 1.570786] ? srso_return_thunk+0x5/0x5f
> [ 1.570786] ? register_pernet_subsys+0x36/0x42
> [ 1.570786] ? srso_return_thunk+0x5/0x5f
> [ 1.570786] sock_init+0x99/0xa3
> [ 1.570786] ? __pfx_sock_init+0x10/0x10
> [ 1.570786] do_one_initcall+0x104/0x2c4
> [ 1.570786] ? __pfx_do_one_initcall+0x10/0x10
> [ 1.570786] ? parameq+0x25/0x2d
> [ 1.570786] ? rcu_is_watching+0x1c/0x3c
> [ 1.570786] ? trace_kmalloc+0x81/0xb2
> [ 1.570786] ? srso_return_thunk+0x5/0x5f
> [ 1.570786] ? __kmalloc+0x29c/0x2c7
> [ 1.570786] ? srso_return_thunk+0x5/0x5f
> [ 1.570786] do_initcalls+0xf9/0x123
> [ 1.570786] kernel_init_freeable+0x24f/0x289
> [ 1.570786] ? __pfx_kernel_init+0x10/0x10
> [ 1.570786] kernel_init+0x19/0x13a
> [ 1.570786] ret_from_fork+0x24/0x41
> [ 1.570786] ? __pfx_kernel_init+0x10/0x10
> [ 1.570786] ret_from_fork_asm+0x1a/0x30
> [ 1.570786] </TASK>
> [ 1.570819] ---[ end trace 0000000000000000 ]---
> [ 1.571463] RIP: 0010:vmalloc_to_page+0x48/0x1ec
> [ 1.572111] Code: 0f 00 00 e8 eb 1a 05 00 b8 37 00 00 00 48 ba fe ff ff ff ff 1f 00 00 4c 03 25 76 49 c6 02 48 c1 e0 28 48 01 e8 48 39 d0 76 02 <0f> 0b 4c 89 e7 e8 bf 1a 05 00 49 8b 04 24 48 a9 9f ff ff ff 0f 84
> [ 1.574632] RSP: 0018:ffff888007787960 EFLAGS: 00010212
> [ 1.575129] RAX: 000036ffa0000000 RBX: 0000000000000640 RCX: ffffffff8147e93c
> [ 1.576097] RDX: 00001ffffffffffe RSI: dffffc0000000000 RDI: ffffffff840e32c8
> [ 1.577084] RBP: ffffffffa0000000 R08: 0000000000000000 R09: 0000000000000000
> [ 1.578077] R10: ffff888007787a88 R11: ffffffff8475d8e7 R12: ffffffff83e80ff8
> [ 1.578810] R13: 0000000000000640 R14: 0000000000000640 R15: 0000000000000640
> [ 1.579823] FS: 0000000000000000(0000) GS:ffff88806cc00000(0000) knlGS:0000000000000000
> [ 1.580992] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 1.581869] CR2: ffff888006a01000 CR3: 0000000003e80000 CR4: 0000000000350ef0
> [ 1.582800] Kernel panic - not syncing: Fatal exception
> [ 1.583765] ---[ end Kernel panic - not syncing: Fatal exception ]---
>
> Fixes: 2c9e5d4a0082 ("bpf: remove CONFIG_BPF_JIT dependency on CONFIG_MODULES of")
> Cc: Luis Chamberlain <mcgrof@xxxxxxxxxx>
> Cc: Mike Rapoport (IBM) <rppt@xxxxxxxxxx>
> Signed-off-by: Cong Wang <cong.wang@xxxxxxxxxxxxx>
> ---
>  mm/vmalloc.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 125427cbdb87..168a5c7c2fdf 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -714,7 +714,7 @@ int is_vmalloc_or_module_addr(const void *x)
>  	 * and fall back on vmalloc() if that fails. Others
>  	 * just put it in the vmalloc space.
>  	 */
> -#if defined(CONFIG_MODULES) && defined(MODULES_VADDR)
> +#if defined(MODULES_VADDR)

Let's make it

#if defined(CONFIG_EXECMEM) && defined(MODULES_VADDR)

to avoid increasing kernel size on systems that don't use modules and BPF.

>  	unsigned long addr = (unsigned long)kasan_reset_tag(x);
>  	if (addr >= MODULES_VADDR && addr < MODULES_END)
>  		return 1;
> --
> 2.34.1
>

--
Sincerely yours,
Mike.
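
For illustration only, here is a userspace sketch of how the suggested guard would behave. It is not the kernel code: addr_in_module_region() is a hypothetical helper that models only the module-range part of is_vmalloc_or_module_addr(), the MODULES_VADDR/MODULES_END values are assumed stand-ins (the real ones are arch- and config-dependent), and CONFIG_EXECMEM is defined by hand to mimic a kernel built with BPF JIT but without modules. The sample address is the RBP value from the trace above.

```c
#include <stdint.h>

/*
 * Assumed stand-ins for illustration; in the kernel these come from
 * Kconfig and the arch headers, not from this file.
 */
#define CONFIG_EXECMEM 1
#define MODULES_VADDR UINT64_C(0xffffffffa0000000)
#define MODULES_END   UINT64_C(0xffffffffff000000)

/*
 * Hypothetical helper: returns 1 if addr lies in the module/execmem
 * region, 0 otherwise. With CONFIG_EXECMEM undefined the range check
 * is compiled out entirely, so no code is added for kernels that use
 * neither modules nor BPF JIT.
 */
int addr_in_module_region(uint64_t addr)
{
#if defined(CONFIG_EXECMEM) && defined(MODULES_VADDR)
	if (addr >= MODULES_VADDR && addr < MODULES_END)
		return 1;
#endif
	(void)addr; /* silence unused-parameter warning when compiled out */
	return 0;
}
```

With the guard keyed on CONFIG_MODULES instead, the same address would fall through to the final return 0 on a CONFIG_MODULES=n kernel, which matches the BUG in the report.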