>> syzbot reported the following splat [0]. >> >> In check_atomic_load/store(), register validity is not checked before >> atomic_ptr_type_ok(). This causes the out-of-bounds read in is_ctx_reg() >> called from atomic_ptr_type_ok() when the register number is MAX_BPF_REG >> or greater. >> >> Add check_reg_arg() before atomic_ptr_type_ok(), and return early when >> the register is invalid. >> >> [0] >> BUG: KASAN: slab-out-of-bounds in is_ctx_reg kernel/bpf/verifier.c:6185 [inline] >> BUG: KASAN: slab-out-of-bounds in atomic_ptr_type_ok+0x3d7/0x550 kernel/bpf/verifier.c:6223 >> Read of size 4 at addr ffff888141b0d690 by task syz-executor143/5842 >> >> CPU: 1 UID: 0 PID: 5842 Comm: syz-executor143 Not tainted 6.14.0-rc3-syzkaller-gf28214603dc6 #0 >> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 02/12/2025 >> Call Trace: >> <TASK> >> __dump_stack lib/dump_stack.c:94 [inline] >> dump_stack_lvl+0x241/0x360 lib/dump_stack.c:120 >> print_address_description mm/kasan/report.c:408 [inline] >> print_report+0x16e/0x5b0 mm/kasan/report.c:521 >> kasan_report+0x143/0x180 mm/kasan/report.c:634 >> is_ctx_reg kernel/bpf/verifier.c:6185 [inline] >> atomic_ptr_type_ok+0x3d7/0x550 kernel/bpf/verifier.c:6223 >> check_atomic_store kernel/bpf/verifier.c:7804 [inline] >> check_atomic kernel/bpf/verifier.c:7841 [inline] >> do_check+0x89dd/0xedd0 kernel/bpf/verifier.c:19334 >> do_check_common+0x1678/0x2080 kernel/bpf/verifier.c:22600 >> do_check_main kernel/bpf/verifier.c:22691 [inline] >> bpf_check+0x165c8/0x1cca0 kernel/bpf/verifier.c:23821 >> bpf_prog_load+0x1664/0x20e0 kernel/bpf/syscall.c:2967 >> __sys_bpf+0x4ea/0x820 kernel/bpf/syscall.c:5811 >> __do_sys_bpf kernel/bpf/syscall.c:5918 [inline] >> __se_sys_bpf kernel/bpf/syscall.c:5916 [inline] >> __x64_sys_bpf+0x7c/0x90 kernel/bpf/syscall.c:5916 >> do_syscall_x64 arch/x86/entry/common.c:52 [inline] >> do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83 >> entry_SYSCALL_64_after_hwframe+0x77/0x7f >> RIP: 0033:0x7fa3ac86bab9 >> Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 c1 17 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48 >> RSP: 002b:00007ffe50fff5f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000141 >> RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fa3ac86bab9 >> RDX: 0000000000000094 RSI: 00004000000009c0 RDI: 0000000000000005 >> RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000006 >> R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 >> R13: 431bde82d7b634db R14: 0000000000000001 R15: 0000000000000001 >> </TASK> >> >> Allocated by task 5842: >> kasan_save_stack mm/kasan/common.c:47 [inline] >> kasan_save_track+0x3f/0x80 mm/kasan/common.c:68 >> poison_kmalloc_redzone mm/kasan/common.c:377 [inline] >> __kasan_kmalloc+0x98/0xb0 mm/kasan/common.c:394 >> kasan_kmalloc include/linux/kasan.h:260 [inline] >> __kmalloc_cache_noprof+0x243/0x390 mm/slub.c:4325 >> kmalloc_noprof include/linux/slab.h:901 [inline] >> kzalloc_noprof include/linux/slab.h:1037 [inline] >> do_check_common+0x1ec/0x2080 kernel/bpf/verifier.c:22499 >> do_check_main kernel/bpf/verifier.c:22691 [inline] >> bpf_check+0x165c8/0x1cca0 kernel/bpf/verifier.c:23821 >> bpf_prog_load+0x1664/0x20e0 kernel/bpf/syscall.c:2967 >> __sys_bpf+0x4ea/0x820 kernel/bpf/syscall.c:5811 >> __do_sys_bpf kernel/bpf/syscall.c:5918 [inline] >> __se_sys_bpf kernel/bpf/syscall.c:5916 [inline] >> __x64_sys_bpf+0x7c/0x90 kernel/bpf/syscall.c:5916 >> do_syscall_x64 arch/x86/entry/common.c:52 [inline] >> do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83 >> entry_SYSCALL_64_after_hwframe+0x77/0x7f >> >> The buggy address belongs to the object at ffff888141b0d000 >> which belongs to the cache kmalloc-2k of size 2048 >> The buggy address is located 312 bytes to the right of >> allocated 1368-byte region [ffff888141b0d000, ffff888141b0d558) >> >> The buggy address belongs to the physical page: >> page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x141b08 >> head: order:3 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0 >> flags: 0x57ff00000000040(head|node=1|zone=2|lastcpupid=0x7ff) >> page_type: f5(slab) >> raw: 057ff00000000040 ffff88801b042000 dead000000000100 dead000000000122 >> raw: 0000000000000000 0000000080080008 00000000f5000000 0000000000000000 >> head: 057ff00000000040 ffff88801b042000 dead000000000100 dead000000000122 >> head: 0000000000000000 0000000080080008 00000000f5000000 0000000000000000 >> head: 057ff00000000003 ffffea000506c201 ffffffffffffffff 0000000000000000 >> head: 0000000000000008 0000000000000000 00000000ffffffff 0000000000000000 >> page dumped because: kasan: bad access detected >> page_owner tracks the page as allocated >> page last allocated via order 3, migratetype Unmovable, gfp_mask 0xd20c0(__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC), pid 1, tgid 1 (swapper/0), ts 8909973200, free_ts 0 >> set_page_owner include/linux/page_owner.h:32 [inline] >> post_alloc_hook+0x1f4/0x240 mm/page_alloc.c:1585 >> prep_new_page mm/page_alloc.c:1593 [inline] >> get_page_from_freelist+0x3a8c/0x3c20 mm/page_alloc.c:3538 >> __alloc_frozen_pages_noprof+0x264/0x580 mm/page_alloc.c:4805 >> alloc_pages_mpol+0x311/0x660 mm/mempolicy.c:2270 >> alloc_slab_page mm/slub.c:2423 [inline] >> allocate_slab+0x8f/0x3a0 mm/slub.c:2587 >> new_slab mm/slub.c:2640 [inline] >> ___slab_alloc+0xc27/0x14a0 mm/slub.c:3826 >> __slab_alloc+0x58/0xa0 mm/slub.c:3916 >> __slab_alloc_node mm/slub.c:3991 [inline] >> slab_alloc_node mm/slub.c:4152 [inline] >> __kmalloc_cache_noprof+0x27b/0x390 mm/slub.c:4320 >> kmalloc_noprof include/linux/slab.h:901 [inline] >> kzalloc_noprof include/linux/slab.h:1037 [inline] >> virtio_pci_probe+0x54/0x340 drivers/virtio/virtio_pci_common.c:689 >> local_pci_probe drivers/pci/pci-driver.c:324 [inline] >> pci_call_probe drivers/pci/pci-driver.c:392 [inline] >> __pci_device_probe drivers/pci/pci-driver.c:417 [inline] >> pci_device_probe+0x6c5/0xa10 drivers/pci/pci-driver.c:451 >> really_probe+0x2b9/0xad0 drivers/base/dd.c:658 >> __driver_probe_device+0x1a2/0x390 drivers/base/dd.c:800 >> driver_probe_device+0x50/0x430 drivers/base/dd.c:830 >> __driver_attach+0x45f/0x710 drivers/base/dd.c:1216 >> bus_for_each_dev+0x239/0x2b0 drivers/base/bus.c:370 >> bus_add_driver+0x346/0x670 drivers/base/bus.c:678 >> page_owner free stack trace missing >> >> Memory state around the buggy address: >> ffff888141b0d580: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc >> ffff888141b0d600: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc >> >ffff888141b0d680: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc >> ^ >> ffff888141b0d700: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc >> ffff888141b0d780: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc >> >> Reported-by: syzbot+a5964227adc0f904549c@xxxxxxxxxxxxxxxxxxxxxxxxx >> Closes: https://syzkaller.appspot.com/bug?extid=a5964227adc0f904549c >> Tested-by: syzbot+a5964227adc0f904549c@xxxxxxxxxxxxxxxxxxxxxxxxx >> Fixes: e24bbad29a8d ("bpf: Introduce load-acquire and store-release instructions") >> Signed-off-by: Kohei Enju <enjuk@xxxxxxxxxx> >> --- > >I wonder if we have test cases for malformed instructions. >Maybe add one since this issue was hit? Thank you for the suggestion. I agree that having a test for this specific issue would be valuable. I'll work on it. > >> kernel/bpf/verifier.c | 12 ++++++++++++ >> 1 file changed, 12 insertions(+) >> >> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c >> index 3303a3605ee8..6481604ab612 100644 >> --- a/kernel/bpf/verifier.c >> +++ b/kernel/bpf/verifier.c >> @@ -7788,6 +7788,12 @@ static int check_atomic_rmw(struct bpf_verifier_env *env, >> static int check_atomic_load(struct bpf_verifier_env *env, >> struct bpf_insn *insn) >> { >> + int err; >> + >> + err = check_reg_arg(env, insn->src_reg, SRC_OP); >> + if (err) >> + return err; >> + > >I agree with these changes, however, both check_load_mem() and >check_store_reg() already do check_reg_arg() first thing. >Maybe just swap the atomic_ptr_type_ok() and check_load_mem()? You're absolutely right. The additional check_reg_arg() introduces some redundancy since check_load_mem() and check_store_reg() do that. I've revised the patch to simply swap the order and syzbot didn't trigger the issue in this context. https://lore.kernel.org/all/20250315055941.10487-2-enjuk@xxxxxxxxxx/ However, the change in order affects results of existing test cases because messages are changed by this swapping.[1] In this case, we need to either modify these existing test cases or reconsider the logic of check_atomic_load/store(). I'm not sure which approach would be preferable in this situation. [1] https://patchwork.kernel.org/project/netdevbpf/patch/20250315055941.10487-2-enjuk@xxxxxxxxxx/ First test_progs failure (test_progs-x86_64-llvm-18): #504 verifier_load_acquire tester_init:PASS:tester_log_buf 0 nsec process_subtest:PASS:obj_open_mem 0 nsec process_subtest:PASS:specs_alloc 0 nsec #504/17 verifier_load_acquire/load-acquire from pkt pointer run_subtest:PASS:obj_open_mem 0 nsec libbpf: prog 'load_acquire_from_pkt_pointer': BPF program load failed: -EACCES libbpf: prog 'load_acquire_from_pkt_pointer': failed to load: -EACCES libbpf: failed to load object 'verifier_load_acquire' run_subtest:PASS:unexpected_load_success 0 nsec validate_msgs:FAIL:754 expect_msg VERIFIER LOG: ============= Global function load_acquire_from_pkt_pointer() doesn't return scalar. Only those are supported. 0: R1=ctx() R10=fp0 ; asm volatile ( @ verifier_load_acquire.c:140 0: (61) r2 = *(u32 *)(r1 +0) ; R1=ctx() R2_w=pkt(r=0) 1: (d3) r0 = load_acquire((u8 *)(r2 +0)) invalid access to packet, off=0 size=1, R2(id=0,off=0,r=0) R2 offset is outside of the packet processed 2 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0 ============= EXPECTED SUBSTR: 'BPF_ATOMIC loads from R2 pkt is not allowed' ... (snip) #541/25 verifier_store_release/store-release to sock pointer run_subtest:PASS:obj_open_mem 0 nsec libbpf: prog 'store_release_to_sock_pointer': BPF program load failed: -EACCES libbpf: prog 'store_release_to_sock_pointer': failed to load: -EACCES libbpf: failed to load object 'verifier_store_release' run_subtest:PASS:unexpected_load_success 0 nsec validate_msgs:FAIL:754 expect_msg VERIFIER LOG: ============= Global function store_release_to_sock_pointer() doesn't return scalar. Only those are supported. 0: R1=ctx() R10=fp0 ; asm volatile ( @ verifier_store_release.c:191 0: (b4) w0 = 0 ; R0_w=0 1: (79) r2 = *(u64 *)(r1 +40) ; R1=ctx() R2_w=sock() 2: (d3) store_release((u8 *)(r2 +0), r0) R2 cannot write into sock processed 3 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0 ============= EXPECTED SUBSTR: 'BPF_ATOMIC stores into R2 sock is not allowed' > >> if (!atomic_ptr_type_ok(env, insn->src_reg, insn)) { >> verbose(env, "BPF_ATOMIC loads from R%d %s is not allowed\n", >> insn->src_reg, >> @@ -7801,6 +7807,12 @@ static int check_atomic_load(struct bpf_verifier_env *env, >> static int check_atomic_store(struct bpf_verifier_env *env, >> struct bpf_insn *insn) >> { >> + int err; >> + >> + err = check_reg_arg(env, insn->dst_reg, SRC_OP); >> + if (err) >> + return err; >> + >> if (!atomic_ptr_type_ok(env, insn->dst_reg, insn)) { >> verbose(env, "BPF_ATOMIC stores into R%d %s is not allowed\n", >> insn->dst_reg, Regards, Kohei